Linus’s Monocle project has a simple model for a personal search engine.
The components are:
- tokenizer
- storage model
- search engine
- user interface
The storage data structure is this:
type Doc = {
  // A globally unique identifier for this document across all Monocle
  // documents. It's usually a 2-3 letter prefix for the module (like "tw"
  // for Tweets) followed by a number.
  id: string
  // A map of each token in the document to the number of times it appears
  // in the document.
  tokens: Map<string, number>
  // The document's text content that will be displayed in the results page
  content: string
  // Optionally, the doc's title
  title?: string
  // Optionally a link to this document on the web that Monocle can use to
  // "link out" to the original document from the search result.
  href?: string
}
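
To make the relationship between the tokenizer, the storage model, and the search engine concrete, here is a minimal sketch of how a Doc might be built and queried. It is not Monocle's actual code: the function names (tokenize, countTokens, makeDoc, search), the tokenization rule (lowercase, split on non-alphanumeric characters), and the scoring (sum of matching token counts) are all assumptions chosen for illustration.

```typescript
// Same shape as the Doc type above, repeated so the sketch is self-contained.
type Doc = {
  id: string;
  tokens: Map<string, number>;
  content: string;
  title?: string;
  href?: string;
};

// Hypothetical tokenizer: lowercases the text and splits on anything that is
// not an ASCII letter or digit. A real tokenizer may well differ.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Builds the token -> occurrence count map stored in Doc.tokens.
function countTokens(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of tokenize(text)) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical helper that assembles a Doc from raw content.
function makeDoc(id: string, content: string, title?: string, href?: string): Doc {
  return { id, tokens: countTokens(content), content, title, href };
}

// Naive search (an assumption, not Monocle's ranking): score each document by
// the summed counts of the query's tokens, drop non-matches, sort descending.
function search(docs: Doc[], query: string): Doc[] {
  const queryTokens = tokenize(query);
  return docs
    .map((doc) => ({
      doc,
      score: queryTokens.reduce((sum, t) => sum + (doc.tokens.get(t) ?? 0), 0),
    }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((result) => result.doc);
}
```

Storing per-document token counts up front means a query only needs map lookups rather than rescanning each document's content, which is presumably why the tokens field exists alongside content.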