Forward Index
The forward index contains a mapping from document id to various forms of document metadata.
In practice, the forward index consists of two files, an id file and a data file.
The id file contains a list of sorted document ids, and the data file contains metadata for each document id, in the same order as the id file, with a fixed size record containing data associated with each document id.
Each record contains a binary encoded DocumentMetadata object, as well as an HtmlFeatures bitmask.
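To make the two-file layout concrete, here is a minimal lookup sketch. It is not the project's actual reader: the class name `ForwardIndexSketch`, the two-long record layout, and the use of in-memory arrays in place of the id and data files are all assumptions made for illustration. The idea is that a binary search over the sorted ids yields a record index, and the fixed record size turns that index into an offset in the data.

```java
import java.util.Arrays;

/** Hypothetical sketch of a forward index lookup; not the project's ForwardIndexReader. */
final class ForwardIndexSketch {
    // Assumed record layout: two longs per document (metadata, then feature bitmask).
    static final int ENTRY_SIZE = 2;
    static final int METADATA_OFFSET = 0;
    static final int FEATURES_OFFSET = 1;

    private final long[] ids;   // stands in for the id file: sorted document ids
    private final long[] data;  // stands in for the data file: ENTRY_SIZE longs per id

    ForwardIndexSketch(long[] sortedIds, long[] data) {
        if (data.length != sortedIds.length * ENTRY_SIZE) {
            throw new IllegalArgumentException("data must hold one fixed-size record per id");
        }
        this.ids = sortedIds;
        this.data = data;
    }

    /** Returns the position of docId in the id list, or -1 if it is absent. */
    int recordIndex(long docId) {
        int idx = Arrays.binarySearch(ids, docId);
        return idx >= 0 ? idx : -1;
    }

    /** Returns the encoded metadata for docId, or 0 if the id is unknown (an assumed convention). */
    long metadata(long docId) {
        int idx = recordIndex(docId);
        return idx < 0 ? 0L : data[idx * ENTRY_SIZE + METADATA_OFFSET];
    }

    /** Returns the feature bitmask for docId, or 0 if the id is unknown (an assumed convention). */
    long features(long docId) {
        int idx = recordIndex(docId);
        return idx < 0 ? 0L : data[idx * ENTRY_SIZE + FEATURES_OFFSET];
    }
}
```

Because every record has the same size, no separate offset table is needed; the position in the id list is enough to locate the record.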
Unlike the reverse index, the forward index is not split into two tiers. The data is in the same order as it is in the source data, and the full set of document IDs is assumed to fit in memory, so the index is relatively easy to construct.
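As a rough illustration of why the in-memory assumption keeps construction simple, the sketch below builds the two structures in two passes. It is one possible approach under stated assumptions, not a description of what ForwardIndexConverter does: the class name `ForwardIndexBuildSketch`, the parallel-array input, the two-long record layout, and the uniqueness of document ids are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical two-pass construction sketch; not the project's ForwardIndexConverter. */
final class ForwardIndexBuildSketch {
    // Assumed record layout: two longs per document (metadata, then feature bitmask).
    static final int ENTRY_SIZE = 2;

    /**
     * Builds {ids, data} from parallel source arrays given in arbitrary order.
     * Assumes document ids are unique and that all of them fit in memory.
     */
    static long[][] build(long[] docIds, long[] metadata, long[] features) {
        // Pass 1: hold every document id in memory and sort them.
        long[] ids = docIds.clone();
        Arrays.sort(ids);

        // Pass 2: place each source record in the slot matching its id's sorted position.
        long[] data = new long[ids.length * ENTRY_SIZE];
        for (int i = 0; i < docIds.length; i++) {
            int slot = Arrays.binarySearch(ids, docIds[i]);
            data[slot * ENTRY_SIZE] = metadata[i];
            data[slot * ENTRY_SIZE + 1] = features[i];
        }
        return new long[][] {ids, data};
    }
}
```

With fixed-size records, the second pass can write each record directly to its final position, so no merge step or intermediate sort of the data is required.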
Central Classes
- ForwardIndexConverter constructs the index.
- ForwardIndexReader interrogates the index.