# Forward Index
The forward index contains a mapping from document id to various forms of document metadata.
In practice, the forward index consists of two files, an `id` file and a `data` file.
The `id` file contains a sorted list of document ids, and the `data` file contains
one fixed-size record of metadata per document id, in the same order as the `id` file.
Each record contains a binary-encoded [DocumentMetadata](../../common/model/src/main/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
as well as a bitmask of [HtmlFeature](../../common/model/src/main/java/nu/marginalia/model/crawl/HtmlFeature.java) flags.
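
Because the ids are sorted and the records have a fixed size, a lookup can be a binary search in the ids followed by an offset calculation into the data. The sketch below models both files as in-memory `long` arrays to illustrate the idea. The class name, the two-`long` record layout, and the choice to return `0` for unknown documents are all assumptions made for illustration, and the actual `ForwardIndexReader` may differ.

```java
import java.util.Arrays;

/** Hypothetical in-memory model of the two files; not the project's actual reader. */
final class ForwardIndexLookupSketch {
    // Assumed record layout: one long of metadata followed by one long of feature bits.
    static final int RECORD_SIZE = 2;
    static final int METADATA_OFFSET = 0;
    static final int FEATURES_OFFSET = 1;

    private final long[] ids;   // stands in for the `id` file: sorted document ids
    private final long[] data;  // stands in for the `data` file: RECORD_SIZE longs per id

    ForwardIndexLookupSketch(long[] sortedIds, long[] data) {
        if (data.length != sortedIds.length * RECORD_SIZE) {
            throw new IllegalArgumentException("data must hold exactly one record per id");
        }
        this.ids = sortedIds;
        this.data = data;
    }

    /** Position of docId in the sorted id list, or -1 if it is not present. */
    private int indexOf(long docId) {
        int idx = Arrays.binarySearch(ids, docId);
        return idx >= 0 ? idx : -1;
    }

    /** Encoded metadata for docId, or 0 if the document is unknown (an assumption). */
    long metadata(long docId) {
        int idx = indexOf(docId);
        return idx < 0 ? 0L : data[idx * RECORD_SIZE + METADATA_OFFSET];
    }

    /** Feature bitmask for docId, or 0 if the document is unknown (an assumption). */
    long features(long docId) {
        int idx = indexOf(docId);
        return idx < 0 ? 0L : data[idx * RECORD_SIZE + FEATURES_OFFSET];
    }

    public static void main(String[] args) {
        long[] ids = {3L, 17L, 42L};
        long[] data = {0x10L, 0b001L, 0x20L, 0b010L, 0x30L, 0b100L};
        ForwardIndexLookupSketch index = new ForwardIndexLookupSketch(ids, data);
        System.out.println(Long.toHexString(index.metadata(17L)));
        System.out.println(Long.toBinaryString(index.features(42L)));
    }
}
```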
Unlike the reverse index, the forward index is not split into two tiers, and its data is kept in the same
order as the source data. The set of document ids is also assumed to fit in memory,
so the index is relatively easy to construct.
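
To illustrate why holding the document ids in memory keeps construction simple, here is a minimal two-pass sketch: the first pass collects and sorts the ids, the second writes each source record into the slot given by its id's position. The class name, the `{id, metadata, features}` row format, and the two-`long` record size are hypothetical, and this is not necessarily how `ForwardIndexConverter` is implemented.

```java
import java.util.Arrays;

/** Hypothetical construction sketch; each source row is {id, metadata, features}. */
final class ForwardIndexBuildSketch {
    // Assumed record layout: one long of metadata followed by one long of feature bits.
    static final int RECORD_SIZE = 2;

    /** Pass 1: collect the distinct document ids and sort them (the `id` file). */
    static long[] sortedIds(long[][] source) {
        return Arrays.stream(source)
                .mapToLong(row -> row[0])
                .distinct()
                .sorted()
                .toArray();
    }

    /** Pass 2: place each record at the slot matching its id's position (the `data` file). */
    static long[] buildData(long[] sortedIds, long[][] source) {
        long[] data = new long[sortedIds.length * RECORD_SIZE];
        for (long[] row : source) {
            // Always found, because sortedIds was derived from the same source rows.
            int idx = Arrays.binarySearch(sortedIds, row[0]);
            data[idx * RECORD_SIZE] = row[1];      // metadata
            data[idx * RECORD_SIZE + 1] = row[2];  // feature bitmask
        }
        return data;
    }

    public static void main(String[] args) {
        long[][] source = {
                {42L, 0x30L, 0b100L},
                {3L, 0x10L, 0b001L},
                {17L, 0x20L, 0b010L},
        };
        long[] ids = sortedIds(source);
        long[] data = buildData(ids, source);
        System.out.println(Arrays.toString(ids));
        System.out.println(Arrays.toString(data));
    }
}
```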
## Central Classes
* [ForwardIndexConverter](src/main/java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
* [ForwardIndexReader](src/main/java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.