# Forward Index
The forward index maps each document id to various forms of document metadata.

In practice, the forward index consists of two files, an `id` file and a `data` file.

The `id` file contains a sorted list of document ids. The `data` file contains one
fixed-size record per document id, in the same order as the `id` file.

Each record contains a binary-encoded [DocumentMetadata](../../common/model/src/main/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
as well as a [HtmlFeatures](../../common/model/src/main/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask.

Unlike the reverse index, the forward index is not split into two tiers, the data is in the same
order as it is in the source data, and the set of document ids is assumed to fit in memory,
so it's relatively easy to construct.
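
As a rough illustration of the layout described above, here is a minimal in-memory sketch. It is not the actual `ForwardIndexConverter` or `ForwardIndexReader` code: the class name `ForwardIndexSketch`, the record width of two 64-bit words (one metadata word, one features bitmask), and the choice to return `0` for unknown ids are all assumptions made for the example. Two `long` arrays stand in for the `id` and `data` files.

```java
import java.util.Arrays;

/** Hypothetical in-memory model of the forward index; not the real Marginalia classes. */
final class ForwardIndexSketch {
    // Assumed record layout: two 64-bit words per document (metadata, then features bitmask).
    static final int RECORD_SIZE = 2;

    private final long[] ids;   // stands in for the `id` file: sorted document ids
    private final long[] data;  // stands in for the `data` file: RECORD_SIZE longs per id

    private ForwardIndexSketch(long[] ids, long[] data) {
        this.ids = ids;
        this.data = data;
    }

    /** Builds the index from parallel arrays of (docId, metadata, features) in source order. */
    static ForwardIndexSketch build(long[] docIds, long[] metadata, long[] features) {
        // Pass 1: the distinct ids are assumed to fit in memory, so collect and sort them.
        long[] sorted = Arrays.stream(docIds).distinct().sorted().toArray();
        long[] data = new long[sorted.length * RECORD_SIZE];

        // Pass 2: walk the source once and place each record at the offset of its id.
        for (int i = 0; i < docIds.length; i++) {
            int slot = Arrays.binarySearch(sorted, docIds[i]);
            data[slot * RECORD_SIZE] = metadata[i];
            data[slot * RECORD_SIZE + 1] = features[i];
        }
        return new ForwardIndexSketch(sorted, data);
    }

    /** Position of docId in the sorted id list, or a negative value if it is absent. */
    private int slotOf(long docId) {
        return Arrays.binarySearch(ids, docId);
    }

    boolean contains(long docId) {
        return slotOf(docId) >= 0;
    }

    /** Metadata word for docId, or 0 if the id is unknown (an assumption of this sketch). */
    long metadata(long docId) {
        int slot = slotOf(docId);
        return slot < 0 ? 0L : data[slot * RECORD_SIZE];
    }

    /** Features bitmask for docId, or 0 if the id is unknown (an assumption of this sketch). */
    long features(long docId) {
        int slot = slotOf(docId);
        return slot < 0 ? 0L : data[slot * RECORD_SIZE + 1];
    }

    public static void main(String[] args) {
        ForwardIndexSketch index = build(
                new long[] {30L, 10L, 20L},   // document ids in source order
                new long[] {3L, 1L, 2L},      // made-up metadata words
                new long[] {4L, 1L, 2L});     // made-up feature bitmasks
        System.out.println("metadata(20) = " + index.metadata(20L));
        System.out.println("contains(40) = " + index.contains(40L));
    }
}
```

Because every record has the same size, a lookup is a binary search over the id list followed by a single offset calculation into the data, which is why the two files can stay in lockstep without any per-record pointers.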
## Central Classes
* [ForwardIndexConverter](src/main/java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
* [ForwardIndexReader](src/main/java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.