Reverse Index
The reverse index contains a mapping from word to document id.
There are two tiers of this index.
- A priority index which only indexes terms that are flagged with priority flags [1].
- A full index that indexes all terms.
The full index also provides access to term-level metadata, while the priority index is a binary index that only offers information about which documents contain a specific word.
[1] See WordFlags in common/model and KeywordMetadata in features-convert/keyword-extraction.
Construction
The reverse index is constructed by first building a series of preindexes. Each preindex consists of a Segment and a Documents object. The segment records which word identifiers are present and how many times each occurs, and the documents object records which documents each word can be found in.
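To make the segment/documents split concrete, here is a minimal sketch of building one small preindex in memory. The class name `PreindexSketch`, the record shapes, and the input format of (word id, document id) pairs are all assumptions made for illustration; they are not taken from the actual ReversePreindex code.

```java
import java.util.Arrays;

public class PreindexSketch {
    /** Hypothetical segment: distinct word ids in ascending order, and how many documents each has. */
    record Segment(long[] wordIds, int[] counts) {}

    /** Hypothetical documents object: document ids grouped by word, in the same order as the segment. */
    record Documents(long[] docIds) {}

    /** Hypothetical preindex: a segment plus its documents. */
    record Preindex(Segment segment, Documents documents) {}

    /** Builds a preindex from {wordId, docId} pairs. The pair layout is an assumption. */
    static Preindex build(long[][] pairs) {
        // Sort a shallow copy by word id, then by document id, leaving the caller's array untouched.
        long[][] sorted = pairs.clone();
        Arrays.sort(sorted, (a, b) ->
                a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(a[1], b[1]));

        long[] wordIds = new long[sorted.length];
        int[] counts = new int[sorted.length];
        long[] docIds = new long[sorted.length];
        int words = 0;

        for (int i = 0; i < sorted.length; i++) {
            // Start a new segment entry whenever the word id changes.
            if (words == 0 || wordIds[words - 1] != sorted[i][0]) {
                wordIds[words++] = sorted[i][0];
            }
            counts[words - 1]++;
            docIds[i] = sorted[i][1];
        }

        return new Preindex(
                new Segment(Arrays.copyOf(wordIds, words), Arrays.copyOf(counts, words)),
                new Documents(docIds));
    }

    public static void main(String[] args) {
        Preindex p = build(new long[][] { {7, 100}, {3, 101}, {7, 101}, {3, 100} });
        System.out.println(Arrays.toString(p.segment().wordIds()));
        System.out.println(Arrays.toString(p.segment().counts()));
        System.out.println(Arrays.toString(p.documents().docIds()));
    }
}
```

Because the segment stores a count per word, the position of each word's run in the documents array can be recovered by summing the counts that precede it.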
Taken together, these would typically not fit in RAM, so the index journal is paged, each preindex is constructed small enough to fit in memory, and the preindexes are then merged. Merging sorted arrays is a very fast operation that does not require additional RAM.
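The merge step relies on the standard two-pointer merge of sorted arrays, sketched below. The class and method names are hypothetical, and the sketch assumes each input is ascending and free of duplicates. For simplicity it writes into an in-memory output array; since each input is read once from front to back, the same loop could instead stream its output to disk.

```java
import java.util.Arrays;

public class SortedMergeSketch {
    /**
     * Merges two ascending, duplicate-free arrays into one ascending array.
     * A value present in both inputs appears once in the result.
     */
    static long[] merge(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, n = 0;

        // Advance through both inputs, always taking the smaller head element.
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {
                // Same value in both inputs: keep one copy and advance both.
                out[n++] = a[i++];
                j++;
            }
        }

        // Copy whatever remains of the input that was not exhausted.
        while (i < a.length) out[n++] = a[i++];
        while (j < b.length) out[n++] = b[j++];

        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        long[] merged = merge(new long[] {1, 4, 9}, new long[] {2, 4, 10});
        System.out.println(Arrays.toString(merged)); // prints [1, 2, 4, 9, 10]
    }
}
```

Each element is visited exactly once, so the merge runs in time linear in the combined input size.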
Once merged into one large preindex, indexes are added to the preindex data to form a finalized reverse index.
Central Classes
- ReversePreindex: intermediate reverse index state.
- ReverseIndexConstructor: constructs the index.
- ReverseIndexReader: interrogates the index.