CatgirlIntelligenceAgency/code/features-index/lexicon
Viktor Lofgren 30584887f9 DictionaryMap changes.
Add new flag to change the default size to make prod index boot faster. Remove option to select OffHeapDictionaryHashMap.
2023-03-27 17:28:39 +02:00

Lexicon

The lexicon contains a mapping from words to identifiers.

Index construction is simpler when the domain of word identifiers is dense, that is, when there are no gaps between ids: if there are 100 words, they are indexed 0-99 and not 5, 23, 107, 9999, 819235, etc. The lexicon exists to create such a mapping.
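
As a rough illustration of what "dense" means here, the sketch below hands out ids in insertion order, so the n-th distinct key always gets id n-1. The class name `DenseLexiconSketch` and its methods are hypothetical and use a plain `java.util.HashMap`; this is not the lexicon's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: assigns dense ids (0, 1, 2, ...) to arbitrary 64 bit keys.
public final class DenseLexiconSketch {
    private final Map<Long, Integer> idsByKey = new HashMap<>();

    // Returns the existing id for the key, or assigns the next free id.
    public int getOrInsert(long key) {
        Integer existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        int id = idsByKey.size(); // next unused id, so the id range has no gaps
        idsByKey.put(key, id);
        return id;
    }

    // Number of distinct keys seen; the ids in use are exactly 0 .. size() - 1.
    public int size() {
        return idsByKey.size();
    }
}
```

With this scheme, sparse keys such as 819235 and 5 simply become ids 0 and 1, which lets downstream code use the id directly as an array offset.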

This lexicon is populated from a journal. The actual word data isn't mapped, but rather a 64 bit hash of each word. As a result of the birthday paradox, collisions will be rare until the lexicon approaches about 2^32 words, the square root of the 2^64 possible hash values.
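
To put rough numbers on the birthday estimate, the sketch below uses the standard approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)) for the chance of at least one collision among n uniformly distributed hashes. The class and method names are hypothetical, and the formula assumes an ideal hash function, which a real one only approximates.

```java
// Hypothetical sketch: birthday-paradox collision estimate for a hash of a given width.
public final class BirthdayBound {

    // Approximate probability that n uniformly random hashes of the given
    // bit width contain at least one collision.
    public static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = -n * (n - 1.0) / (2.0 * space);
        // 1 - e^x, computed with expm1 so that tiny probabilities keep precision
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        double[] wordCounts = { 1e6, 1e8, Math.pow(2.0, 32) };
        for (double n : wordCounts) {
            System.out.printf("n = %.3g words -> p(collision) ~ %.3g%n",
                    n, collisionProbability(n, 64));
        }
    }
}
```

Under these assumptions, a lexicon of a few hundred million words is still very unlikely to contain a collision, while at 2^32 words the estimate rises to roughly 39%.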

The lexicon is constructed by processes/loading-process and read when services-core/index-service interprets queries.

Central Classes