Lexicon
The lexicon contains a mapping from words to identifiers.
Index construction is simpler when the domain of word identifiers is dense, that is, when there are no gaps between ids: if there are 100 words, they are indexed 0-99 and not 5, 23, 107, 9999, 819235, etc. The lexicon exists to create such a mapping.
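As a minimal sketch of what a dense mapping means, the class below hands out the next unused integer to each new word. `DenseIdSketch` and `getOrInsert` are hypothetical names chosen for illustration; this is not the project's KeywordLexicon.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of dense id assignment; not the project's KeywordLexicon. */
class DenseIdSketch {
    private final Map<String, Integer> ids = new HashMap<>();

    /** Returns the existing id for the word, or assigns the next unused one. */
    int getOrInsert(String word) {
        Integer existing = ids.get(word);
        if (existing != null) {
            return existing;
        }
        int id = ids.size(); // ids are handed out as 0, 1, 2, ... with no gaps
        ids.put(word, id);
        return id;
    }

    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        DenseIdSketch lexicon = new DenseIdSketch();
        for (String word : new String[] {"search", "engine", "search", "index"}) {
            System.out.println(word + " -> " + lexicon.getOrInsert(word));
        }
        // Because the ids are dense, per-word data fits in a plain array
        int[] perWordCounts = new int[lexicon.size()];
        System.out.println("array length: " + perWordCounts.length);
    }
}
```

With ids in the range 0 to size-1, anything keyed by word id can be stored in an array of exactly `size()` elements instead of a sparse structure.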
This lexicon is populated from a journal. The mapping is keyed on a 64-bit hash of each word rather than on the word data itself. As a result of the birthday paradox, collisions will be rare up until about 2^32 words.
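The sketch below illustrates keying the mapping on a 64-bit hash and gives a rough estimate of the birthday bound. The hash function shown is FNV-1a, picked purely as an assumption for the example (the hash actually used is not stated here), and `HashedLexiconSketch`, `hash64` and `expectedCollisions` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a hash-keyed lexicon; not the project's implementation. */
class HashedLexiconSketch {
    private final Map<Long, Integer> idsByHash = new HashMap<>();

    /** 64-bit FNV-1a, used here only as a stand-in for whatever hash the lexicon uses. */
    static long hash64(String word) {
        long h = 0xcbf29ce484222325L;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** Only the hash is stored; two words with the same hash would share an id. */
    int getOrInsert(String word) {
        long key = hash64(word);
        Integer existing = idsByHash.get(key);
        if (existing != null) {
            return existing;
        }
        int id = idsByHash.size();
        idsByHash.put(key, id);
        return id;
    }

    /** Birthday approximation: n(n-1)/2 pairs, each colliding with probability 2^-64. */
    static double expectedCollisions(double n) {
        return n * (n - 1) / 2.0 / Math.pow(2, 64);
    }

    public static void main(String[] args) {
        System.out.println(expectedCollisions(1e6));
        System.out.println(expectedCollisions(Math.pow(2, 32)));
    }
}
```

Under this approximation the expected number of colliding pairs stays far below one for millions of words and only approaches one half around 2^32 words, which is where the square-root threshold for a 64-bit hash lies.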
The lexicon is constructed by processes/loading-process and read when services-core/index-service interprets queries.
Central Classes
- KeywordLexicon
- KeywordLexiconJournal
- DictionaryMap comes in two versions
  - OnHeapDictionaryMap - basically just a fastutil Long2IntOpenHashMap
  - OffHeapDictionaryHashMap - a heavily modified trove TLongIntHashMap that uses off-heap memory
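To give a feel for what keeping a long-to-int map off the Java heap involves, here is a small sketch built on a direct `ByteBuffer` with linear probing. It is an illustration under simplifying assumptions (fixed capacity, no resizing, no removal) and is not the OffHeapDictionaryHashMap code; `OffHeapMapSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical off-heap long-to-int map; fixed capacity, no resizing or removal. */
class OffHeapMapSketch {
    private static final int ENTRY_BYTES = Long.BYTES + Integer.BYTES; // key + value
    private static final int NO_VALUE = -1;

    private final ByteBuffer buffer; // direct buffer: storage lives outside the Java heap
    private final int capacity;
    private int size = 0;

    OffHeapMapSketch(int capacity) {
        this.capacity = capacity;
        this.buffer = ByteBuffer.allocateDirect(capacity * ENTRY_BYTES)
                .order(ByteOrder.nativeOrder());
        // Mark every slot as empty by writing NO_VALUE into its value field
        for (int slot = 0; slot < capacity; slot++) {
            buffer.putInt(slot * ENTRY_BYTES + Long.BYTES, NO_VALUE);
        }
    }

    /** Scrambles the key and maps it to a starting slot. */
    private int firstSlot(long key) {
        return (int) Long.remainderUnsigned(key * 0x9E3779B97F4A7C15L, capacity);
    }

    /** Returns the id stored for the key, or -1 if the key is absent. */
    int get(long key) {
        int slot = firstSlot(key);
        for (int probes = 0; probes < capacity; probes++) {
            int offset = slot * ENTRY_BYTES;
            int value = buffer.getInt(offset + Long.BYTES);
            if (value == NO_VALUE) return NO_VALUE;
            if (buffer.getLong(offset) == key) return value;
            slot = (slot + 1) % capacity; // linear probing
        }
        return NO_VALUE;
    }

    /** Returns the existing id for the key, or stores and returns the next dense id. */
    int getOrInsert(long key) {
        int slot = firstSlot(key);
        while (true) {
            int offset = slot * ENTRY_BYTES;
            int value = buffer.getInt(offset + Long.BYTES);
            if (value == NO_VALUE) {
                // One slot is always left empty so that probing terminates
                if (size >= capacity - 1) throw new IllegalStateException("map is full");
                buffer.putLong(offset, key);
                buffer.putInt(offset + Long.BYTES, size);
                return size++;
            }
            if (buffer.getLong(offset) == key) return value;
            slot = (slot + 1) % capacity;
        }
    }

    public static void main(String[] args) {
        OffHeapMapSketch map = new OffHeapMapSketch(16);
        System.out.println(map.getOrInsert(42L));
        System.out.println(map.getOrInsert(7L));
        System.out.println(map.getOrInsert(42L));
    }
}
```

The usual motivation for a layout like this is that the entries are not visible to the garbage collector as individual objects, at the cost of managing the memory layout by hand.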