The production configuration assumes all content of interest is 7-bit ASCII and makes a series of optimizations based on this. This assumption holds poorly in the wild. This change adds an **experimental** system property, 'system.noFlattenUnicode', that disables this behavior when set to TRUE. IMPORTANT: the index must be reconstructed whenever this flag is changed, because different hash functions are selected for the keyword->identifier mappings.
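As a rough illustration of why changing the flag invalidates the index, here is a minimal sketch of a flag-gated hash selection. Only the property name 'system.noFlattenUnicode' comes from the text above; the class `FlattenUnicodeFlag` and both hash functions are hypothetical stand-ins, not the project's actual implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these names are taken from the project itself.
class FlattenUnicodeFlag {
    // Property name as given in the commit message.
    static final String PROPERTY = "system.noFlattenUnicode";

    // Boolean.getBoolean returns true only if the system property exists
    // and equals "true", ignoring case, so "TRUE" works as well.
    static boolean noFlattenUnicode() {
        return Boolean.getBoolean(PROPERTY);
    }

    // Assumed ASCII-flattening hash: keeps only the low 7 bits of each char.
    static long hashAscii(String keyword) {
        long h = 0;
        for (int i = 0; i < keyword.length(); i++) {
            h = 31 * h + (keyword.charAt(i) & 0x7F);
        }
        return h;
    }

    // Assumed Unicode-aware hash: hashes the full UTF-8 byte sequence.
    static long hashUnicode(String keyword) {
        long h = 0;
        for (byte b : keyword.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + (b & 0xFF);
        }
        return h;
    }

    // The flag decides which hash produces the keyword identifier.
    static long hashKeyword(String keyword) {
        return noFlattenUnicode() ? hashUnicode(keyword) : hashAscii(keyword);
    }
}
```

In this sketch a pure-ASCII keyword hashes the same either way, but a keyword such as "café" gets a different identifier once the flag is enabled, so lookups against an index built with the other setting would miss. The property would typically be passed on the JVM command line as `-Dsystem.noFlattenUnicode=TRUE`.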
Third Party Code
This is a mix of code from other projects that has been aggressively modified to suit the needs of the project, that lacks a published artifact, or that needs some default overridden because it is inappropriate for the type of data Marginalia throws at the library.
Sources and Licenses
Modified
- RDRPosTagger - GPL3
- PorterStemmer - LGPL3
- OpenZIM - GPL-2.0+
- Commons Codec - Apache 2.0
- encyclopedia.marginalia.nu - GPL 2.0+
Repackaged
- SymSpell - LGPL-3.0
- Count-Min-Sketch - Apache 2.0
Monkey Patched
- Apache OpenNLP - Apache-2.0
- GSON - Apache-2.0