787a20cbaa
This is not hooked into anything yet. The change also modifies the parquet-floor library to support reading and writing byte[] values. This is desirable since we may in the future want to support inputs that are not text-based, and codifying the assumption that each document is a string will definitely cause us grief down the line.
- commons-codec
- count-min-sketch
- monkey-patch-gson
- monkey-patch-opennlp
- openzim
- parquet-floor
- porterstemmer
- rdrpostagger
- symspell
- uppend
- xz
- README.md
Third Party Code
This is a mix of code from other projects that has either been aggressively modified to suit the needs of the project, lacks a published artifact, or is patched to override some default that is inappropriate for the type of data Marginalia throws at the library.
Sources and Licenses
Modified
- RDRPosTagger - GPL-3.0
- PorterStemmer - LGPL-3.0
- Uppend - MIT
- OpenZIM - GPL-2.0
- Commons Codec - Apache-2.0
Repackaged
- SymSpell - LGPL-3.0
- Count-Min-Sketch - Apache-2.0
Monkey Patched
- Apache OpenNLP - Apache-2.0
- GSON - Apache-2.0