Commit c41e68aaab: This commit also refactors the executor a bit, and introduces a new converter feature called data-extractors for this class of jobs.
Term Frequency Extractor
Generates a term frequency dictionary file from a batch of crawl data.
Usage:
PATH_TO_SAMPLES=run/samples/crawl-s
export JAVA_OPTS=-Dcrawl.rootDirRewrite=/crawl:${PATH_TO_SAMPLES}
term-frequency-extractor ${PATH_TO_SAMPLES}/plan.yaml out.dat
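To illustrate what a term frequency dictionary might contain, here is a minimal Java sketch. It assumes "term frequency" means the number of documents in which each term appears, and that documents are plain strings; both are assumptions, since this readme does not describe the tool's tokenization or the format of `out.dat`. The class and method names (`TermFrequencySketch`, `documentFrequencies`) are hypothetical and not part of the tool.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the extractor's actual implementation.
public class TermFrequencySketch {

    /** Counts, for each term, the number of documents in which it appears at least once. */
    static Map<String, Integer> documentFrequencies(List<String> documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {
            // Track terms already seen in this document so each document counts once per term.
            Set<String> seen = new HashSet<>();
            // Assumed tokenization: lowercase, split on runs of non-letter, non-digit characters.
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty() && seen.add(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Crawl data, crawl data", "Term data");
        System.out.println(documentFrequencies(docs));
    }
}
```

The per-document `seen` set is what makes this a document count rather than a raw occurrence count; dropping it would count every repetition of a term instead.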