24051fec03
The processor normally retains a domain's data in memory after processing so that it can run additional site-wide analysis. This works well, except that a few outlier websites have so many documents that they can rapidly fill the process heap. These websites now receive simplified treatment: their documents are processed in the converter batch writer thread instead. This is slower, but the documents are not retained in memory. |
||
---|---|---|
crawl-data-unfcker | ||
experiment-runner | ||
load-test | ||
screenshot-capture-tool | ||
stackexchange-converter | ||
term-frequency-extractor |
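
The trade-off described in the commit message above can be illustrated with a minimal sketch. It is not the project's actual code: the names `convert_domain`, `process_document`, and `MAX_RETAINED_DOCS`, and the cutoff value, are hypothetical, and the real threshold and processing steps are not stated in the source. The sketch only shows the general idea of retaining documents for small domains and streaming them straight to the writer for outliers.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical cutoff: the commit message does not state the real threshold.
MAX_RETAINED_DOCS = 3


def process_document(raw: str) -> str:
    """Stand-in for the real per-document processing (an assumption)."""
    return raw.strip().lower()


def convert_domain(
    doc_count: int,
    read_docs: Callable[[], Iterable[str]],
    write_doc: Callable[[str], None],
) -> Tuple[str, int]:
    """Convert one domain; return (mode, number of documents retained in memory)."""
    if doc_count <= MAX_RETAINED_DOCS:
        # Normal path: keep every processed document in memory so that
        # site-wide analysis could inspect the whole domain before writing.
        retained: List[str] = [process_document(raw) for raw in read_docs()]
        for doc in retained:
            write_doc(doc)
        return "full", len(retained)

    # Outlier path: process each document only at the moment it is written,
    # so nothing accumulates on the heap. Site-wide analysis is skipped.
    for raw in read_docs():
        write_doc(process_document(raw))
    return "simplified", 0
```

In this sketch the outlier path does the processing inside the write loop, which mirrors the commit's note that the work moves to the writer thread: throughput drops, but memory use stays flat regardless of how many documents the domain has.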