CatgirlIntelligenceAgency/code/tools
Viktor Lofgren 24051fec03 (converter) WIP Run sideload-style processing for large domains
The processor normally retains a domain's data in memory after processing so that it can run additional site-wide analysis. This works well, except for a number of outlier websites with an absurd number of documents, which can rapidly fill up the process heap.

These websites now receive a simplified treatment, executed in the converter batch writer thread. This is slower, but the documents are no longer retained in memory.
2023-12-27 18:20:03 +01:00
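The trade-off described in the commit message can be illustrated with a minimal Java sketch. This is not the converter's actual code: the class name, method names, the use of plain strings as documents, and the threshold parameter are all hypothetical, chosen only to contrast retaining a domain's documents for site-wide analysis with handing each document to the writer as soon as it is processed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the repository.
class DomainProcessingSketch {

    // Assumed cutoff check; the commit does not state how "large" is decided.
    static boolean isLargeDomain(int documentCount, int threshold) {
        return documentCount > threshold;
    }

    // Normal path: keep every processed document in memory so that
    // site-wide analysis could run before anything is written.
    static List<String> processRetained(Iterator<String> docs, Consumer<String> writer) {
        List<String> retained = new ArrayList<>();
        while (docs.hasNext()) {
            retained.add(docs.next());
        }
        // Site-wide analysis over 'retained' would happen here.
        retained.forEach(writer);
        return retained;
    }

    // Simplified path: pass each document to the writer immediately.
    // Nothing accumulates, so memory use stays flat, at the cost of
    // doing the processing work on the writer's side.
    static int processStreaming(Iterator<String> docs, Consumer<String> writer) {
        int written = 0;
        while (docs.hasNext()) {
            writer.accept(docs.next());
            written++;
        }
        return written;
    }
}
```

In this sketch a caller would check isLargeDomain first and pick processStreaming for outlier domains, accepting that no site-wide analysis is possible on that path.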
crawl-data-unfcker (crawler) WIP integration of WARC files into the crawler and converter process. 2023-12-13 15:33:42 +01:00
experiment-runner (converter) WIP Run sideload-style processing for large domains 2023-12-27 18:20:03 +01:00
load-test (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00
screenshot-capture-tool (screenshot-capture-tool) Make screenshot-capture-tool cooperate with docker 2023-11-01 16:38:55 +01:00
stackexchange-converter (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00
term-frequency-extractor (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00