CatgirlIntelligenceAgency/code/tools/experiment-runner

Experiment Runner

This tool launches experiments that process or otherwise interact with crawl data.

It is launched with run/experiment.sh. A new experiment must be registered in ExperimentRunnerMain before the script can run it.
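
As a rough illustration of what registering an experiment could involve, the sketch below keeps a name-to-factory map and looks an experiment up by the name passed on the command line. It is not the actual ExperimentRunnerMain code: the class ExperimentRegistrySketch, the Experiment interface, its run method, CountingExperiment, and the argument order are all hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of a name-based experiment registry; not the real ExperimentRunnerMain. */
public class ExperimentRegistrySketch {

    /** Assumed experiment contract; the real interface may look different. */
    interface Experiment {
        String run(String crawlDataPath);
    }

    /** Example experiment that only reports which path it was pointed at. */
    static class CountingExperiment implements Experiment {
        @Override
        public String run(String crawlDataPath) {
            return "counted documents in " + crawlDataPath;
        }
    }

    // Adding a new experiment means adding one entry to this map.
    private static final Map<String, Supplier<Experiment>> EXPERIMENTS = new LinkedHashMap<>();
    static {
        EXPERIMENTS.put("count", CountingExperiment::new);
    }

    /** Returns a fresh experiment instance for the name, or empty if it is not registered. */
    static Optional<Experiment> lookup(String name) {
        Supplier<Experiment> factory = EXPERIMENTS.get(name);
        return factory == null ? Optional.empty() : Optional.of(factory.get());
    }

    public static void main(String[] args) {
        // Assumed argument order for this sketch: experiment name, then crawl data path.
        if (args.length < 2) {
            System.err.println("usage: <experiment-name> <crawl-data-path>");
            System.err.println("known experiments: " + EXPERIMENTS.keySet());
            return;
        }
        Optional<Experiment> experiment = lookup(args[0]);
        if (experiment.isPresent()) {
            System.out.println(experiment.get().run(args[1]));
        } else {
            System.err.println("unknown experiment: " + args[0]);
        }
    }
}
```

With a layout like this, an experiment that is missing from the map cannot be launched by name, which matches the requirement that new experiments be added before the script can run them.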