CatgirlIntelligenceAgency/code/tools/crawl-job-extractor
src Adjust the logic for the crawl job extractor to set a relatively low visit limit for websites that are new in the index or have not yielded many good documents previously. 2023-06-07 22:01:35 +02:00
build.gradle Move database to a separate module 2023-03-25 15:26:17 +01:00
readme.md Remove unrelated code, break tools into their own directory. 2023-03-17 16:03:11 +01:00

Crawl Job Extractor

The crawl job extractor creates a file containing a list of domains along with known URLs.

The resulting file is consumed by processes/crawling-process.
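
As a rough illustration of the idea, the sketch below builds one entry per domain with its known URLs and writes the entries to a file. It is not the tool's actual implementation: the class `CrawlJobSketch`, the record `CrawlJob`, the tab-separated line format, and all numeric limits are hypothetical, chosen only to make the example concrete. The visit limit rule follows the commit history, which mentions a relatively low limit for websites that are new in the index or have not yielded many good documents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CrawlJobSketch {
    // Hypothetical values; the real limits are not stated in this readme.
    static final int LOW_VISIT_LIMIT = 100;
    static final int DEFAULT_VISIT_LIMIT = 1000;
    static final int MIN_GOOD_DOCUMENTS = 10;

    /** One entry in the job file: a domain, its visit limit, and its known URLs. */
    record CrawlJob(String domain, int visitLimit, List<String> knownUrls) {}

    /** New domains and domains with few good documents get the low limit. */
    static int visitLimit(boolean newInIndex, int goodDocuments) {
        if (newInIndex || goodDocuments < MIN_GOOD_DOCUMENTS) {
            return LOW_VISIT_LIMIT;
        }
        return DEFAULT_VISIT_LIMIT;
    }

    /** Assumed format: domain, visit limit, and space-separated URLs, tab-delimited. */
    static String toLine(CrawlJob job) {
        return job.domain() + "\t" + job.visitLimit() + "\t" + String.join(" ", job.knownUrls());
    }

    /** Writes one line per job to the given file (UTF-8). */
    static void writeJobs(Path out, List<CrawlJob> jobs) throws IOException {
        List<String> lines = new ArrayList<>();
        for (CrawlJob job : jobs) {
            lines.add(toLine(job));
        }
        Files.write(out, lines);
    }

    public static void main(String[] args) throws IOException {
        List<CrawlJob> jobs = List.of(
            new CrawlJob("example.com", visitLimit(true, 0),
                List.of("https://example.com/", "https://example.com/about")),
            new CrawlJob("example.org", visitLimit(false, 250),
                List.of("https://example.org/")));

        Path out = Files.createTempFile("crawl-jobs", ".txt");
        writeJobs(out, jobs);
        System.out.println("Wrote " + jobs.size() + " jobs to " + out);
    }
}
```

A consumer such as the crawling process would then read the file back one entry at a time; the format it actually expects is defined by the code in src, not by this sketch.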