
Crawling Process

The crawling process downloads HTML documents and saves them into per-domain snapshots.
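
As a rough illustration of what "per-domain snapshots" could look like, the sketch below groups already-fetched pages by the host part of their URL and appends each one to a file for that domain. It is not the crawler's actual code: the class `DomainSnapshotSketch`, the record `FetchedPage`, the `.snapshot` file suffix, and the tab-separated line format are all assumptions made up for this example, and the real snapshot format may differ.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: none of these names come from the crawling process itself.
public class DomainSnapshotSketch {

    /** A page that has already been downloaded: its URL and its HTML body. */
    public record FetchedPage(String url, String html) {}

    /** Maps a URL to the snapshot file of its domain (assumed naming: "<host>.snapshot"). */
    public static Path snapshotPathFor(Path rootDir, String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return rootDir.resolve(host.toLowerCase(Locale.ROOT) + ".snapshot");
    }

    /** Appends every page to the snapshot file of its domain, one page per line. */
    public static void save(Path rootDir, List<FetchedPage> pages) throws IOException {
        Files.createDirectories(rootDir);
        for (FetchedPage page : pages) {
            Path file = snapshotPathFor(rootDir, page.url());
            // Assumed line format: url, a tab, then the HTML with line breaks escaped.
            String line = page.url() + "\t" + page.html().replace("\n", "\\n") + "\n";
            Files.writeString(file, line, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snapshots");
        save(root, List.of(
                new FetchedPage("https://www.example.com/", "<html>a</html>"),
                new FetchedPage("https://www.example.com/about", "<html>b</html>"),
                new FetchedPage("https://other.example.org/", "<html>c</html>")));
        // Lists one snapshot file per domain that was seen.
        try (var files = Files.list(root)) {
            files.sorted().forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```

Keying the file on the host keeps all documents from one domain together, so a later stage can process a domain without scanning the whole crawl.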

Central Classes

See Also