CatgirlIntelligenceAgency/code/processes/crawling-process
Viktor 0f9b90eb1c
Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
2023-07-10 17:36:12 +02:00
src Better fingerprinting (#35) 2023-07-10 17:36:12 +02:00
build.gradle Tests for crawler specialization + testdata 2023-06-27 10:57:54 +02:00
readme.md More restructuring, big bug fixes in keyword extraction. 2023-03-13 17:39:53 +01:00

Crawling Process

The crawling process downloads HTML documents and saves them into per-domain snapshots.
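
As a rough illustration of what "per-domain snapshots" could mean in practice, the sketch below groups already-fetched documents by the host that served them and appends each one to a file for that host. It is not taken from this module: the class name `DomainSnapshotSketch`, the `.snapshot` file suffix, and the record layout (URL line, HTML body, blank line) are all assumptions made for the example, and the download step itself is left out.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

/** Hypothetical illustration only; not a class from this repository. */
public class DomainSnapshotSketch {
    private final Path snapshotRoot;

    public DomainSnapshotSketch(Path snapshotRoot) {
        this.snapshotRoot = snapshotRoot;
    }

    /** Assumed layout: one snapshot file per domain under the root directory. */
    public Path snapshotFile(String domain) {
        return snapshotRoot.resolve(domain.toLowerCase(Locale.ROOT) + ".snapshot");
    }

    /** Appends one fetched document to the snapshot of the domain that served it. */
    public Path save(String url, String html) throws IOException {
        String domain = URI.create(url).getHost();
        if (domain == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        Files.createDirectories(snapshotRoot);
        Path file = snapshotFile(domain);
        // Assumed record format: URL line, HTML body, blank separator line.
        String record = url + "\n" + html + "\n\n";
        Files.writeString(file, record, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return file;
    }
}
```

With this sketch, two pages fetched from `www.example.com` end up in the same file, while a page from another host gets a file of its own.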

Central Classes

See Also