CatgirlIntelligenceAgency/code/crawl-processes/crawling-process/readme.md
2023-03-12 11:42:07 +01:00

# Crawling Process
The crawling process downloads HTML documents and saves them
into per-domain snapshots.
## Central Classes
* [CrawlerMain](src/main/java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
* [CrawlerRetreiver](src/main/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
visits known addresses from a domain and downloads each document.
* [HttpFetcher](src/main/java/nu/marginalia/crawl/retreival/HttpFetcher.java)
fetches a URL.
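
The division of labour between these three classes can be pictured with a minimal sketch. Everything in it is hypothetical: `DomainSnapshot`, `SketchRetriever`, `SketchCrawlerMain`, and the `Function<String, String>` standing in for the fetcher are illustrative names, not the signatures of the real classes linked above, and an in-memory map takes the place of whatever on-disk snapshot format the project uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout; this is NOT the project's actual API.

/** In-memory stand-in for a per-domain snapshot: URL -> downloaded HTML. */
final class DomainSnapshot {
    final String domain;
    final Map<String, String> documents = new LinkedHashMap<>(); // keeps visit order

    DomainSnapshot(String domain) {
        this.domain = domain;
    }
}

/** Plays the retriever role: visits known URLs of one domain. */
final class SketchRetriever {
    // Stand-in for the fetcher role: maps a URL to its HTML, or null on failure.
    private final Function<String, String> fetcher;

    SketchRetriever(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    DomainSnapshot crawl(String domain, List<String> knownUrls) {
        DomainSnapshot snapshot = new DomainSnapshot(domain);
        for (String url : knownUrls) {
            String html = fetcher.apply(url);
            if (html != null) { // skip URLs that could not be fetched
                snapshot.documents.put(url, html);
            }
        }
        return snapshot;
    }
}

/** Plays the orchestration role: produces one snapshot per domain. */
final class SketchCrawlerMain {
    static Map<String, DomainSnapshot> crawlAll(Map<String, List<String>> urlsByDomain,
                                                Function<String, String> fetcher) {
        SketchRetriever retriever = new SketchRetriever(fetcher);
        Map<String, DomainSnapshot> snapshots = new LinkedHashMap<>();
        urlsByDomain.forEach((domain, urls) ->
                snapshots.put(domain, retriever.crawl(domain, urls)));
        return snapshots;
    }

    public static void main(String[] args) {
        // A canned "web" replaces real HTTP so the sketch runs offline.
        Map<String, String> fakeWeb = Map.of("https://example.com/", "<html>home</html>");
        Map<String, List<String>> known = Map.of(
                "example.com", List.of("https://example.com/", "https://example.com/missing"));

        Map<String, DomainSnapshot> result = crawlAll(known, fakeWeb::get);
        System.out.println(result.get("example.com").documents.keySet());
    }
}
```

Passing the fetcher in as a function keeps the retrieval loop testable without network access; a real fetcher would also have to deal with redirects, timeouts, and politeness rules, which this sketch leaves out.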