Crawling Process

The crawling process downloads HTML documents and saves them into per-domain snapshots.

Central Classes

CrawlerMain orchestrates the crawling.
CrawlerRetreiver visits known addresses from a domain and downloads each document.
HttpFetcher fetches a URL.
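
As a rough illustration of how these three roles might fit together, here is a minimal sketch. It does not reproduce the actual classes: `SketchFetcher`, `SketchRetriever`, and `SketchCrawlerMain` are hypothetical stand-ins for HttpFetcher, CrawlerRetreiver, and CrawlerMain, and the method names, the in-memory snapshot map, and the canned pages are all assumptions made for the example. No network access is performed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for HttpFetcher: fetches one URL, empty if it fails.
interface SketchFetcher {
    Optional<String> fetch(String url);
}

// Hypothetical stand-in for CrawlerRetreiver: visits the known URLs of one domain.
class SketchRetriever {
    private final SketchFetcher fetcher;

    SketchRetriever(SketchFetcher fetcher) {
        this.fetcher = fetcher;
    }

    // Returns a snapshot (URL -> HTML) of every document that could be fetched.
    Map<String, String> crawl(List<String> knownUrls) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        for (String url : knownUrls) {
            fetcher.fetch(url).ifPresent(html -> snapshot.put(url, html));
        }
        return snapshot;
    }
}

// Hypothetical stand-in for CrawlerMain: produces one snapshot per domain.
class SketchCrawlerMain {
    static Map<String, Map<String, String>> crawlAll(
            Map<String, List<String>> urlsByDomain, SketchFetcher fetcher) {
        SketchRetriever retriever = new SketchRetriever(fetcher);
        Map<String, Map<String, String>> snapshots = new LinkedHashMap<>();
        urlsByDomain.forEach((domain, urls) -> snapshots.put(domain, retriever.crawl(urls)));
        return snapshots;
    }

    public static void main(String[] args) {
        // Canned pages instead of real HTTP requests (assumption for the sketch).
        Map<String, String> pages = Map.of(
                "https://example.com/", "<html>home</html>",
                "https://example.com/about", "<html>about</html>");
        SketchFetcher fetcher = url -> Optional.ofNullable(pages.get(url));

        Map<String, Map<String, String>> snapshots = crawlAll(
                Map.of("example.com", List.of(
                        "https://example.com/",
                        "https://example.com/about",
                        "https://example.com/missing")),
                fetcher);

        // The URL that could not be fetched is simply absent from the snapshot.
        System.out.println(snapshots.get("example.com").keySet());
    }
}
```

Keeping the fetcher behind a small interface is a design choice of this sketch: it lets the retrieval loop be exercised with canned responses, independently of any real HTTP client.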