CatgirlIntelligenceAgency/code/process-models/crawling-model
Viktor Lofgren 0889b6d247 (warc) Clean up parquet conversion
This commit further cleans up the warc->parquet conversion. It fixes issues with redirect handling in WarcRecorder and adds support for information about redirects and errors due to probe failure.

It also refactors the fetch result, body extraction and content type abstractions.
2023-12-14 20:39:40 +01:00
src (warc) Clean up parquet conversion 2023-12-14 20:39:40 +01:00
build.gradle (crawling-model) Implement a parquet format for crawl data 2023-12-13 16:22:19 +01:00
readme.md (refactor) Remove features-search and update documentation 2023-10-09 15:12:30 +02:00

Crawling Models

Contains models shared by the crawling-process and converting-process.

Central Classes

Serialization