CatgirlIntelligenceAgency/code/process-models/crawling-model/src
Viktor Lofgren 5329968155 (crawler) Update CrawlingThenConvertingIntegrationTest
This commit updates CrawlingThenConvertingIntegrationTest with additional tests for invalid, redirecting, and blocked domains. Improvements have also been made to filter out irrelevant entries in ParquetSerializableCrawlDataStream.
2023-12-15 21:04:06 +01:00
main/java (crawler) Update CrawlingThenConvertingIntegrationTest 2023-12-15 21:04:06 +01:00
test/java/nu/marginalia/crawling/parquet (crawler) Add timestamp to CrawledDocument records 2023-12-15 20:23:27 +01:00