CatgirlIntelligenceAgency/code/process-models
Viktor Lofgren dec3b1092d (converter) Fix bugs in conversion
This commit adds a safety check verifying that the document's URL belongs to the correct domain.

It also adds a sizeHint() method to SerializableCrawlDataStream, which *may* indicate whether the stream is very large and would benefit from sideload-style processing (which is slow).

It furthermore addresses a bug where ProcessedDomain.write() invoked the wrong method on ConverterBatchWriter and wrote only the domain metadata, not the rest...
2023-12-29 13:58:08 +01:00
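
The domain safety check mentioned in the commit message could look roughly like the sketch below. It is an illustration only, not the code from this commit: the class `DomainCheck` and the method `isFromDomain` are hypothetical names, and the exact-host comparison is an assumption about what "the correct domain" means.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the actual converter code.
final class DomainCheck {
    private DomainCheck() {}

    /**
     * Returns true only if the URL parses and its host equals the expected
     * domain, compared case-insensitively. Unparseable URLs and URLs without
     * a host are rejected.
     */
    static boolean isFromDomain(String url, String expectedDomain) {
        if (url == null || expectedDomain == null) {
            return false;
        }
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false;
            }
            return host.toLowerCase(Locale.ROOT)
                       .equals(expectedDomain.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Assumption: a malformed URL is treated as failing the check.
            return false;
        }
    }
}
```

A converter could call such a check once per document and skip any document whose URL fails it, so that records from another domain do not end up in the processed output.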
crawl-spec (*) WIP Add node affinity to EC_DOMAIN 2023-10-19 17:48:34 +02:00
crawling-model (converter) Fix bugs in conversion 2023-12-29 13:58:08 +01:00
processed-data (*) Refactor GeoIP-related code 2023-12-10 17:30:43 +01:00
work-log (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00