CatgirlIntelligenceAgency/code/process-models
Viktor Lofgren 3a56a06c4f (warc) Add fields for etags and last-modified headers to the new crawl data formats
Make some temporary modifications to the CrawledDocument model to support both a "big string" style headers field like in the old formats, and explicit fields as in the new formats.  This is a bit awkward to deal with, but it's a necessity until we migrate off the old formats entirely.

The commit also adds a few tests for this logic.
2023-12-18 17:45:54 +01:00
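The dual representation described in the commit message above might look roughly like the following. This is a minimal sketch, not the project's actual `CrawledDocument` class: the class name `CrawledDocumentSketch`, the field names `etagMaybe` and `lastModifiedMaybe`, and the fallback order (explicit field first, then the header string) are all assumptions made for illustration.

```java
// Hypothetical sketch; names and fallback behaviour are assumptions,
// not taken from the real CrawledDocument model.
class CrawledDocumentSketch {
    // Old formats: all response headers in one newline-separated "big string" (may be null).
    final String headers;

    // New formats: explicit fields (null when the record comes from an old format).
    final String etagMaybe;
    final String lastModifiedMaybe;

    CrawledDocumentSketch(String headers, String etagMaybe, String lastModifiedMaybe) {
        this.headers = headers;
        this.etagMaybe = etagMaybe;
        this.lastModifiedMaybe = lastModifiedMaybe;
    }

    // Prefer the explicit field; otherwise look the value up in the header string.
    String getEtag() {
        return etagMaybe != null ? etagMaybe : getHeader("ETag");
    }

    String getLastModified() {
        return lastModifiedMaybe != null ? lastModifiedMaybe : getHeader("Last-Modified");
    }

    // Case-insensitive lookup of a single header in the "big string" field.
    // Returns null if the field is absent or the header is not present.
    String getHeader(String name) {
        if (headers == null) {
            return null;
        }
        String prefix = name + ":";
        for (String line : headers.split("\n")) {
            String trimmed = line.trim(); // also drops a trailing '\r'
            if (trimmed.regionMatches(true, 0, prefix, 0, prefix.length())) {
                return trimmed.substring(prefix.length()).trim();
            }
        }
        return null;
    }
}
```

With this shape, callers use `getEtag()` and `getLastModified()` regardless of which format the record came from, and the header-string fallback can be deleted once the old formats are retired.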
..
crawl-spec (*) WIP Add node affinity to EC_DOMAIN 2023-10-19 17:48:34 +02:00
crawling-model (warc) Add fields for etags and last-modified headers to the new crawl data formats 2023-12-18 17:45:54 +01:00
processed-data (*) Refactor GeoIP-related code 2023-12-10 17:30:43 +01:00
work-log (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00