cf935a5331
Add an optional new field to CrawledDocument containing information about whether the domain has cookies. This was previously on the CrawledDomain object, but since the WarcFormat requires us to write a WarcInfo object at the start of a crawl rather than at the end, this information is unobtainable when creating the CrawledDomain object. Also fix a bug in the deduplication logic in the DomainProcessor class that caused a test to break. |
||
---|---|---|
.. | ||
src | ||
build.gradle | ||
readme.md |
Crawling Models
Contains models shared by the crawling-process and converting-process.