dec3b1092d: This commit adds a safety check that the URL of the document is from the correct domain. It also adds a sizeHint() method to SerializableCrawlDataStream, which *may* indicate that the stream is very large and would therefore benefit from the slower, sideload-style processing. It furthermore fixes a bug where ProcessedDomain.write() invoked the wrong method on ConverterBatchWriter and only wrote the domain metadata, not the rest...
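The two mechanisms named in that commit message (rejecting documents whose URL is not on the crawled domain, and letting an optional size hint pick the slower sideload-style path) can be illustrated with a minimal sketch. Everything here is hypothetical: `CrawlDataStream` is a simplified stand-in for SerializableCrawlDataStream and not its real interface. The meaning of the hint (a record count, with 0 meaning "no estimate"), the `LARGE_STREAM_THRESHOLD` value and the exact, case-insensitive host match are all assumptions.

```java
import java.net.URI;
import java.util.List;
import java.util.stream.Collectors;

public class CrawlDataSketch {

    // Hypothetical stand-in for SerializableCrawlDataStream; only the idea of
    // an optional size hint comes from the commit message.
    interface CrawlDataStream {
        String domain();

        List<String> documentUrls();

        // Hypothetical: estimated record count, or 0 when no estimate is available.
        default int sizeHint() {
            return 0;
        }
    }

    // Hypothetical cutoff; the real project may measure size differently.
    static final int LARGE_STREAM_THRESHOLD = 10_000;

    // Safety check: accept a URL only if its host equals the crawled domain.
    // Exact, case-insensitive matching is an assumption; subdomains are rejected.
    static boolean isFromDomain(String url, String domain) {
        try {
            String host = URI.create(url).getHost();
            return host != null && host.equalsIgnoreCase(domain);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: treat it as coming from another domain
        }
    }

    // Only streams whose hint exceeds the cutoff take the slower sideload path.
    static String chooseStrategy(CrawlDataStream stream) {
        return stream.sizeHint() > LARGE_STREAM_THRESHOLD ? "sideload" : "in-memory";
    }

    // Keep only the document URLs that pass the domain check.
    static List<String> acceptedUrls(CrawlDataStream stream) {
        return stream.documentUrls().stream()
                .filter(url -> isFromDomain(url, stream.domain()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CrawlDataStream small = new CrawlDataStream() {
            public String domain() { return "www.example.com"; }

            public List<String> documentUrls() {
                return List.of("https://www.example.com/a", "https://evil.example.net/b");
            }
        };
        System.out.println(chooseStrategy(small)); // in-memory
        System.out.println(acceptedUrls(small));   // [https://www.example.com/a]
    }
}
```

Because the hint is optional ("may" in the commit message), the sketch defaults to 0. Streams that give no estimate therefore stay on the faster in-memory path.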
Crawling Models
Contains models shared by the crawling-process and converting-process.