Crawl Features
These are relatively isolated pieces of search-engine business logic. Keeping them separate from the rest of the crawling code makes both easier to follow.
- crawl-blocklist - IP and URL blocklists
- link-parser - Code for parsing and normalizing links
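
To give a rough idea of what "parsing and normalizing links" can involve, here is a minimal sketch built only on `java.net.URI`. It is not the link-parser module's actual API: the class name `LinkNormalizerSketch`, the method `normalize`, and the specific rules (lowercasing scheme and host, dropping fragments, user info and default ports, keeping only http/https) are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only; not the real link-parser implementation.
final class LinkNormalizerSketch {

    /**
     * Resolves href against base and returns a normalized absolute
     * http(s) URL, or empty if the link is unparseable or not http(s).
     */
    static Optional<String> normalize(String base, String href) {
        try {
            URI baseUri = new URI(base);
            // java.net.URI resolves poorly against a base with an empty path,
            // so treat "http://host" as "http://host/".
            if (baseUri.getHost() != null && "".equals(baseUri.getRawPath())) {
                baseUri = baseUri.resolve("/");
            }

            // Resolve relative links and collapse "." and ".." segments.
            URI resolved = baseUri.resolve(new URI(href.trim())).normalize();

            String scheme = resolved.getScheme();
            String host = resolved.getHost();
            if (scheme == null || host == null) {
                return Optional.empty(); // e.g. mailto: or javascript: links
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) {
                return Optional.empty();
            }

            // Drop ports that are the default for the scheme.
            int port = resolved.getPort();
            if ((scheme.equals("http") && port == 80)
                    || (scheme.equals("https") && port == 443)) {
                port = -1;
            }

            String path = resolved.getRawPath();
            if (path == null || path.isEmpty()) {
                path = "/";
            }

            // Rebuild from raw components so existing percent-escapes are kept.
            // The fragment and any user info are intentionally left out.
            StringBuilder sb = new StringBuilder();
            sb.append(scheme).append("://").append(host.toLowerCase(Locale.ROOT));
            if (port != -1) {
                sb.append(':').append(port);
            }
            sb.append(path);
            if (resolved.getRawQuery() != null) {
                sb.append('?').append(resolved.getRawQuery());
            }
            return Optional.of(sb.toString());
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }
}
```

A result like this could then be checked against a URL blocklist before the link is queued for crawling. Real-world link handling typically needs more than this, such as tolerating malformed hrefs that `java.net.URI` rejects outright.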