Crawl Features
These are relatively isolated pieces of search-engine business logic that are clearer when kept separate from the rest of the crawling code.
- content-type - Content Type identification
- crawl-blocklist - IP and URL blocklists
- link-parser - Code for parsing and normalizing links
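
As a rough illustration of what "parsing and normalizing links" can involve, here is a minimal sketch in Python. It is not the link-parser module's actual code: the function name `normalize_link`, the `DEFAULT_PORTS` table, and the specific rules (resolve against the page URL, keep only http/https, lowercase the host, drop default ports and fragments) are assumptions chosen for the example.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical table: schemes this sketch accepts, with their default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_link(base_url: str, href: str) -> Optional[str]:
    """Resolve href against base_url and return a canonical form, or None.

    Illustrative only; IPv6 host literals are not handled in this sketch.
    """
    # Resolve relative references such as "../c.html" against the page URL.
    absolute = urljoin(base_url, href.strip())
    parts = urlsplit(absolute)

    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or not parts.hostname:
        return None  # e.g. mailto:, javascript:, or a link with no host

    host = parts.hostname  # urlsplit already lowercases the hostname
    try:
        port = parts.port
    except ValueError:
        return None  # malformed or out-of-range port

    # Keep the port only when it differs from the scheme's default.
    if port is not None and port != DEFAULT_PORTS[scheme]:
        host = f"{host}:{port}"

    path = parts.path or "/"
    # An empty final component drops the fragment, which never reaches the server.
    return urlunsplit((scheme, host, path, parts.query, ""))
```

With these rules, two links that differ only in host case, an explicit default port, or a fragment collapse to the same string, which makes it easier to deduplicate URLs before they are queued for crawling.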