Converting Process
The converting process reads crawl data and extracts the information that is fed into the index, such as keywords, metadata, URLs, and descriptions.
Central Classes
- ConverterMain orchestrates the conversion process.
- DocumentProcessor converts a single document.
  - HtmlDocumentProcessorPlugin contains the HTML-specific logic for processing a document and its keywords, and identifies features such as whether the document uses JavaScript.
  - PlainTextDocumentProcessorPlugin has plain text-specific logic related to a document...
- DomainProcessor converts each document and generates domain-wide metadata such as link graphs.
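How these classes relate can be illustrated with a minimal sketch. It is not the actual converter code: the class names are borrowed from the list above, but every record, method signature, and the regex-based link extraction are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConverterSketch {
    // Hypothetical stand-ins for crawl data and converter output.
    public record CrawledDocument(String url, String contentType, String body) {}
    public record ProcessedDocument(String url, boolean hasJavascript, List<String> links) {}
    public record ProcessedDomain(List<ProcessedDocument> documents, Map<String, List<String>> linkGraph) {}

    // Assumed plugin contract: each plugin handles one content type.
    public interface DocumentProcessorPlugin {
        boolean isApplicable(CrawledDocument doc);
        ProcessedDocument process(CrawledDocument doc);
    }

    public static class HtmlDocumentProcessorPlugin implements DocumentProcessorPlugin {
        // Naive href extraction; a real converter would use an HTML parser.
        private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

        public boolean isApplicable(CrawledDocument doc) {
            return doc.contentType().startsWith("text/html");
        }

        public ProcessedDocument process(CrawledDocument doc) {
            boolean hasJs = doc.body().toLowerCase().contains("<script");
            List<String> links = new ArrayList<>();
            Matcher m = HREF.matcher(doc.body());
            while (m.find()) {
                links.add(m.group(1));
            }
            return new ProcessedDocument(doc.url(), hasJs, links);
        }
    }

    public static class PlainTextDocumentProcessorPlugin implements DocumentProcessorPlugin {
        public boolean isApplicable(CrawledDocument doc) {
            return doc.contentType().startsWith("text/plain");
        }

        public ProcessedDocument process(CrawledDocument doc) {
            // Plain text has no scripts and, in this sketch, no links.
            return new ProcessedDocument(doc.url(), false, List.of());
        }
    }

    // Converts a single document by delegating to the first applicable plugin.
    public static class DocumentProcessor {
        private final List<DocumentProcessorPlugin> plugins;

        public DocumentProcessor(List<DocumentProcessorPlugin> plugins) {
            this.plugins = plugins;
        }

        public Optional<ProcessedDocument> process(CrawledDocument doc) {
            return plugins.stream()
                    .filter(p -> p.isApplicable(doc))
                    .findFirst()
                    .map(p -> p.process(doc));
        }
    }

    // Converts every document of a domain and builds a simple link graph.
    public static class DomainProcessor {
        private final DocumentProcessor documentProcessor;

        public DomainProcessor(DocumentProcessor documentProcessor) {
            this.documentProcessor = documentProcessor;
        }

        public ProcessedDomain process(List<CrawledDocument> docs) {
            List<ProcessedDocument> processed = new ArrayList<>();
            Map<String, List<String>> linkGraph = new LinkedHashMap<>();
            for (CrawledDocument doc : docs) {
                // Documents with no applicable plugin are skipped.
                documentProcessor.process(doc).ifPresent(p -> {
                    processed.add(p);
                    linkGraph.put(p.url(), p.links());
                });
            }
            return new ProcessedDomain(processed, linkGraph);
        }
    }

    public static void main(String[] args) {
        DomainProcessor domainProcessor = new DomainProcessor(new DocumentProcessor(
                List.of(new HtmlDocumentProcessorPlugin(), new PlainTextDocumentProcessorPlugin())));
        ProcessedDomain domain = domainProcessor.process(List.of(
                new CrawledDocument("https://example.com/", "text/html",
                        "<a href=\"/about\">About</a><script></script>"),
                new CrawledDocument("https://example.com/notes.txt", "text/plain", "plain notes")));
        System.out.println(domain.linkGraph());
    }
}
```

In this sketch the per-format logic lives in the plugins, so supporting another content type would only require another plugin implementation, while DomainProcessor stays responsible for anything that spans documents.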