Common
These are packages containing the basic building blocks for running a service as well as shared models.
- db contains SQL code and some database-related utilities.
- config contains some `@Inject`ables.
- renderer contains utility code for rendering website templates.
- service contains the shared base classes for main methods and web services.
- service-client is the shared base class for RPC.
- service-discovery contains tools that let the services find each other.
- process contains boilerplate for batch processes.