Common
These are packages containing the basic building blocks for running a service as well as shared models.
- db contains SQL code and some database-related utilities.
- config contains some `@Inject`ables.
- renderer contains utility code for rendering website templates.
- service contains the shared base classes for main methods and web services.
- service-client is the shared base class for RPC.
- service-discovery contains tools that let the services find each other.
- process contains boilerplate for batch processes.