CatgirlIntelligenceAgency/code/services-core/executor-service
Viktor Lofgren e49ba887e9 (crawl data) Add compatibility layer for old crawl data format
The new converter logic assumes that the crawl data is ordered so that the domain record comes first, followed by a sequence of document records. This is true for the new parquet format, but not for the old zstd/gson format.

To make the new converter compatible with the old format, a specialized reader is introduced that scans for the domain record before running through the sequence of document records, and then presents them in the new order.

This is slower than just reading the file beginning to end, so in order to retain performance when this ordering isn't necessary, a CompatibilityLevel flag is added to CrawledDomainReader, permitting the caller to decide how compatible the data needs to be.
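The reordering idea can be illustrated with a minimal sketch. Only `CompatibilityLevel` is named in the commit message; the record types, the enum constants `FAST` and `COMPATIBLE`, and the `read` method are hypothetical stand-ins, and an in-memory list is used here in place of the actual crawl data file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CompatReaderSketch {
    // Hypothetical record types; the real crawl data model is richer than this.
    interface CrawlRecord {}
    record DomainRecord(String domain) implements CrawlRecord {}
    record DocumentRecord(String url) implements CrawlRecord {}

    // Named after the flag in the commit message; the constant names are assumptions.
    enum CompatibilityLevel { FAST, COMPATIBLE }

    // Returns the stored records, optionally reordered so the domain record comes first.
    static Iterator<CrawlRecord> read(List<CrawlRecord> stored, CompatibilityLevel level) {
        if (level == CompatibilityLevel.FAST) {
            // Single pass: trust that the data is already in the new order.
            return stored.iterator();
        }

        // First pass: scan for the domain record, wherever it happens to be.
        DomainRecord domain = null;
        for (CrawlRecord r : stored) {
            if (r instanceof DomainRecord d) {
                domain = d;
                break;
            }
        }

        // Second pass: emit the domain record first, then the documents in stored order.
        List<CrawlRecord> ordered = new ArrayList<>(stored.size());
        if (domain != null) {
            ordered.add(domain);
        }
        for (CrawlRecord r : stored) {
            if (r instanceof DocumentRecord) {
                ordered.add(r);
            }
        }
        return ordered.iterator();
    }

    public static void main(String[] args) {
        // Old-format ordering: the domain record is not first.
        List<CrawlRecord> oldFormat = List.of(
                new DocumentRecord("https://example.com/a"),
                new DomainRecord("example.com"),
                new DocumentRecord("https://example.com/b"));

        read(oldFormat, CompatibilityLevel.COMPATIBLE).forEachRemaining(System.out::println);
    }
}
```

The extra pass is what makes the compatible mode slower, which is why a caller that knows its data is already in the new order would pick the fast path in this sketch.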

Down the line when all the old data is purged, this should be removed, as it amounts to technical debt.
2024-01-08 19:16:49 +01:00
src (crawl data) Add compatibility layer for old crawl data format 2024-01-08 19:16:49 +01:00
build.gradle (*) Replace EC_DOMAIN_LINK table with files and in-memory caching 2024-01-08 15:53:13 +01:00
readme.md (docs) Update documentation 2023-10-27 12:45:39 +02:00

The executor service is a partitioned service responsible for executing and keeping track of long-running maintenance and operational tasks, such as crawling or data processing.

It accomplishes this using the message queue and actor library, which permits program state to survive crashes and reboots. The executor service is closely linked to the control-service, which provides a user interface for much of the executor's functionality.
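As a rough illustration of how recorded state lets a task resume after a crash, the sketch below logs each state transition before moving on, and a restarted run picks up from the last logged state. This is not the actual actor library API: `StateLog`, `State`, `next`, and `run` are hypothetical names, and an in-memory list stands in for the durable message queue.

```java
import java.util.ArrayList;
import java.util.List;

public class ResumableActorSketch {
    // Hypothetical task states; the real actors define their own.
    enum State { INITIAL, CRAWLING, PROCESSING, END }

    // Stand-in for a durable message queue. Here it is only an in-memory list,
    // so it survives a simulated restart but not a real one.
    static class StateLog {
        private final List<State> entries = new ArrayList<>();

        void append(State s) {
            entries.add(s);
        }

        State lastOr(State fallback) {
            return entries.isEmpty() ? fallback : entries.get(entries.size() - 1);
        }
    }

    // Transition function of the state machine.
    static State next(State s) {
        return switch (s) {
            case INITIAL -> State.CRAWLING;
            case CRAWLING -> State.PROCESSING;
            case PROCESSING, END -> State.END;
        };
    }

    // Resumes from the last logged state and performs at most maxSteps transitions,
    // logging each new state as it is reached. Returns the state reached.
    static State run(StateLog log, int maxSteps) {
        State current = log.lastOr(State.INITIAL);
        for (int i = 0; i < maxSteps && current != State.END; i++) {
            current = next(current);
            log.append(current);
        }
        return current;
    }

    public static void main(String[] args) {
        StateLog log = new StateLog();

        // The first run is interrupted after a single transition.
        System.out.println("before restart: " + run(log, 1));

        // A later run reads the log and continues instead of starting over.
        System.out.println("after restart:  " + run(log, 10));
    }
}
```

Because the only thing carried between runs is the log, the in-process state can be lost without losing track of how far the task had progressed.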

Central Classes