CatgirlIntelligenceAgency/code/processes/index-constructor-process
Viktor Lofgren 467ba5be20 (index-construction) Split repartition into two actions
This change splits the previous 'repartition' action into two steps, one for recalculating the domain rankings, and one for recalculating the other ranking sets.  Since only the first is necessary before the index construction, the rest can be delayed until after...

To avoid issues in handling the shotgun blast of MqNotifications, Service was switched over to use a synchronous message queue instead of an asynchronous one.

The change also modifies the behavior so that only node 1 will push the changes to the EC_DOMAIN database table, to avoid unnecessary db locks and contention with the loader.

Additionally, the change fixes a bug where the index construction code wasn't actually picking up the rankings data.

The index construction used to be performed by the index-service, so merely saving the data to memory was enough to make it accessible to the index-construction logic. Once index construction was broken out into a separate process, the new process just injected an empty DomainRankings object instead.

To fix this, DomainRankings can now be persisted to disk, and a pre-loaded version of the object is injected into the index-construction process.
2024-02-06 17:20:07 +01:00
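
The persist-then-preload fix described in the commit message can be illustrated with a minimal sketch. `RankingsSketch`, its methods, and its file layout are hypothetical and are not the actual `DomainRankings` API; the sketch only assumes that rankings are a mapping from a domain id to a rank.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a rankings object that can be written to and read from disk. */
class RankingsSketch {
    private final Map<Integer, Integer> rankByDomainId = new HashMap<>();

    void put(int domainId, int rank) {
        rankByDomainId.put(domainId, rank);
    }

    int get(int domainId, int defaultRank) {
        return rankByDomainId.getOrDefault(domainId, defaultRank);
    }

    /** Writes the entry count followed by (domainId, rank) pairs. */
    void save(Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(rankByDomainId.size());
            for (Map.Entry<Integer, Integer> e : rankByDomainId.entrySet()) {
                out.writeInt(e.getKey());
                out.writeInt(e.getValue());
            }
        }
    }

    /** Reads a file produced by save(); a separate process can call this before construction starts. */
    static RankingsSketch load(Path file) throws IOException {
        RankingsSketch rankings = new RankingsSketch();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int domainId = in.readInt();
                int rank = in.readInt();
                rankings.put(domainId, rank);
            }
        }
        return rankings;
    }
}
```

In this sketch the process that computes the rankings would call `save`, and the separate construction process would call `load` and inject the result, rather than constructing an empty object.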

The index construction process is responsible for creating the indexes used by the search engine.

There are three types of indexes:

  • The forward index, which maps documents to words.
  • The full reverse index, which maps words to documents and includes all words.
  • The priority reverse index, which maps words to documents but includes only the most "important" words (such as those appearing in the title, or with especially high TF-IDF scores).
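
As a rough illustration of how the three index types relate, the toy sketch below builds each one as an in-memory map. The `IndexSketch` and `Doc` names are hypothetical, and "important" is simplified here to "appears in the title"; the real indexes are on-disk structures built by other modules and may use different criteria.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy in-memory illustration of the three index types; not the real on-disk format. */
class IndexSketch {

    /** Hypothetical document: an id plus its title words and body words. */
    static final class Doc {
        final int id;
        final List<String> title;
        final List<String> body;

        Doc(int id, List<String> title, List<String> body) {
            this.id = id;
            this.title = title;
            this.body = body;
        }
    }

    /** Forward index: document id -> the words in that document. */
    static Map<Integer, Set<String>> forward(List<Doc> docs) {
        Map<Integer, Set<String>> index = new TreeMap<>();
        for (Doc d : docs) {
            Set<String> words = new TreeSet<>(d.title);
            words.addAll(d.body);
            index.put(d.id, words);
        }
        return index;
    }

    /** Full reverse index: word -> ids of every document containing it. */
    static Map<String, Set<Integer>> fullReverse(List<Doc> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Doc d : docs) {
            for (String w : d.title) add(index, w, d.id);
            for (String w : d.body) add(index, w, d.id);
        }
        return index;
    }

    /** Priority reverse index: same shape, but only title words are indexed (a simplification). */
    static Map<String, Set<Integer>> priorityReverse(List<Doc> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Doc d : docs) {
            for (String w : d.title) add(index, w, d.id);
        }
        return index;
    }

    private static void add(Map<String, Set<Integer>> index, String word, int docId) {
        index.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
    }
}
```

The priority index is a subset of the full reverse index, so a lookup in it touches fewer entries, at the cost of missing documents where the word is not considered important.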

This is a very lightweight module that delegates the actual work to the individual index modules. Their respective readme files contain more information about the indexes themselves and how they are constructed.

The process is glued together within IndexConstructorMain, which is the only class of interest in this module.