Commit Graph

18 Commits

Author SHA1 Message Date
Viktor Lofgren
708a741960 (test) Clean up test usage of migrations
Several tests were manually running migrations in a large copy-paste blob of code.  This makes the test less useful as it's possible to break the code while keeping the tests green by introducing a new migration that never gets run in the tests, and it's also difficult to reason about what the tests are doing.

A new test helper library is introduced with a TestMigrationLoader that can both run Flyway migrations, or load specific migrations in the cases a specific set of migrations need to be loaded.   Existing tests are migrated to use the new code.
2024-01-12 15:55:50 +01:00
Viktor Lofgren
fbad625126 (linkdb) Add delegating implementation of DomainLinkDb
This facilitates switching between SQL and File-backed implementations on the fly while migrating from one to the other.
2024-01-08 19:56:33 +01:00
Viktor Lofgren
edc1acbb7e (*) Replace EC_DOMAIN_LINK table with files and in-memory caching
The EC_DOMAIN_LINK MariaDB table stores links between domains.  This is problematic, as both updating and querying this table is very slow in relation to how small the data is (~10 GB).  This slowness is largely caused by the database enforcing ACID guarantees we don't particularly need.

This changeset replaces the EC_DOMAIN_LINK table with a file in each index node containing 32 bit integer pairs corresponding to links between two domains.  This file is loaded in memory in each node, and can be queried via the Query Service.

A migration step is needed before this file is created in each node.   Until that happens, the actual data is loaded from the EC_DOMAIN_LINK table, but accessed as though it was a file.

The changeset also migrates/renames the links.db file to documents.db to avoid naming confusion between the two.
2024-01-08 15:53:13 +01:00
Viktor Lofgren
a7cd490593 (minor) Remove dead code. 2023-12-19 18:58:33 +01:00
Viktor Lofgren
4baf9527d7 (*) WIP Control GUI redesign, executor-service, multi-node mq
This turned out to be very difficult to do in small isolated steps.

* Design overhaul of the control gui using bootstrap
* Move the actors out of control-service into to a new executor-service, that can be run on multiple nodes
* Add node-affinity to message queue
2023-10-14 12:08:43 +02:00
Viktor Lofgren
3889c4bdd9 (refactor) Remove features-search and update documentation 2023-10-09 15:12:30 +02:00
Viktor Lofgren
77ccab7d80 (index) Move linkdb to index from search.
This makes index complete in the sense that you can deploy an index instance and build a complete separate application on top of it, without having to go through the Marginalia-laden search service.
2023-10-08 16:48:35 +02:00
Viktor Lofgren
c51159672e (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00
Viktor Lofgren
dbe9235f3a (*) Upgrade to JDK21 with preview enabled.
... also move some common configuration into the root build.gradle-file.

Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work.  This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.
2023-09-24 10:38:59 +02:00
Viktor Lofgren
5c040f7a46 (crawl-spec) Parquetify crawl spec
* Crawl-specs are now parquet files
* Deprecate the crawl-job-extractor tool
2023-09-17 09:41:34 +02:00
Viktor Lofgren
35996d0adb (docs) Update the documentation up-to-date information 2023-09-14 11:33:36 +02:00
Viktor Lofgren
03d999444d (ldb) Re-add accidentally removed stmt.addBatch that breaks 2023-08-31 12:06:30 +02:00
Viktor Lofgren
763ed260c3 (ldb) Better handling of null pubYear 2023-08-30 23:08:27 +02:00
Viktor Lofgren
048f685073 (ldb) add OR IGNORE to insert status query
Otherwise it will sometimes fail because documents may appear more than once in error scenarios.
2023-08-30 10:34:01 +02:00
Viktor Lofgren
dd593c292c (loader) Minor optimizations and bugfixes.
* Reduce memory churn in LoaderIndexJournalWriter, fix bug with keyword mappings as well
* Remove remains of OldDomains
* Ensure LOADER_PROCESS_OPTS gets fed to the processes
* LinkdbStatusWriter won't execute batch after each added item post 100 items
2023-08-29 15:37:52 +02:00
Viktor Lofgren
c909120ae1 (search) Basic working integration of linkdb in search service 2023-08-24 17:24:56 +02:00
Viktor Lofgren
6a04cdfddf (loader) Implement new linkdb in loader
Deprecate the LoadUrl instruction entirely. We no longer need to be told upfront about which URLs to expect, as IDs are generated from the domain id and document ordinal.

For now, we no longer store new URLs in different domains.  We need to re-implement this somehow, probably in a different job or a as a different output.
2023-08-24 13:07:54 +02:00
Viktor Lofgren
b22f4fbb72 (linkdb) New Module for sqlite-backed document db 2023-08-24 09:06:13 +02:00