CatgirlIntelligenceAgency

Author	SHA1	Message	Date
Viktor Lofgren	467ba5be20	(index-construction) Split repartition into two actions. This change splits the previous 'repartition' action into two steps, one for recalculating the domain rankings, and one for recalculating the other ranking sets. Since only the first is necessary before the index construction, the rest can be delayed until after... To avoid issues in handling the shotgun blast of MqNotifications, Service was switched over to use a synchronous message queue instead of an asynchronous one. The change also modifies the behavior so that only node 1 will push the changes to the EC_DOMAIN database table, to avoid unnecessary db locks and contention with the loader. Additionally, the change fixes a bug where the index construction code wasn't actually picking up the rankings data. Since the index construction used to be performed by the index-service, merely saving the data to memory was enough for it to be accessible within the index-construction logic, but since it's been broken out into a separate process, the new process just injected an empty DomainRankings object instead. To fix this, DomainRankings can now be persisted to disk, and a pre-loaded version of the object is injected into the index-construction process.	2024-02-06 17:20:07 +01:00
Viktor Lofgren	6271d5d544	(mq) Add relation tracking between MQ messages for easier tracking and debugging. The change adds a new column to the MESSAGE_QUEUE table called AUDIT_RELATED_ID. This field is populated transparently, using a dictionary mapping Thread IDs to Message IDs, populated by the inbox handlers. The existing RELATED_ID field has too many semantics associated with it; among other things, the FSM code uses this field to track state changes. The change set also improves the consistency of inbox names. The IndexClient was buggy and populated its outbox with a UUID. This is fixed. All Service2Service outboxes are now prefixed with 'pp:' to make them even easier to differentiate.	2024-01-18 15:08:27 +01:00
Viktor Lofgren	5a62b3058f	(query-api) Make the search set identifier a string value in the API. This will free the core marginalia search engine to use arbitrary search set definitions, while the app can use its hardcoded defaults.	2024-01-16 10:55:24 +01:00
Viktor Lofgren	7c6e18f7a7	(*) Overhaul settings and properties. Use a system.properties file to configure the system. This is loaded statically by MainClass or ProcessMainClass. Update the property names to be more consistent, and update the documentation to reflect the changes.	2024-01-13 17:12:18 +01:00
Viktor Lofgren	edc1acbb7e	(*) Replace EC_DOMAIN_LINK table with files and in-memory caching. The EC_DOMAIN_LINK MariaDB table stores links between domains. This is problematic, as both updating and querying this table is very slow in relation to how small the data is (~10 GB). This slowness is largely caused by the database enforcing ACID guarantees we don't particularly need. This changeset replaces the EC_DOMAIN_LINK table with a file in each index node containing 32 bit integer pairs corresponding to links between two domains. This file is loaded in memory in each node, and can be queried via the Query Service. A migration step is needed before this file is created in each node. Until that happens, the actual data is loaded from the EC_DOMAIN_LINK table, but accessed as though it was a file. The changeset also migrates/renames the links.db file to documents.db to avoid naming confusion between the two.	2024-01-08 15:53:13 +01:00
Viktor Lofgren	4763077b76	(search/index) Add a new keyword "count". This is for filtering results on how many times the term appears on the domain. The intent is for it to be useful in creating e.g. a domain search feature. It's also very helpful when tracking down spammy domains.	2023-12-25 20:38:29 +01:00
dreimolo	c0cc05177f	corrects protobuf.plugins.grpc	2023-12-16 14:24:41 +01:00
dreimolo	0b34d43804	Workaround for failing deps on Mac (Apple Silicon)	2023-12-16 14:22:11 +01:00
Viktor Lofgren	e3ebb0c5bb	(*) Rename the search filter 'RETRO' into 'POPULAR'. This will make the terminology more consistent between the GUI and the code. The rankings yaml still uses 'retro' though, to retain compatibility.	2023-12-09 20:06:54 +01:00
Viktor Lofgren	347fe6b7be	(control) Reindex-all actor	2023-11-28 16:41:09 +01:00
Viktor Lofgren	1dafa0c74d	(mqapi/control) Repair repartition endpoint, deprecate notify endpoints. The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId. In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.	2023-11-27 16:01:12 +01:00
Viktor Lofgren	6bac3c75cb	(api) API documentation	2023-10-29 16:13:21 +01:00
Viktor Lofgren	ebd365a128	Fix exception	2023-10-24 15:04:12 +02:00
Viktor Lofgren	c2b28c0f8d	(api) Trial streaming API	2023-10-24 13:26:46 +02:00
Viktor Lofgren	a860f8f1a8	(index/qs) GRPC API for better query performance	2023-10-24 11:38:07 +02:00
Viktor Lofgren	731afcb864	(qs) Parallel execution	2023-10-23 12:06:03 +02:00
Viktor Lofgren	efb73ff4e7	(qs) Don't blow up if an index node isn't responsive	2023-10-23 11:53:18 +02:00
Viktor Lofgren	16e0738731	(*) Get multi-node routing working.	2023-10-15 18:38:30 +02:00
Viktor Lofgren	4baf9527d7	() WIP Control GUI redesign, executor-service, multi-node mq. This turned out to be very difficult to do in small isolated steps. Design overhaul of the control gui using bootstrap * Move the actors out of control-service into a new executor-service, that can be run on multiple nodes * Add node-affinity to message queue	2023-10-14 12:08:43 +02:00
Viktor Lofgren	61288c5e68	(service, client) First steps towards multiple nodedness	2023-10-09 22:13:27 +02:00
Viktor Lofgren	77ccab7d80	(index) Move linkdb to index from search. This makes index complete in the sense that you can deploy an index instance and build a complete separate application on top of it, without having to go through the Marginalia-laden search service.	2023-10-08 16:48:35 +02:00
Viktor Lofgren	c51159672e	(build) Move unit test configuration to root build.gradle	2023-10-04 12:46:22 +02:00
Viktor Lofgren	14372e0ef0	(index) Slightly reduce alloc churn	2023-09-24 19:36:14 +02:00
Viktor Lofgren	dbe9235f3a	(*) Upgrade to JDK21 with preview enabled. ... also move some common configuration into the root build.gradle-file. Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work. This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.	2023-09-24 10:38:59 +02:00
Viktor Lofgren	764e7d1315	(index) Add more comprehensive integration tests for the index service.	2023-08-30 10:37:24 +02:00
Viktor Lofgren	3f288e264b	(minor) Clean up dead endpoints	2023-08-29 17:04:54 +02:00
Viktor Lofgren	b911665691	(index) Clean up and optimize valuator	2023-08-24 18:34:06 +02:00
Viktor Lofgren	56eb83319d	(index) Clean up result domain deduplicator	2023-08-24 18:24:55 +02:00
Viktor Lofgren	1e6800565a	(system) Remove EdgeId<T> and similar objects They seemed like a good idea at the time, but in practice they're wasting resources and not really providing the clarity I had hoped.	2023-08-24 17:46:02 +02:00
Viktor Lofgren	c909120ae1	(search) Basic working integration of linkdb in search service	2023-08-24 17:24:56 +02:00
Viktor Lofgren	9894f37412	(index) Implement new URL ID coding scheme. Also refactor along the way. Really needs an additional pass, these tests are very hairy.	2023-08-24 16:44:27 +02:00
Viktor Lofgren	ebc84c22fb	Upgrade antique lombok plugin. This permits tests to run on JDK20 environments.	2023-08-23 14:34:32 +00:00
Viktor Lofgren	aa0d256d6a	Upgrade code to Java 20. * Change language version * Upgrade Lombok to a JDK20 compatible version	2023-08-23 13:37:49 +00:00
Viktor Lofgren	704de50a9b	(forward-index, valuator) HTML features in valuator Put it in the forward index for easy access during index-side valuation.	2023-08-18 11:54:56 +02:00
Viktor Lofgren	e7192a9cad	(mq) Refactor mq and actor library and move it to libraries out of common	2023-08-15 10:53:23 +02:00
Viktor Lofgren	9979c9defe	(search/index) Add blogosphere filter	2023-08-02 20:13:30 +02:00
Viktor Lofgren	1ec6f9cde2	(mq) More robust resume and recovery logic, protection against spurious state changes, minor bugfixes	2023-07-13 14:55:45 +02:00
Viktor Lofgren	8a53e107fa	(mq) Synchronous and Asynchronous inboxes.	2023-07-12 20:12:52 +02:00
Viktor Lofgren	88b9ec70c6	(control, WIP) Run reconvert-load from converter :D	2023-07-11 18:05:37 +02:00
Viktor Lofgren	d9e6c4f266	Trial integration of MQ-FSM into index service.	2023-07-06 18:04:16 +02:00
Viktor Lofgren	f12c6fd57e	Add a ranking parameter for biasing toward recent or old content.	2023-04-20 16:00:59 +02:00
Viktor Lofgren	fe419b12b4	Better handling of quote terms, fix bug in handling of longer queries. ... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java	2023-04-10 13:11:40 +02:00
Viktor	a278fc6296	Increase search result relevance (#8) * Increase accuracy of the position bits. * Increase their width to 56. * Use a rolling position scheme for bits 16-56 to increase the average accuracy. * Result ranking overhaul * Optimized queries * BM25 in the index service's ranking * Make gui less jank * Javadocs for ranking parameters.	2023-04-07 20:18:08 +02:00
Viktor Lofgren	3fb249758e	Adjust result ordering.	2023-04-02 12:05:22 +02:00
Viktor Lofgren	cc4e089a5d	Consider average sentence length when selecting search results. This promotes prose over code listings, tabular data, etc.	2023-03-30 15:46:15 +02:00
Viktor Lofgren	17ca4f9eea	Permit search results that are all synthetic to pass relevancy check.	2023-03-27 17:27:35 +02:00
Viktor Lofgren	449471a076	Yet more restructuring. Improved search result ranking.	2023-03-16 21:35:54 +01:00
Viktor Lofgren	73eaa0865d	The refactoring will continue until morale improves.	2023-03-12 10:50:31 +01:00
Viktor Lofgren	616effdb3c	The refactoring will continue until morale improves.	2023-03-12 10:04:48 +01:00
Viktor Lofgren	4cec89da91	Fix bug where results would sometimes be presented solely based on the fact that the document is important on the site in general, regardless of whether it's important to the document.	2023-03-11 14:20:32 +01:00

57 Commits