CatgirlIntelligenceAgency

Author	SHA1	Message	Date
Viktor Lofgren	4155fbe94c	(control) Reprocess-all actor	2023-11-28 17:58:48 +01:00
Viktor Lofgren	347fe6b7be	(control) Reindex-all actor	2023-11-28 16:41:09 +01:00
Viktor Lofgren	1dafa0c74d	(mqapi/control) Repair repartition endpoint, deprecate notify endpoints. The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId. In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.	2023-11-27 16:01:12 +01:00
Viktor Lofgren	88f49834fd	(docs) Update documentation	2023-10-27 12:45:39 +02:00
Viktor Lofgren	98d742d634	(actor) Code cleanup	2023-10-27 12:19:20 +02:00
Viktor Lofgren	f613f4f2df	(array) Fix spurious search results This was caused by a bug in the binary search algorithm causing it to sometimes return positive values when encoding a search miss. It was also necessary to get rid of the vestiges of the old LongArray and IntArray classes to make this fix doable.	2023-10-26 15:27:02 +02:00
Viktor Lofgren	a497e4c920	(crawler) Terminate crawler after a few hours of no progress	2023-10-26 12:49:28 +02:00
Viktor Lofgren	d7686b665e	Refactoring * Encyclopedia sideloader; permit providing base URL. * Storage base shows node id in GUI * ProcessLivenessMonitorActor restarts automatically * Clean-up of outbox code	2023-10-25 18:51:02 +02:00
Viktor Lofgren	2ed2f35a9b	(actor) Rewrite of the actor prototype class using record pattern matching	2023-10-23 10:18:20 +02:00
Viktor Lofgren	81dd3809e9	(*) WIP Add node affinity to EC_DOMAIN Very messy commit due to fractalline yak shaving	2023-10-19 17:48:34 +02:00
Viktor Lofgren	4baf9527d7	() WIP Control GUI redesign, executor-service, multi-node mq This turned out to be very difficult to do in small isolated steps. Design overhaul of the control gui using bootstrap * Move the actors out of control-service into a new executor-service, that can be run on multiple nodes * Add node-affinity to message queue	2023-10-14 12:08:43 +02:00
Viktor Lofgren	3889c4bdd9	(refactor) Remove features-search and update documentation	2023-10-09 15:12:30 +02:00
Viktor	8e1abc3f10	(index-reverse) Parallel construction of the reverse indexes. (#52) * (index-reverse) Parallel construction of the reverse indexes. * (array) Remove wasteful calculation of numDistinct before merging two sorted arrays. * (index-reverse) Force changes to disk on close, reduce logging. * (index-reverse) Clean up merging process and add back logging * (run) Add a conservative default for INDEX_CONSTRUCTION_PROCESS_OPTS's parallelism as it eats a lot of RAM * (index-reverse) Better logging during processing * (array) 2GB+ compatible write() function * (array) 2GB+ compatible write() function * (index-reverse) We are logging like Bolsonaro and I will not have it. * (reverse-index) Self-diagnostics * (btree) Fix bug in btree reader to do with large data sizes	2023-10-07 10:00:00 +02:00
Viktor Lofgren	f6e9ef6de9	(array) Fix transferFrom() so it survives larger than 2 GB transfers	2023-10-04 13:57:36 +02:00
Viktor Lofgren	c51159672e	(build) Move unit test configuration to root build.gradle	2023-10-04 12:46:22 +02:00
Viktor Lofgren	54c8e13a68	(term-frequency-dict) Fix memory leak in TermFrequencyDict	2023-10-04 11:55:11 +02:00
Viktor Lofgren	40768e935b	(test) Removing /tmp-guardrails as it doesn't hold in CI	2023-10-02 16:52:59 +02:00
Viktor Lofgren	a433bbbe45	(converter) Fix rare sentence extractor bug It was caused by non-thread safe concurrent memory access in SentenceExtractor.	2023-09-24 19:39:48 +02:00
Viktor Lofgren	cd12f49fc0	(long-array) Return slices SegmentLongArray of itself for range() &c	2023-09-24 11:31:54 +02:00
Viktor Lofgren	d0aa754252	(long-array) Implement java.lang.foreign.Arena based lifecycle control for LongArray. Further de-ByteBuffer:ing of these classes is to be done, but this is the smallest most urgently needed benefit. This commit is a WIP but in a fully working state, pushing due to the importance of the changes to offer lifecycle control over mmaps.	2023-09-24 10:40:06 +02:00
Viktor Lofgren	dbe9235f3a	(*) Upgrade to JDK21 with preview enabled. ... also move some common configuration into the root build.gradle-file. Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work. This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.	2023-09-24 10:38:59 +02:00
Viktor Lofgren	4aa47e87f2	(blocking-thread-pool) Add isTerminated convenience function	2023-09-21 12:47:41 +02:00
Viktor Lofgren	d895f83520	(blocking-thread-pool) Move DumbThreadPool to its own micro-library Also rename it to SimpleBlockingThreadPool.	2023-09-20 10:11:49 +02:00
Viktor Lofgren	04212b2cef	(btree) Add more consistent asserts on sortedness	2023-09-01 15:45:02 +02:00
Viktor Lofgren	f74b9df0a7	(array) Don't use paging arrays when mapping small files for writing	2023-08-31 20:15:10 +02:00
Viktor Lofgren	f321fa5ad3	(array) Override to Paging...Array$range() This is a big performance boost in array.range().get(). Without an override, each access will go through pages[page].get(...) for each get()-operation. This adds up very quickly. BTreeReader does a bunch of get():s on a range()'d array during traversal in the queryData... methods.	2023-08-31 13:52:29 +02:00
Viktor Lofgren	ffa0366deb	(minor) Fix typo in ActorStateMachine's logging	2023-08-28 16:11:52 +02:00
Viktor Lofgren	3101b74580	(index) Move to a lexicon-free index design This is a system-wide change. The index used to have a lexicon, mapping words to wordIds using a large in-memory hash table. This made index-construction easier, but it also added a fairly significant RAM penalty to both the index service and the loader. The new design moves to 64 bit word identifiers calculated using the murmur hash of the keyword, and an index construction based on merging smaller indices. It also became necessary half-way through to upgrade guice as its error reporting wasn't quite compatible with JDK20.	2023-08-28 14:02:23 +02:00
Viktor Lofgren	ebc84c22fb	Upgrade antique lombok plugin This permits tests to run on JDK20 environments.	2023-08-23 14:34:32 +00:00
Viktor Lofgren	aa0d256d6a	Upgrade code to Java 20. * Change language version * Upgrade Lombok to a JDK20 compatible version	2023-08-23 13:37:49 +00:00
Viktor Lofgren	fca62f261e	(mq) Down-tune polling intervals in MQ Polling 10 times a second across dozens of queues is a bit too aggressive and wasteful.	2023-08-22 11:49:30 +02:00
Viktor Lofgren	46d761f34f	(language) fasttext based language filter	2023-08-16 15:48:12 +02:00
Viktor Lofgren	4404ad98ae	(mq) Fix missing @Inject that broke everything in control-service	2023-08-15 11:22:12 +02:00
Viktor Lofgren	e7192a9cad	(mq) Refactor mq and actor library and move it to libraries out of common	2023-08-15 10:53:23 +02:00
Viktor Lofgren	d6b07e4d01	(controller) Improve the storage interface	2023-07-21 19:56:16 +02:00
Viktor Lofgren	995657c6ce	(big-string) Make big-string disable:able	2023-07-21 19:50:35 +02:00
Viktor	cbbf60a599	Better fingerprinting (#35) * Better fingerprinting for server tech * Many more features in FeatureExtractor * Blog specialization * SiteType table	2023-07-10 18:58:43 +02:00
Viktor Lofgren	77f2ca51af	Optimize SentenceExtractor. Remove String pool because it's not doing much. Break out constant. Use a shared RdrPosTagger.	2023-06-19 17:58:19 +02:00
Viktor Lofgren	ffcbc6c1c9	Reduce the odds of re-allocation by AsciiFlattener	2023-06-19 17:58:19 +02:00
Viktor Lofgren	e4372289a5	Use fixed buffers for BigString compression and decompression to reduce GC churn. fixup! Use fixed buffers for BigString compression and decompression to reduce GC churn.	2023-06-19 17:58:19 +02:00
Viktor Lofgren	d82a858491	Don't consider slash to be a sentence separator.	2023-05-31 16:54:30 +02:00
Viktor Lofgren	4e9e79454f	Fix broken transformation functions in the PagingArray classes.	2023-05-28 13:31:05 +02:00
Viktor Lofgren	b0bc07b4e7	Insertion sort was super busted I don't even know how it worked	2023-05-28 12:17:50 +02:00
Viktor Lofgren	6814c90625	Fix N-width sorting bug	2023-05-28 11:57:06 +02:00
Viktor	96bac70b85	Tools for merging sorted lists, and merging btrees. (#14) * Utilities for merging BTrees of entity size 1 and 2. * Isolate and clean up sorting algorithms. * Functions for keeping distinct items in a LongArray	2023-04-20 15:28:09 +02:00
Viktor	a278fc6296	Increase search result relevance (#8) * Increase accuracy of the position bits. * Increase their width to 56. * Use a rolling position scheme for bits 16-56 to increase the average accuracy. * Result ranking overhaul * Optimized queries * BM25 in the index service's ranking * Make gui less jank * Javadocs for ranking parameters.	2023-04-07 20:18:08 +02:00
Viktor Lofgren	32b9c2e671	Fix SentenceExtractor jank	2023-03-30 15:45:04 +02:00
Viktor Lofgren	0fcb2b534c	Polish Names	2023-03-29 16:51:47 +02:00
Viktor Lofgren	3464ca514b	Fix typeahead suggestions	2023-03-25 10:20:52 +01:00
Viktor Lofgren	611ba2d35a	Break apart WordPatterns class	2023-03-22 15:10:17 +01:00
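
Commit f613f4f2df describes a binary search bug. A search miss was sometimes encoded as a positive value, so it could not be told apart from a hit. The sketch below is not the project's LongArray code. It is a hypothetical stand-alone helper (`BinarySearchMiss`, `binarySearch` and `decodeInsertionPoint` are assumed names). It shows the convention used by the JDK's `Arrays.binarySearch`, where a miss returns `-(insertionPoint) - 1`. The insertion point is never negative, so an encoded miss is always negative and callers can simply test `result < 0`.

```java
import java.util.Arrays;

// Hypothetical helper illustrating miss encoding; not the project's LongArray API.
final class BinarySearchMiss {

    /**
     * Searches the sorted range values[fromIndex, toIndex) for key.
     * Returns the index on a hit. On a miss returns -(insertionPoint) - 1,
     * which is always negative because insertionPoint >= 0.
     */
    static long binarySearch(long[] values, int fromIndex, int toIndex, long key) {
        int low = fromIndex;
        int high = toIndex - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            long midVal = values[mid];
            if (midVal < key) {
                low = mid + 1;
            } else if (midVal > key) {
                high = mid - 1;
            } else {
                return mid; // hit: a non-negative index
            }
        }
        return -(low + 1L); // miss: low is the insertion point
    }

    /** Recovers the insertion point (or the hit index) from a search result. */
    static long decodeInsertionPoint(long result) {
        return result >= 0 ? result : -result - 1;
    }

    public static void main(String[] args) {
        long[] data = {2, 4, 8, 16};
        System.out.println(binarySearch(data, 0, data.length, 8)); // hit at index 2
        System.out.println(binarySearch(data, 0, data.length, 1)); // miss, insertion point 0 -> -1
        System.out.println(binarySearch(data, 0, data.length, 5)); // miss, insertion point 2 -> -3
        System.out.println(Arrays.binarySearch(data, 5));          // the JDK agrees: -3
    }
}
```

The extra `- 1` is what keeps the encoding unambiguous. Encoding a miss as plain `-low` would return 0 when the key sorts before every element, and that value collides with a hit at index 0.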

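Commit 3101b74580 replaces the in-memory lexicon with 64-bit word identifiers computed from a murmur hash of each keyword. With this design, any process can derive a word's id without consulting a shared word-to-id table. The commit does not say which Murmur variant or seed is used. The sketch below assumes MurmurHash64A over the word's UTF-8 bytes with seed 0. `WordIds`, `murmurHash64A` and `wordId` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the Murmur variant (64A) and seed (0) are assumptions.
final class WordIds {
    private static final long M = 0xc6a4a7935bd1e995L;
    private static final int R = 47;

    /** MurmurHash64A; Java's wrapping long arithmetic matches uint64 math. */
    static long murmurHash64A(byte[] data, long seed) {
        int len = data.length;
        long h = seed ^ (len * M);
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);

        // Mix in each full 8-byte little-endian block.
        int blocks = len / 8;
        for (int i = 0; i < blocks; i++) {
            long k = buf.getLong(i * 8);
            k *= M;
            k ^= k >>> R;
            k *= M;
            h ^= k;
            h *= M;
        }

        // Mix in the remaining 1..7 tail bytes, if any.
        int tail = blocks * 8;
        int rem = len & 7;
        if (rem > 0) {
            for (int i = rem - 1; i >= 0; i--) {
                h ^= (data[tail + i] & 0xFFL) << (8 * i);
            }
            h *= M;
        }

        // Final avalanche.
        h ^= h >>> R;
        h *= M;
        h ^= h >>> R;
        return h;
    }

    /** A word's id is a pure function of its text, so no lexicon is needed. */
    static long wordId(String word) {
        return murmurHash64A(word.getBytes(StandardCharsets.UTF_8), 0L);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"search", "engine", "search"}) {
            System.out.printf("%s -> %016x%n", w, wordId(w));
        }
    }
}
```

The id depends only on the word, so separately built indices agree on ids without coordination. That fits the commit's description of building the index by merging smaller indices. The trade-off is that two distinct words can, rarely, collide on the same 64-bit id.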

66 Commits