CatgirlIntelligenceAgency

Author	SHA1	Message	Date
Viktor Lofgren	91dd45cf64	(search) IP and IP geolocation in site info view This commit also fixes a bug in the loader where the IP field wouldn't always populate as intended, and refactors the DomainInformationService to use significantly fewer SQL queries.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	1dafa0c74d	(mqapi/control) Repair repartition endpoint, deprecate notify endpoints. The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId. In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.	2023-11-27 16:01:12 +01:00
Viktor Lofgren	09917837d0	(process) Ensure construction exceptions are logged Wrapping these exceptions in a try-catch and logging them with slf4j will ensure they end up in the process logs. The way it worked using the default exception handler, they'd print on console (which nothing captures!), leading to a very annoying debugging experience.	2023-11-22 18:32:06 +01:00
Viktor Lofgren	f58a9f46be	(loader) Don't truncate the entire links table on load This behavior is an old vestige from the days of only having a single loader process. We'd truncate the links table because doing inserts/updates was too slow. This was also important because we had 32-bit IDs, and there's a lot of links between domains to go around... Instead we delete the rows associated with the current node with a stored procedure PURGE_LINKS_TABLE. We also update the PRIMARY KEY to a BIGINT. We'll need to load the data in excess of a billion times to hit an ID rollover, so it'll be fine.	2023-11-16 10:30:12 +01:00
Viktor Lofgren	d7686b665e	Refactoring * Encyclopedia sideloader; permit providing base URL. * Storage base shows node id in GUI * ProcessLivenessMonitorActor restarts automatically * Clean-up of outbox code	2023-10-25 18:51:02 +02:00
Viktor Lofgren	1d75b974b5	(loader bugfix) Set DOMAIN_METADATA appropriately	2023-10-20 13:03:27 +02:00
Viktor Lofgren	81dd3809e9	(*) WIP Add node affinity to EC_DOMAIN Very messy commit due to fractalline yak shaving	2023-10-19 17:48:34 +02:00
Viktor Lofgren	4baf9527d7	(*) WIP Control GUI redesign, executor-service, multi-node mq This turned out to be very difficult to do in small isolated steps. Design overhaul of the control gui using bootstrap * Move the actors out of control-service into a new executor-service, that can be run on multiple nodes * Add node-affinity to message queue	2023-10-14 12:08:43 +02:00
Viktor Lofgren	199c459697	(*) Add node-affinity to services, processes and file storage.	2023-10-10 12:32:22 +02:00
Viktor Lofgren	3889c4bdd9	(refactor) Remove features-search and update documentation	2023-10-09 15:12:30 +02:00
Viktor Lofgren	5dd55c7cad	(refactor) Rename satellite services to application services This is a better descriptor, since they now all implement different applications on top of the core services' APIs.	2023-10-09 13:45:45 +02:00
Viktor Lofgren	c0e61d4c87	(refactor) Move search service into services-satellite	2023-10-09 13:40:01 +02:00
Viktor Lofgren	c51159672e	(build) Move unit test configuration to root build.gradle	2023-10-04 12:46:22 +02:00
Viktor Lofgren	dbe9235f3a	(*) Upgrade to JDK21 with preview enabled. ... also move some common configuration into the root build.gradle-file. Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work. This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.	2023-09-24 10:38:59 +02:00
Viktor Lofgren	f809d22fc6	(loader) Support simultaneous loading of multiple processed data sets	2023-09-22 13:14:58 +02:00
Viktor Lofgren	eaeb23d41e	(refactor) Remove converting-model package completely	2023-09-14 11:21:44 +02:00
Viktor Lofgren	c71f6ad417	(converter) Add heartbeats to the loader processes and execute the tasks in parallel for a ~2X speedup	2023-09-14 10:11:57 +02:00
Viktor Lofgren	24b4606f96	(converter,loader) Converter outputs parquet files instead of compressed json.	2023-09-13 16:13:41 +02:00
Viktor Lofgren	9e185e80ce	(control-service) Add timestamp to file storages.	2023-09-02 14:01:04 +02:00
Viktor Lofgren	5f427d2b4c	(keywords) Clean up leaky abstractions, clean up tests	2023-09-01 13:52:00 +02:00
Viktor Lofgren	320dad7f1a	(index journal) Fix leaky abstraction in IndexJournalReader. The caller shouldn't be required to know the on-disk layout of the file to make use of the data in a performant way.	2023-09-01 11:18:13 +02:00
Viktor Lofgren	a6f1335375	(loader) Fix bug where the loader would omit some meta and words.	2023-08-31 17:48:43 +02:00
Viktor Lofgren	dd593c292c	(loader) Minor optimizations and bugfixes. * Reduce memory churn in LoaderIndexJournalWriter, fix bug with keyword mappings as well * Remove remains of OldDomains * Ensure LOADER_PROCESS_OPTS gets fed to the processes * LinkdbStatusWriter won't execute batch after each added item post 100 items	2023-08-29 15:37:52 +02:00
Viktor Lofgren	39c1857c61	(heartbeat, reverse-index) Better heartbeat mocking, improved heartbeats for reverse index construction.	2023-08-29 13:07:55 +02:00
Viktor Lofgren	ba4513e82c	(loader) Revert accidental experimental changes that slipped by in an earlier commit	2023-08-28 19:54:56 +02:00
Viktor Lofgren	3101b74580	(index) Move to a lexicon-free index design This is a system-wide change. The index used to have a lexicon, mapping words to wordIds using a large in-memory hash table. This made index-construction easier, but it also added a fairly significant RAM penalty to both the index service and the loader. The new design moves to 64 bit word identifiers calculated using the murmur hash of the keyword, and an index construction based on merging smaller indices. It also became necessary half-way through to upgrade guice as its error reporting wasn't quite compatible with JDK20.	2023-08-28 14:02:23 +02:00
Viktor Lofgren	e710e057e2	(db) Remove EC_URL and EC_PAGE_DATA from mariadb database	2023-08-25 13:45:03 +02:00
Viktor Lofgren	460998d512	(index) Move index construction to separate process. This provides a much cleaner separation of concerns, and makes it possible to get rid of a lot of the gunkier parts of the index service. It will also permit lowering the Xmx on the index service a fair bit, so we can get CompressedOOps again :D	2023-08-25 12:52:54 +02:00
Viktor Lofgren	1e6800565a	(system) Remove EdgeId<T> and similar objects They seemed like a good idea at the time, but in practice they're wasting resources and not really providing the clarity I had hoped.	2023-08-24 17:46:02 +02:00
Viktor Lofgren	c909120ae1	(search) Basic working integration of linkdb in search service	2023-08-24 17:24:56 +02:00
Viktor Lofgren	6a04cdfddf	(loader) Implement new linkdb in loader Deprecate the LoadUrl instruction entirely. We no longer need to be told upfront about which URLs to expect, as IDs are generated from the domain id and document ordinal. For now, we no longer store new URLs in different domains. We need to re-implement this somehow, probably in a different job or as a different output.	2023-08-24 13:07:54 +02:00
Viktor Lofgren	ebc84c22fb	Upgrade antique lombok plugin This permits tests to run on JDK20 environments.	2023-08-23 14:34:32 +00:00
Viktor Lofgren	aa0d256d6a	Upgrade code to Java 20. * Change language version * Upgrade Lombok to a JDK20 compatible version	2023-08-23 13:37:49 +00:00
Viktor Lofgren	ca12dd59f7	(loader) Fix Cleaner resource leak Apparently Cleaners have an associated native thread, so the way to use them is to have a single static cleaner.	2023-08-22 18:05:00 +02:00
Viktor Lofgren	46409c4c2d	(loader) Use the correct interface for InstructionCounter	2023-08-22 11:11:36 +02:00
Viktor Lofgren	704de50a9b	(forward-index, valuator) HTML features in valuator Put it in the forward index for easy access during index-side valuation.	2023-08-18 11:54:56 +02:00
Viktor Lofgren	e7192a9cad	(mq) Refactor mq and actor library and move it to libraries out of common	2023-08-15 10:53:23 +02:00
Viktor Lofgren	4ab1cd9502	(*) last touches	2023-08-07 12:57:44 +02:00
Viktor Lofgren	58556af6c7	(db) Use Flyway for database migrations.	2023-08-01 17:08:42 +02:00
Viktor Lofgren	ea66195b97	(loader) Optimize loader by using zstd's direct streaming writer and the Murmur3_128 string hash	2023-08-01 15:02:13 +02:00
Viktor Lofgren	8f0cbf267b	(loader) Perform instruction reads in a separate thread for extra vroom vroom	2023-07-31 14:24:08 +02:00
Viktor Lofgren	2f8488610a	(loader) Fix bug where trailing deferred domain meta inserts weren't executed	2023-07-31 14:23:23 +02:00
Viktor Lofgren	730e8f74e4	(crawler) Even more memory optimizations. * Fix minor resource leak in zstd streams * Use pools for zstd streams * Reduce the SSL session cache size	2023-07-30 14:19:55 +02:00
Viktor Lofgren	01476577b8	(loader) Speed up loading back to original speeds with a cascading DELETE FROM EC_URL rather than EC_PAGE_DATA. * Also clean up code and have proper rollbacks for transactions.	2023-07-28 22:00:07 +02:00
Viktor Lofgren	f11103d31d	(WIP) Make it possible to sideload encyclopedia data. This is mostly a pilot track for sideloading other large websites. Also change converter to produce a more compact output (java serialization instead of json).	2023-07-28 18:14:43 +02:00
Viktor Lofgren	fd44e09ebd	(loader) Don't delete the entire link database when the loader runs	2023-07-24 18:37:35 +02:00
Viktor Lofgren	d7ab21fe34	(*) Refactor Control Service and processes	2023-07-17 21:20:31 +02:00
Viktor Lofgren	bca4bbb6c8	(*) Refactor MQ and MQSM	2023-07-17 13:57:32 +02:00
Viktor Lofgren	e618aa34e9	(control) Name change process->fsm, new fsm:s * FSM for spawning processes when messages appear for them * FSM for removing data flagged for purging	2023-07-17 12:27:27 +02:00
Viktor Lofgren	8b74e3aa0d	(*) File Storage WIP	2023-07-14 17:08:10 +02:00

60 Commits