CatgirlIntelligenceAgency

Author	SHA1	Message	Date
Viktor Lofgren	d7bd540683	(*) Replace the ip2location IP geolocation data with ASN information from apnic.net. Doesn't really make sense to use ip2location as a middle man for information that is already freely available...	2023-12-16 21:55:04 +01:00
Viktor Lofgren	722b56c8ca	(index) Fix rare bug in the index-switching logic This is caused by a resource contention with the query code. The proper way to fix this is to use some form of synchronization, but that will slow the code down. So we just hammer it a few times and let the GC deal with the problem if it fails. Not optimal, but fast.	2023-12-16 18:57:35 +01:00
Viktor Lofgren	f3f12058dc	(assistant) Fix logic error in filtering related domains	2023-12-16 18:45:53 +01:00
Viktor Lofgren	3da38d0483	(assistant) Fix logic error in filtering related domains	2023-12-16 18:44:25 +01:00
Viktor Lofgren	e13fa25e11	(assistant) Clean up the site info related domains view by filtering viable domains	2023-12-16 18:37:09 +01:00
Viktor Lofgren	34d4834ff6	(assistant) Clean up the site info related domains view by filtering viable domains	2023-12-16 18:27:24 +01:00
Viktor Lofgren	440e097d78	(crawler) WIP integration of WARC files into the crawler and converter process. This commit is in a pretty rough state. It refactors the crawler fairly significantly to offer better separation of concerns. It replaces the zstd compressed json files used to store crawl data with WARC files entirely, and the converter is modified to be able to consume this data. This works, -ish. There appears to be some bug relating to reading robots.txt, and the X-Robots-Tag header is no longer processed either. A problem is that the WARC files are a bit too large. It will probably be necessary to introduce a new format to store the crawl data long term, something like parquet; and use WARCs for intermediate storage to enable the crawler to be restarted without needing a recrawl.	2023-12-13 15:33:42 +01:00
Viktor Lofgren	45987a1d98	Merge branch 'master' into warc	2023-12-11 14:32:35 +01:00
Viktor Lofgren	f655ec5a5c	(*) Refactor GeoIP-related code In this commit, GeoIP-related classes are refactored and relocated to a common library as they are shared across multiple services. The crawler is refactored to enable the GeoIpBlocklist to use the new GeoIpDictionary as the base of its decisions. The converter is modified to query this data to add a geoip:-keyword to documents to permit limiting a search to the country of the hosting server. The commit also adds due BY-SA attribution in the search engine footer for the source of the IP geolocation data.	2023-12-10 17:30:43 +01:00
Viktor Lofgren	91dd45cf64	(search) IP and IP geolocation in site info view This commit also fixes a bug in the loader where the IP field wouldn't always populate as intended, and refactors the DomainInformationService to use significantly fewer SQL queries.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	e3ebb0c5bb	(*) Rename the search filter 'RETRO' into 'POPULAR' This will make the terminology more consistent between the GUI and the code. The rankings yaml still uses 'retro' though, to retain compatibility.	2023-12-09 20:06:54 +01:00
Viktor Lofgren	8ef34883a8	(search) Move site information out of the search service and into assistant. This reduces the impact of restarting the search service, as the site information takes a few minutes to load during which it's not available. It also permits exposing this information via API in the future if there is interest in this. The assistant service was also modified to do a late load of the suggestions trie, as this is a major contributor to its start-up time. Finally, some changes were made to the client library, a new get() method was added that takes a TypeToken to allow deserialization of generics such as List<Foo>, and the scheduler was also modified to use virtual threads.	2023-12-09 16:30:06 +01:00
Viktor Lofgren	eccb12b366	(control) Fix spurious state detection in control-side actors A race condition was found where precession actors would sometimes skip a step, because when invoking ExecutorRemoteActor.getState(), it would get the last 'OK' actor state from a previous run of the actor! To avoid this, the trigger method was changed from returning a boolean to the message ID, negative if an error occurred, to be passed to getState to select only messages that pertain to the present or future runs.	2023-12-09 12:50:05 +01:00
Viktor Lofgren	cc813a5624	(convert) Add basic support for Warc file sideloading This update includes the integration of the jwarc library and implements support for Warc file sideloading, as a first trial integration with this library.	2023-12-06 18:43:55 +01:00
Viktor Lofgren	01621c6344	(renderer) Make helpers configurable on a by-service basis.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	c7934342a6	(control) Automatic recrawl	2023-12-02 17:06:24 +01:00
Viktor Lofgren	f5c324c06b	(minor) Fix broken test	2023-12-01 17:44:39 +01:00
Viktor Lofgren	67a1e1c874	(control) GUI for triggering control-side actors	2023-11-29 15:31:14 +01:00
Viktor Lofgren	4155fbe94c	(control) Reprocess-all actor	2023-11-28 17:58:48 +01:00
Viktor Lofgren	347fe6b7be	(control) Reindex-all actor	2023-11-28 16:41:09 +01:00
Viktor Lofgren	ff3ceb981e	(control) Button for removing a stale 'NEW' status If a process is violently terminated, the associated file storage may get stuck in the ephemeral 'NEW' state, preventing future operations on the associated data. To remedy this without having to dig through the database, a button was added to reset the state. It's a band-aid, but the situation is rare enough that I think it's fine.	2023-11-28 15:18:24 +01:00
Viktor Lofgren	1dafa0c74d	(mqapi/control) Repair repartition endpoint, deprecate notify endpoints. The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId. In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.	2023-11-27 16:01:12 +01:00
Viktor Lofgren	dd9406d0ac	(control) Make storage type tabs consistent This had fallen off in the Create New Specification view, it lacked Exports.	2023-11-17 11:26:45 +01:00
Viktor Lofgren	e9a01caa5c	(index) Fix broken metrics	2023-11-11 12:53:47 +01:00
Viktor Lofgren	858357a246	(metrics) Get prometheus up out of disrepair * Fix bad labels * Add nodeId where appropriate * Hopefully fix histogram buckets for index query times	2023-11-08 14:01:28 +01:00
Viktor Lofgren	0152004c42	Initial Commit Anchor Tags * Added new (optional) model file in $WMSA_HOME/data/atags.parquet * Converter gets a component for creating a projection of its domains onto the full atags parquet file * New WordFlag ExternalLink * These terms are also for now flagged as title words * Fixed a bug where Title words aliased with UrlDomain words * Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking	2023-11-04 14:24:17 +01:00
Viktor Lofgren	8e9698c9a0	(control/search) Add ability to suggest removing a site from random exploration This is what most complaints have been about.	2023-11-02 15:29:49 +01:00
Viktor Lofgren	3047e2dd7c	(screenshot-capture-tool) Make screenshot-capture-tool cooperate with docker	2023-11-01 16:38:55 +01:00
Viktor Lofgren	a8b9d21f2d	(executor) Refine atag export logic * Remove obviously uninteresting tags * Omit URL schema for more sensible sorting * Change the column order to put the source domain last	2023-11-01 13:23:14 +01:00
Viktor Lofgren	c77a5b7cb6	(control) GUI for atags export	2023-10-31 17:55:47 +01:00
Viktor Lofgren	23f2068e33	(executor) Actor for exporting anchor tag data from crawl data	2023-10-31 17:32:34 +01:00
Viktor Lofgren	ffadfb4149	(control) Use a partial template for the storage types tabs.	2023-10-31 17:12:14 +01:00
Viktor Lofgren	b7e38cfbae	(control) Add exports view	2023-10-31 17:08:48 +01:00
Viktor Lofgren	659743b39c	(executor) Export Data actor allocates its own storage	2023-10-31 17:04:07 +01:00
Viktor Lofgren	69758c5859	(control) Nicer redirects acknowledging actions	2023-10-31 16:26:29 +01:00
Viktor Lofgren	2871a326e6	(ctrl/exe) Clean up UX and code	2023-10-29 14:09:39 +01:00
Viktor Lofgren	abb42f0f36	(crawler) Fix bug in SQL statement Arguments were in the wrong order in inserting fetching sites submitted to be crawled	2023-10-29 13:19:17 +01:00
Viktor Lofgren	88f49834fd	(docs) Update documentation	2023-10-27 12:45:39 +02:00
Viktor Lofgren	c7cb6664b4	(control) Indicate missing services with danger-color instead of having a distracting and constantly updating last-seen number	2023-10-26 18:05:22 +02:00
Viktor Lofgren	79adba9284	(index) Fix bug in dealing with quoted search terms	2023-10-26 16:28:23 +02:00
Viktor Lofgren	f613f4f2df	(array) Fix spurious search results This was caused by a bug in the binary search algorithm causing it to sometimes return positive values when encoding a search miss. It was also necessary to get rid of the vestiges of the old LongArray and IntArray classes to make this fix doable.	2023-10-26 15:27:02 +02:00
Viktor Lofgren	abbadc92a0	(executor) Prevent TriggerAdjacencyCalculationActor from showing up in the actions tab when it isn't running	2023-10-25 21:25:07 +02:00
Viktor Lofgren	97fcbdd6d9	(control) Move storage actions into the actions tab * Also disable annoying CSS animations	2023-10-25 21:23:56 +02:00
Viktor Lofgren	d7686b665e	Refactoring * Encyclopedia sideloader; permit providing base URL. * Storage base shows node id in GUI * ProcessLivenessMonitorActor restarts automatically * Clean-up of outbox code	2023-10-25 18:51:02 +02:00
Viktor Lofgren	84cdac83d6	(control) Move message queue monitor to control	2023-10-24 16:44:28 +02:00
Viktor Lofgren	313cc2965c	(index-creation) Print whether full or prio is created Previous state of saying reverse index for both was pretty confusing.	2023-10-24 16:23:10 +02:00
Viktor Lofgren	95f74c5ea7	(control) Filter out heartbeats that are stopped	2023-10-24 16:09:28 +02:00
Viktor Lofgren	0406e76889	(api) Remove logging cruft	2023-10-24 13:39:05 +02:00
Viktor Lofgren	c2b28c0f8d	(api) Trial streaming API	2023-10-24 13:26:46 +02:00
Viktor Lofgren	a860f8f1a8	(index/qs) GRPC API for better query performance	2023-10-24 11:38:07 +02:00
Viktor Lofgren	2ed2f35a9b	(actor) Rewrite of the actor prototype class using record pattern matching	2023-10-23 10:18:20 +02:00
Viktor Lofgren	119151cad3	(converter) Separation of concerns	2023-10-22 14:35:33 +02:00
Viktor Lofgren	758f9b5aa5	(converter) Get UUID pips out of the models Rendering concerns shouldn't be in the models, it's poor separation of concerns and very difficult to follow.	2023-10-22 14:24:52 +02:00
Viktor Lofgren	eb4158df0b	(control) Fix start/stop FSM endpoints	2023-10-22 14:03:09 +02:00
Viktor Lofgren	12fda1a36b	(control) Temporarily re-writing the data balancer to get it to work in prod Need to clean this up later.	2023-10-22 14:03:09 +02:00
Viktor Lofgren	e927f99777	(control) JSON serializes Map<Integer> to Map<Double> and Java gets confused	2023-10-21 16:24:20 +02:00
Viktor Lofgren	044bcf55bd	(control) Fix SQL in rebalance actor	2023-10-21 16:13:37 +02:00
Viktor Lofgren	e475af9f49	(control) Initialize controlActorService	2023-10-21 16:06:53 +02:00
Viktor Lofgren	c6abcd91fa	(control) Better use of FS states, fix bug with start/stop actors	2023-10-20 16:37:49 +02:00
Viktor Lofgren	d76d926c38	(control/executor) Add new configuration options for node It's now possible to configure prod instance to not retain processed data.	2023-10-20 14:05:19 +02:00
Viktor Lofgren	2b3c167845	(controller) Additional configuration options for node	2023-10-20 13:13:36 +02:00
Viktor Lofgren	584bb3a648	(fs) interface cleanup	2023-10-20 12:24:18 +02:00
Viktor Lofgren	7b5ec6b98f	(executor-service) Embed dist/ in executor-service's docker image	2023-10-19 17:48:34 +02:00
Viktor Lofgren	23526f6d1a	(executor) Executor service now pulls DomainType list for CRAWL on "recrawl" This is an automatic integration with the submit-site repo on github and also crawl-queue.	2023-10-19 17:48:34 +02:00
Viktor Lofgren	809b3ee023	(control) Update GUI for crawl specs. They are now less important than they were before.	2023-10-19 17:48:34 +02:00
Viktor Lofgren	23f0c79fba	(control) GUI for data sets/domain types.	2023-10-19 17:48:34 +02:00
Viktor Lofgren	81dd3809e9	(*) WIP Add node affinity to EC_DOMAIN Very messy commit due to fractalline yak shaving	2023-10-19 17:48:34 +02:00
Viktor Lofgren	978550f809	(executor-service) Retire features-convert and move the corresponding packages into the executor service.	2023-10-16 15:43:46 +02:00
Viktor Lofgren	84fea0fd05	(node) Nodes auto-start their monitor actors.	2023-10-16 15:33:22 +02:00
Viktor Lofgren	2df3e0f881	(node) Nodes auto-configure on start-up instead of requiring manual configuration.	2023-10-16 14:46:35 +02:00
Viktor Lofgren	ede5d1f890	(actor) Give process spawners more easily recognizable names.	2023-10-16 14:19:00 +02:00
Viktor Lofgren	39911e3acd	(control) Fix incorrect storage base and clean up GUI for data	2023-10-16 13:30:26 +02:00
Viktor Lofgren	8dafd13cd7	(client) Fix executor tests	2023-10-16 12:02:57 +02:00
Viktor Lofgren	c245f7ce3a	(control) Bootstrapify review-domains and search-to-ban views.	2023-10-15 22:04:23 +02:00
Viktor Lofgren	607d647483	(control) Remove services listing view	2023-10-15 21:48:55 +02:00
Viktor Lofgren	9a38a455c9	(control/exec) File listings in control GUI	2023-10-15 19:15:44 +02:00
Viktor Lofgren	16e0738731	(*) Get multi-node routing working.	2023-10-15 18:38:30 +02:00
Viktor Lofgren	eacbf87979	(control) New list and form for index nodes.	2023-10-14 21:46:52 +02:00
Viktor Lofgren	108b4cb648	(service) Keep disabled multi-noded services dormant when they are configured to be disabled.	2023-10-14 20:58:55 +02:00
Viktor Lofgren	6308a8dfcd	(control) Node configuration	2023-10-14 16:47:52 +02:00
Viktor Lofgren	4baf9527d7	() WIP Control GUI redesign, executor-service, multi-node mq This turned out to be very difficult to do in small isolated steps. Design overhaul of the control gui using bootstrap * Move the actors out of control-service into a new executor-service that can be run on multiple nodes * Add node-affinity to message queue	2023-10-14 12:08:43 +02:00
Viktor Lofgren	199c459697	(*) Add node-affinity to services, processes and file storage.	2023-10-10 12:32:22 +02:00
Viktor Lofgren	61288c5e68	(service, client) First steps towards multiple nodedness	2023-10-09 22:13:27 +02:00
Viktor Lofgren	6319b8ef51	(api-service) Improved testability, always set content type to application/json	2023-10-09 15:39:34 +02:00
Viktor Lofgren	397a85eaa4	(query-service) Apply blacklisting to search results	2023-10-09 15:18:53 +02:00
Viktor Lofgren	3889c4bdd9	(refactor) Remove features-search and update documentation	2023-10-09 15:12:30 +02:00
Viktor Lofgren	c899f1cb85	(docs) Update documentation to reflect new query service	2023-10-09 14:56:59 +02:00
Viktor Lofgren	d8956c51d0	(refactor) Remove api:search-api Application services should not have an API, but purely act as clients to the core services (which should always have an API).	2023-10-09 14:42:33 +02:00
Viktor Lofgren	c0e61d4c87	(refactor) Move search service into services-satellite	2023-10-09 13:40:01 +02:00
Viktor Lofgren	97e17282ab	(query-service) Move query parsing from search-service to the new query service.	2023-10-09 13:27:44 +02:00
Viktor Lofgren	94c882af7d	(query-service) Provide delegate of IndexApi's query functionality. This is an intermediate step in the process of introducing the query-service as a proxy between search and index.	2023-10-08 22:22:26 +02:00
Viktor Lofgren	89c6d85f2f	(query-service) Create new empty 'query-service' service	2023-10-08 17:31:50 +02:00
Viktor Lofgren	cf366c602f	(search) Refactor SearchQueryIndexService in preparation for feature extraction. Prefer working on DecoratedSearchResultItem in favor of UrlDetails.	2023-10-08 17:15:41 +02:00
Viktor Lofgren	77ccab7d80	(index) Move linkdb to index from search. This makes index complete in the sense that you can deploy an index instance and build a complete separate application on top of it, without having to go through the Marginalia-laden search service.	2023-10-08 16:48:35 +02:00
Viktor Lofgren	f51ba63742	(search) Remove dead file	2023-10-07 21:05:06 +02:00
Viktor Lofgren	9044518be5	(search) Fix broken link to git repo	2023-10-07 19:43:22 +02:00
Viktor Lofgren	9e0367eef4	(search) Filter blacklisted items in API query service as well	2023-10-07 16:16:04 +02:00
Viktor Lofgren	235bb6c1b9	(control) Administrative QOL improvement, GUI for banning spam	2023-10-07 15:45:50 +02:00
Viktor Lofgren	49344d7ea8	(control) Administrative QOL improvement, GUI for banning spam	2023-10-07 15:43:18 +02:00
Viktor Lofgren	1b418d77ff	(search) We got some new IP ranges to work with for the crawler	2023-10-07 13:41:55 +02:00
Viktor Lofgren	80cc302627	(search) We can't claim to be on PC hardware anymore...	2023-10-07 11:49:29 +02:00
Viktor	8e1abc3f10	(index-reverse) Parallel construction of the reverse indexes. (#52) * (index-reverse) Parallel construction of the reverse indexes. * (array) Remove wasteful calculation of numDistinct before merging two sorted arrays. * (index-reverse) Force changes to disk on close, reduce logging. * (index-reverse) Clean up merging process and add back logging * (run) Add a conservative default for INDEX_CONSTRUCTION_PROCESS_OPTS's parallelism as it eats a lot of RAM * (index-reverse) Better logging during processing * (array) 2GB+ compatible write() function * (array) 2GB+ compatible write() function * (index-reverse) We are logging like Bolsonaro and I will not have it. * (reverse-index) Self-diagnostics * (btree) Fix bug in btree reader to do with large data sizes	2023-10-07 10:00:00 +02:00
Viktor Lofgren	c51159672e	(build) Move unit test configuration to root build.gradle	2023-10-04 12:46:22 +02:00
Viktor Lofgren	405300b4b2	(control) Fix bug where finishing one process ad hoc task would remove all other tasks from the db	2023-10-04 11:44:31 +02:00
Viktor Lofgren	40768e935b	(test) Removing /tmp-guardrails as it doesn't hold in CI	2023-10-02 16:52:59 +02:00
Viktor Lofgren	d160954080	(index) Two useful debug endpoints	2023-09-24 19:39:48 +02:00
Viktor Lofgren	14372e0ef0	(index) Slightly reduce alloc churn	2023-09-24 19:36:14 +02:00
Viktor Lofgren	03bffa27ac	(search) Add combined id to the search result HTML	2023-09-24 19:34:35 +02:00
Viktor Lofgren	028b5a4f0d	(minor performance) Reduce GC churn in index	2023-09-24 12:12:08 +02:00
Viktor Lofgren	1bd146fb8e	(minor) Remove dead code	2023-09-24 10:55:20 +02:00
Viktor Lofgren	5f6c3da7a4	(index) Add close methods on the index readers so they clean up their mmaps	2023-09-24 10:54:23 +02:00
Viktor Lofgren	dbe9235f3a	(*) Upgrade to JDK21 with preview enabled. ... also move some common configuration into the root build.gradle-file. Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work. This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.	2023-09-24 10:38:59 +02:00
Viktor Lofgren	d78569986b	(backups) Fix bug where backup service would zero the linkdb when restoring.	2023-09-22 18:34:34 +02:00
Viktor Lofgren	95323e6caa	(backups) Support restore multi-source load data	2023-09-22 18:34:17 +02:00
Viktor Lofgren	f809d22fc6	(loader) Support simultaneous loading of multiple processed data sets	2023-09-22 13:14:58 +02:00
Viktor Lofgren	70aa04c047	(converter, stackexchange-xml) Add the ability to sideload stackexchange data	2023-09-21 12:48:33 +02:00
Viktor Lofgren	f8050816ac	(search) Don't run LSH deduplication on details with zero lsh to support not calculating this hash.	2023-09-21 12:47:02 +02:00
Viktor Lofgren	9b385ec7cc	(converter) Make it possible to sideload documents from a directory tree	2023-09-17 14:35:06 +02:00
Viktor Lofgren	5c040f7a46	(crawl-spec) Parquetify crawl spec * Crawl-specs are now parquet files * Deprecate the crawl-job-extractor tool	2023-09-17 09:41:34 +02:00
Viktor Lofgren	5e5aaf9a7e	(converter, control) Re-enable sideloading encyclopedia data	2023-09-14 12:12:07 +02:00
Viktor Lofgren	07d7507ac6	(control-service) Move Actions up in storage-details Papercut fix. If a file storage area has a lot of files, you have to scroll down a long way to get to the actions otherwise.	2023-09-02 15:41:55 +02:00
Viktor Lofgren	9e185e80ce	(control-service) Add timestamp to file storages.	2023-09-02 14:01:04 +02:00
Viktor Lofgren	d31d8ec5b0	(index) Log keyword ids in hex format	2023-09-01 15:40:24 +02:00
Viktor Lofgren	2b00cd632d	(process) Propagate environment JVM params to the index constructor	2023-09-01 15:39:42 +02:00
Viktor Lofgren	764e7d1315	(index) Add more comprehensive integration tests for the index service.	2023-08-30 10:37:24 +02:00
Viktor Lofgren	e4d7958379	(control) ProcessLivenessMonitorActor shouldn't reap tasks based on service instance liveness	2023-08-29 18:19:04 +02:00
Viktor Lofgren	3f288e264b	(minor) Clean up dead endpoints	2023-08-29 17:04:54 +02:00
Viktor Lofgren	dd593c292c	(loader) Minor optimizations and bugfixes. * Reduce memory churn in LoaderIndexJournalWriter, fix bug with keyword mappings as well * Remove remains of OldDomains * Ensure LOADER_PROCESS_OPTS gets fed to the processes * LinkdbStatusWriter won't execute batch after each added item post 100 items	2023-08-29 15:37:52 +02:00
Viktor Lofgren	39c1857c61	(heartbeat, reverse-index) Better heartbeat mocking, improved heartbeats for reverse index construction.	2023-08-29 13:07:55 +02:00
Viktor Lofgren	c57a2d0dc3	(control-service) Remove old index journal files when restoring a backup.	2023-08-29 11:58:01 +02:00
Viktor Lofgren	6525b16e1f	(minor) Improved logging and error messages	2023-08-28 19:53:55 +02:00
Viktor Lofgren	b6a92506d1	(index) Hook in missing DocIdRewriter This enables documents to be ranked properly.	2023-08-28 19:53:43 +02:00
Viktor Lofgren	3101b74580	(index) Move to a lexicon-free index design This is a system-wide change. The index used to have a lexicon, mapping words to wordIds using a large in-memory hash table. This made index-construction easier, but it also added a fairly significant RAM penalty to both the index service and the loader. The new design moves to 64 bit word identifiers calculated using the murmur hash of the keyword, and an index construction based on merging smaller indices. It also became necessary half-way through to upgrade guice as its error reporting wasn't quite compatible with JDK20.	2023-08-28 14:02:23 +02:00
Viktor Lofgren	194a6057dd	(index,control) Recoverable index backups	2023-08-25 14:57:43 +02:00
Viktor Lofgren	e710e057e2	(db) Remove EC_URL and EC_PAGE_DATA from mariadb database	2023-08-25 13:45:03 +02:00
Viktor Lofgren	28188a6e59	(control) Simplify ConvertAndLoadActor	2023-08-25 13:30:20 +02:00
Viktor Lofgren	70a5df96c8	(control) Display progress of process tasks	2023-08-25 13:05:21 +02:00
Viktor Lofgren	460998d512	(index) Move index construction to separate process. This provides a much cleaner separation of concerns, and makes it possible to get rid of a lot of the gunkier parts of the index service. It will also permit lowering the Xmx on the index service a fair bit, so we can get CompressedOOps again :D	2023-08-25 12:52:54 +02:00
Viktor Lofgren	e741301417	(search) Remove endpoint flush-search-caches It's not necessary anymore with the new linkdb.	2023-08-25 09:51:06 +02:00
Viktor Lofgren	5ed5298409	(converter) Update confusing state description SWAP_LEXICON doesn't instruct the index service to do anything. It just moves the file.	2023-08-24 18:56:49 +02:00
Viktor Lofgren	b911665691	(index) Clean up and optimize valuator	2023-08-24 18:34:06 +02:00
Viktor Lofgren	56eb83319d	(index) Clean up result domain deduplicator	2023-08-24 18:24:55 +02:00
Viktor Lofgren	1e6800565a	(system) Remove EdgeId<T> and similar objects They seemed like a good idea at the time, but in practice they're wasting resources and not really providing the clarity I had hoped.	2023-08-24 17:46:02 +02:00
Viktor Lofgren	c909120ae1	(search) Basic working integration of linkdb in search service	2023-08-24 17:24:56 +02:00
Viktor Lofgren	9894f37412	(index) Implement new URL ID coding scheme. Also refactor along the way. Really needs an additional pass, these tests are very hairy.	2023-08-24 16:44:27 +02:00
Viktor Lofgren	ebc84c22fb	Upgrade antique lombok plugin This permits tests to run on JDK20 environments.	2023-08-23 14:34:32 +00:00
Viktor Lofgren	aa0d256d6a	Upgrade code to Java 20. * Change language version * Upgrade Lombok to a JDK20 compatible version	2023-08-23 13:37:49 +00:00
Viktor Lofgren	4d75fa2908	Upgrade gradle and docker plugin to support native JDK20 environments	2023-08-23 13:30:55 +00:00
Viktor Lofgren	6f222b9800	(search) Add refresh link to explore mode. This is a QOL improvement for mobile users, who otherwise would have to scroll all the way up to refresh. Also removed the confusing "this is a random set of domains"-message when viewing adjacent websites, as it's not random.	2023-08-22 12:43:44 +02:00
Viktor Lofgren	c7f0276005	(control) Don't spin on process output printing This is the "correct" way of copying stdout and stderr to the current process' output.	2023-08-22 11:48:54 +02:00

352 Commits