CatgirlIntelligenceAgency

Author	SHA1	Message	Date
Viktor Lofgren	8f0950fc44	(geoip) Fix incorrect synchronization.	2023-12-11 14:01:39 +01:00
Viktor Lofgren	30bc3f9281	(converter) Use the prefix ip: instead of geoip: for country codes This is the same as the prefix for the IP address, but I don't think that substantially matters, as the two have such different namespaces there can be no confusion.	2023-12-11 13:59:23 +01:00
Viktor Lofgren	f655ec5a5c	(*) Refactor GeoIP-related code In this commit, GeoIP-related classes are refactored and relocated to a common library as they are shared across multiple services. The crawler is refactored to enable the GeoIpBlocklist to use the new GeoIpDictionary as the base of its decisions. The converter is modified to query this data to add a geoip:-keyword to documents to permit limiting a search to the country of the hosting server. The commit also adds due BY-SA attribution in the search engine footer for the source of the IP geolocation data.	2023-12-10 17:30:43 +01:00
Viktor Lofgren	84b4158555	(minor) Fix broken test	2023-12-10 14:39:20 +01:00
Viktor Lofgren	91dd45cf64	(search) IP and IP geolocation in site info view This commit also fixes a bug in the loader where the IP field wouldn't always populate as intended, and refactors the DomainInformationService to use significantly fewer SQL queries.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	37af60254f	(search) Better recipe filter Tune the recipe filter to give better results, by using the 'popular' domains set along with excluding results with heavy tracking.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	f0e736d4ea	(search) Update the search profile 'Academia' to strictly filter on academic tlds The previous version used a personalized pagerank centering on a few academic domains, but this didn't work very well and most results were not very academia-centric.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	e3ebb0c5bb	(*) Rename the search filter 'RETRO' into 'POPULAR' This will make the terminology more consistent between the GUI and the code. The rankings yaml still uses 'retro' though, to retain compatibility.	2023-12-09 20:06:54 +01:00
Viktor Lofgren	6382f779c3	(search) Revert back to using 'Popular' as the default search filter Unfiltered is a bit too ... unfiltered, and gives a bad first impression for many queries.	2023-12-09 16:34:12 +01:00
Viktor Lofgren	8ef34883a8	(search) Move site information out of the search service and into assistant. This reduces the impact of restarting the search service, as the site information takes a few minutes to load during which it's not available. It also permits exposing this information via API in the future if there is interest in this. The assistant service was also modified to do a late load of the suggestions trie, as this is a major contributor to its start-up time. Finally, some changes were made to the client library, a new get() method was added that takes a TypeToken to allow deserialization of generics such as List<Foo>, and the scheduler was also modified to use virtual threads.	2023-12-09 16:30:06 +01:00
Viktor Lofgren	5c46af0edb	(converter) Refactor EncyclopediaMarginaliaNuSideloader to use ProcessingIterator Refactored the getDocumentsStream method in EncyclopediaMarginaliaNuSideloader to use the newly extracted ProcessingIterator class that encapsulates processing a stream of results from e.g. a database query in parallel and returning the computed results as an iterator. The iterator was also made more reliable; previous versions of the logic would sometimes deadlock due to false positives in hasMore().	2023-12-09 15:20:53 +01:00
Viktor Lofgren	b6511fbfe2	(converter) Add AnchorTextKeywords to EncyclopediaMarginaliaNuSideloader processing The commit updates EncyclopediaMarginaliaNuSideloader to include the AnchorTextKeywords in processing documents, aiding search result relevance. It also removes old test-related functionality and a large but fairly useless test previously used to debug a specific problem, to the detriment of the overall code quality.	2023-12-09 15:20:52 +01:00
Viktor Lofgren	eccb12b366	(control) Fix spurious state detection in control-side actors A race condition was found where precession actors would sometimes skip a step, because when invoking ExecutorRemoteActor.getState(), it would get the last 'OK' actor state from a previous run of the actor! To avoid this, the trigger method was changed from returning a boolean to the message ID, negative if an error occurred, to be passed to getState to select only messages that pertain to the present or future runs.	2023-12-09 12:50:05 +01:00
Viktor Lofgren	d0982e7ba5	(converter) Add error handling and lazy load external domain links The converter was not properly initiating the external links for each domain, causing an NPE in conversion. This needs to be loaded later since we don't know the domain we're processing until we've seen it in the crawl data. Also made some refactorings to make finding converter bugs easier, and finding the related domain less awkward from the SerializableCrawlData interface.	2023-12-09 12:33:39 +01:00
Viktor Lofgren	fc30da0d48	(converter) Add academia recognition to DomainProcessor The code now includes an additional function in the DomainProcessor class that checks if a domain is associated with academia. An academic domain is identified by the ".edu" TLD, or fits a specific regex pattern matching domains like .ac.ccTld or .edu.ccTld. If these conditions are met, the search term "special:academia" is added to the domain. The existing academia search filter uses personalized pagerank to select academia-adjacent domains, but it isn't working very well. The hope is that filtering on domain names will be more effective, and that it can supplant the ranking-based approach.	2023-12-08 20:31:34 +01:00
Viktor Lofgren	e6a1052ba7	Simplify CrawlerMain, removing the CrawlerLimiter and using a global HttpFetcher with a virtual thread pool dispatcher instead of the default.	2023-12-08 20:24:01 +01:00
Viktor Lofgren	968dce50fc	(crawler) Refactored IpInterceptingNetworkInterceptor for clarity.	2023-12-08 17:45:46 +01:00
Viktor Lofgren	3bbffd3c22	(crawler) Refactor HttpFetcher to integrate WarcRecorder Partially hook in the WarcRecorder into the crawler process. So far it's not read, but should record the crawled documents. The WarcRecorder and HttpFetcher classes were also refactored and broken apart to be easier to reason about.	2023-12-08 17:12:51 +01:00
Viktor Lofgren	072b5fcd12	Implement Warc-recording wrapper for OkHttp3 client This is a first step of using WARC as an intermediate flight recorder style step in the crawler, ultimately aimed at being able to resume crawls if the crawler is restarted. This component is currently not hooked into anything. The OkHttp3 client wrapper class 'WarcRecordingFetcherClient' was implemented for web archiving. This allows for the recording of HTTP requests and responses. New classes were introduced, 'WarcDigestBuilder', 'IpInterceptingNetworkInterceptor', and 'WarcProtocolReconstructor'. The JWarc dependency was added to the build.gradle file, and relevant unit tests were also introduced. Some HttpFetcher-adjacent structural changes were also done for better organization.	2023-12-08 13:49:16 +01:00
Viktor Lofgren	fabffa80f0	(warc) Integrate the crawler's content type parsing and charset logic into the WarcSideloader	2023-12-07 15:26:01 +01:00
Viktor Lofgren	064265b0b9	(crawler) Move content type/charset sniffing to a separate microlibrary This functionality needs to be accessed by the WarcSideloader, which is in the converter. The resultant microlibrary is tiny, but I think in this case it's justifiable.	2023-12-07 15:16:37 +01:00
Viktor Lofgren	2d5d11645d	(warc) Refactor WarcSideloaderTest to not rely on specific test files on the computer	2023-12-06 19:00:29 +01:00
Viktor Lofgren	cc813a5624	(convert) Add basic support for Warc file sideloading This update includes the integration of the jwarc library and implements support for Warc file sideloading, as a first trial integration with this library.	2023-12-06 18:43:55 +01:00
Viktor Lofgren	156c067f79	(search) Fix mobile issues with browse feature	2023-12-05 21:28:50 +01:00
Viktor Lofgren	b33b013d41	(search) Fix broken script tag Apparently it can't be called suggestions.js...?	2023-12-05 20:29:13 +01:00
Viktor Lofgren	e74e2f705f	(search) Fix broken script tag suggestions.js became something else.	2023-12-05 20:20:07 +01:00
Viktor Lofgren	2e438847fc	(search) Optimize related domains queries In the future this logic probably needs to move into a separate service, as it's still quite slow to load. But this fixes response times and DOS potential of previous version.	2023-12-05 20:12:03 +01:00
Viktor Lofgren	9301c47d93	(search) Optimize related domains queries	2023-12-05 14:42:03 +01:00
Viktor Lofgren	20ec58b07f	(search) Remove layout-breakingly long URLs from the similar domains view. They're almost all .onion URLs anyway, not really the space we're looking to peer into.	2023-12-05 13:58:15 +01:00
Viktor Lofgren	98983c1015	(search) Hopefully fix race condition that leaves the response with no Content-type header	2023-12-05 13:52:36 +01:00
Viktor Lofgren	67195592c6	(search) Hopefully fix race condition that leaves the response with no Content-type header	2023-12-05 13:48:42 +01:00
Viktor Lofgren	d1e88df71e	(search) Cleaning up the code a bit	2023-12-05 13:26:05 +01:00
Viktor Lofgren	f36cfe34ab	(search) Hackery to get a more balanced view	2023-12-04 22:50:39 +01:00
Viktor Lofgren	8a1934008c	(search) Merge similar sites results with the info view. WIP: This commit needs to be cleaned up.	2023-12-04 22:10:24 +01:00
Viktor Lofgren	b41bb9cfcf	(search) Use a Ξ for mobile button title instead of "Filters". Makes it easier to distinguish from the search button.	2023-12-03 16:33:25 +01:00
Viktor Lofgren	d58324bbef	(search) Clean up filters menu a bit, improve accessibility.	2023-12-02 18:05:30 +01:00
Viktor Lofgren	cbbd45d3e5	(search) Clean up filters menu a bit, improve accessibility.	2023-12-02 18:01:03 +01:00
Viktor Lofgren	b89633ae4b	(search) Don't render a filter button on mobile when there are no filters to be presented.	2023-12-02 17:23:45 +01:00
Viktor Lofgren	96357e9bfd	(search) Fix typeahead suggestions, as well as improve mobile and desktop UX in small ways.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	d530c3096f	(search) GUI tweaks to make the new interface not fall apart on mobile/chrome	2023-12-02 17:06:40 +01:00
Viktor Lofgren	ae0c1c3f2d	(control) Adjust search result margins for better visual density	2023-12-02 17:06:40 +01:00
Viktor Lofgren	0cc2564380	(search) CSS tweaks	2023-12-02 17:06:40 +01:00
Viktor Lofgren	38d20022ad	(search) Fix script loading for mobile support	2023-12-02 17:06:40 +01:00
Viktor Lofgren	280132dad0	(search) Fix script loading for mobile support	2023-12-02 17:06:40 +01:00
Viktor Lofgren	61de4e2789	(search) Retain filter options when performing a new search from the input field	2023-12-02 17:06:40 +01:00
Viktor Lofgren	f9d3455320	(search) Reduce visual weight of search results	2023-12-02 17:06:40 +01:00
Viktor Lofgren	2ff64c3c12	(search) New toggle for reducing tracking	2023-12-02 17:06:40 +01:00
Viktor Lofgren	902f235b5b	(search) Integrate 'similar' tab in site info.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	97d43a6fa2	(search) Revamp browse results with new look.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	9bc65ff0ca	(search) Desaturate search result titles according to rank	2023-12-02 17:06:40 +01:00
Viktor Lofgren	6cd6a615fd	(search) Add data-filter to body as a data attribute For future shenanigans ;D	2023-12-02 17:06:40 +01:00
Viktor Lofgren	5639f0653d	(search) Rename SearchProfile.name into filterId Avoid foot-gun caused by name clash with the Enumeration method name(), which returns the Java name of the enumeration value.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	251174c9a2	(search) Update front page with new look	2023-12-02 17:06:40 +01:00
Viktor Lofgren	42ea87d637	(search) Update conversion results, error page, and dictionary results with new CSS.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	7c8a60b8cf	(search) Site info view is mostly done Also optimize the rendering a bit to avoid having to allocate huge string buffers, writing directly to Spark's response instead.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	2f4500be5a	(search) New frontend look	2023-12-02 17:06:40 +01:00
Viktor Lofgren	fa7534a362	(search) Remove dead code	2023-12-02 17:06:40 +01:00
Viktor Lofgren	a258f0af7a	(search) Refactor search parameters to include query	2023-12-02 17:06:40 +01:00
Viktor Lofgren	01621c6344	(renderer) Make helpers configurable on a by-service basis.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	c7934342a6	(control) Automatic recrawl	2023-12-02 17:06:24 +01:00
Viktor Lofgren	f5c324c06b	(minor) Fix broken test	2023-12-01 17:44:39 +01:00
Viktor Lofgren	f615cf2391	(convert) Loosen up the rules enforcement for documents that have external links.	2023-12-01 17:44:29 +01:00
Viktor Lofgren	e5d274fe1c	(docs) Improve architectural documentation	2023-11-30 21:38:57 +01:00
Viktor Lofgren	166a391eae	(docs) Improve architectural documentation for the crawler.	2023-11-30 21:30:57 +01:00
Viktor Lofgren	5fb24bb27f	(docs) Improve architectural documentation for the converter.	2023-11-30 20:43:22 +01:00
Viktor Lofgren	5a5430b383	(convert) Wiki specialization that should do a better job at removing junk keywords and providing a useful summary.	2023-11-30 20:04:46 +01:00
Viktor Lofgren	67a1e1c874	(control) GUI for triggering control-side actors	2023-11-29 15:31:14 +01:00
Viktor Lofgren	4155fbe94c	(control) Reprocess-all actor	2023-11-28 17:58:48 +01:00
Viktor Lofgren	347fe6b7be	(control) Reindex-all actor	2023-11-28 16:41:09 +01:00
Viktor Lofgren	ff3ceb981e	(control) Button for removing a stale 'NEW' status If a process is violently terminated, the associated file storage may get stuck in the ephemeral 'NEW' state, preventing future operations on the associated data. To remedy this without having to dig through the database, a button was added to reset the state. It's a band-aid, but the situation is rare enough that I think it's fine.	2023-11-28 15:18:24 +01:00
Viktor Lofgren	1dafa0c74d	(mqapi/control) Repair repartition endpoint, deprecate notify endpoints. The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId. In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.	2023-11-27 16:01:12 +01:00
Viktor Lofgren	09917837d0	(process) Ensure construction exceptions are logged Wrapping these exceptions in a try-catch and logging them with slf4j will ensure they end up in the process logs. The way it worked using the default exception handler, they'd print on console (which nothing captures!), leading to a very annoying debugging experience.	2023-11-22 18:32:06 +01:00
Viktor Lofgren	dd507a3808	(db) Fix migrations, bump flyway to 10.0.1 Tricky problem, creating a procedure apparently needs delimiter shenanigans in Flyway, otherwise it will truncate the END statement and mariadb will be sad.	2023-11-21 20:04:35 +01:00
Viktor Lofgren	dd9406d0ac	(control) Make storage type tabs consistent This had fallen off in the Create New Specification view, it lacked Exports.	2023-11-17 11:26:45 +01:00
Viktor Lofgren	f58a9f46be	(loader) Don't truncate the entire links table on load This behavior is an old vestige from the days of only having a single loader process. We'd truncate the links table because doing inserts/updates was too slow. This was also important because we had 32 bit IDs, and there's a lot of links between domains to go around... Instead we delete the rows associated with the current node with a stored procedure PURGE_LINKS_TABLE. We also update the PRIMARY KEY to a BIGINT. We'll need to load the data in excess of a billion times to hit an ID rollover, so it'll be fine.	2023-11-16 10:30:12 +01:00
Viktor Lofgren	1cbf23e7e7	(test) Don't fail test if atags.parquet is not in ~vlofgren	2023-11-15 09:11:38 +01:00
Viktor Lofgren	63554ba171	(explore2) Add robots.txt	2023-11-14 09:15:32 +01:00
Viktor Lofgren	5de37cb820	(converter) Set feature flags appropriately on stackexchange posts	2023-11-12 15:48:08 +01:00
Viktor Lofgren	e5cee1f46d	(sideload) Fix sideloading so that it doesn't get disproportionately good rankings Also add type flags so that e.g. wikipedia shows up in the wikis filter.	2023-11-12 14:57:57 +01:00
Viktor Lofgren	e9a01caa5c	(index) Fix broken metrics	2023-11-11 12:53:47 +01:00
Viktor Lofgren	858357a246	(metrics) Get prometheus up out of disrepair * Fix bad labels * Add nodeId where appropriate * Hopefully fix histogram buckets for index query times	2023-11-08 14:01:28 +01:00
Viktor Lofgren	7aa2f80117	(domain) id.au should be treated as a TLD	2023-11-06 19:07:47 +01:00
Viktor Lofgren	7617b4cbc2	(crawler) Fix NPE in crawler caused by not having fetched the domains list yet	2023-11-06 18:16:38 +01:00
Viktor Lofgren	e0c769fd19	(converter) Integrate atags.parquet with the encyclopedia sideloader Also clean up stackexchange and dirtree a bit.	2023-11-06 18:03:01 +01:00
Viktor Lofgren	ebd10a5f28	(crawler) Integrate atags.parquet with the crawler so that "important" URLs are prioritized	2023-11-06 16:14:58 +01:00
Viktor Lofgren	2b77184281	(converter) Integrate atags with the topology field	2023-11-06 13:46:44 +01:00
Viktor Lofgren	e23976f6c4	(search) Fix card title overflow	2023-11-06 13:25:39 +01:00
Viktor Lofgren	0b8dc02eba	(result-ranking) Nudge up results with ngram matches a tiny bit	2023-11-06 13:14:22 +01:00
Viktor Lofgren	fde1d0677e	(search) Remove unnecessary dependencies	2023-11-06 12:56:32 +01:00
Viktor Lofgren	48986574ae	(result-ranking) Use a weighted calculation of priority term importance	2023-11-06 12:56:21 +01:00
Viktor Lofgren	c7a6a71d07	(result-ranking) Use a weighted calculation of priority term importance	2023-11-06 12:48:23 +01:00
Viktor Lofgren	1847845151	Revert "(loader) Optimize INSERT statements" This reverts commit `7cb92195d1`.	2023-11-04 19:32:02 +01:00
Viktor Lofgren	7cb92195d1	(loader) Optimize INSERT statements INSERT IGNORE is too slow.	2023-11-04 17:43:55 +01:00
Viktor Lofgren	72afa0341f	duckdb connection may need to be synchronized?	2023-11-04 14:30:25 +01:00
Viktor Lofgren	0152004c42	Initial Commit Anchor Tags * Added new (optional) model file in $WMSA_HOME/data/atags.parquet * Converter gets a component for creating a projection of its domains onto the full atags parquet file * New WordFlag ExternalLink * These terms are also for now flagged as title words * Fixed a bug where Title words aliased with UrlDomain words * Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking	2023-11-04 14:24:17 +01:00
Viktor Lofgren	8e9698c9a0	(control/search) Add ability to suggest removing a site from random exploration This is what most complaints have been about.	2023-11-02 15:29:49 +01:00
Viktor Lofgren	3047e2dd7c	(screenshot-capture-tool) Make screenshot-capture-tool cooperate with docker	2023-11-01 16:38:55 +01:00
Viktor Lofgren	a8b9d21f2d	(executor) Refine atag export logic * Remove obviously uninteresting tags * Omit URL schema for more sensible sorting * Change the column order to put the source domain last	2023-11-01 13:23:14 +01:00
Viktor Lofgren	c77a5b7cb6	(control) GUI for atags export	2023-10-31 17:55:47 +01:00
Viktor Lofgren	23f2068e33	(executor) Actor for exporting anchor tag data from crawl data	2023-10-31 17:32:34 +01:00
Viktor Lofgren	ffadfb4149	(control) Use a partial template for the storage types tabs.	2023-10-31 17:12:14 +01:00
Viktor Lofgren	b7e38cfbae	(control) Add exports view	2023-10-31 17:08:48 +01:00
Viktor Lofgren	659743b39c	(executor) Export Data actor allocates its own storage	2023-10-31 17:04:07 +01:00
Viktor Lofgren	69758c5859	(control) Nicer redirects acknowledging actions	2023-10-31 16:26:29 +01:00
Viktor Lofgren	81bfd7e5fb	(experiment) Utility for exporting atags	2023-10-31 16:10:21 +01:00
Viktor Lofgren	8f74dbdbb4	(crawler) Set more lenient parameters for recrawl	2023-10-30 11:35:30 +01:00
Viktor Lofgren	fd5a7eac87	(crawler) Exit crawler retriever on thread interrupted	2023-10-30 11:34:16 +01:00
Viktor Lofgren	6bac3c75cb	(api) API documentation	2023-10-29 16:13:21 +01:00
Viktor Lofgren	5d6e0e3790	(log) Clean up logging Don't log the PROCESS stream to executor's logs, as it will also be logged in the spawned process' log files. Also tell the spawned process which "service" it is so that it gets a log file with a name that makes sense.	2023-10-29 15:52:17 +01:00
Viktor Lofgren	2871a326e6	(ctrl/exe) Clean up UX and code	2023-10-29 14:09:39 +01:00
Viktor Lofgren	abb42f0f36	(crawler) Fix bug in SQL statement Arguments were in the wrong order in inserting fetching sites submitted to be crawled	2023-10-29 13:19:17 +01:00
Viktor Lofgren	f6fcb04817	(experiment) Repair the experiment runner	2023-10-27 16:16:50 +02:00
Viktor Lofgren	88f49834fd	(docs) Update documentation	2023-10-27 12:45:39 +02:00
Viktor Lofgren	4415f52e18	(keyword-extraction) Fix broken test	2023-10-27 12:19:33 +02:00
Viktor Lofgren	98d742d634	(actor) Code cleanup	2023-10-27 12:19:20 +02:00
Viktor Lofgren	6c1ca10be7	(minor) code cleanup	2023-10-27 11:38:37 +02:00
Viktor Lofgren	aeaf2d546a	(search) Fix broken redirect for flagging problems with websites	2023-10-27 11:20:49 +02:00
Viktor Lofgren	c7cb6664b4	(control) Indicate missing services with danger-color instead of having a distracting and constantly updating last-seen number	2023-10-26 18:05:22 +02:00
Viktor Lofgren	79adba9284	(index) Fix bug in dealing with quoted search terms	2023-10-26 16:28:23 +02:00
Viktor Lofgren	37b7f52f2c	(minor) Reduce log severity for getTermMeta miss	2023-10-26 15:41:52 +02:00
Viktor Lofgren	c89e0ab255	(minor) Disable ~vlofgren specific debug test	2023-10-26 15:27:59 +02:00
Viktor Lofgren	f613f4f2df	(array) Fix spurious search results This was caused by a bug in the binary search algorithm causing it to sometimes return positive values when encoding a search miss. It was also necessary to get rid of the vestiges of the old LongArray and IntArray classes to make this fix doable.	2023-10-26 15:27:02 +02:00
Viktor Lofgren	a497e4c920	(crawler) Terminate crawler after a few hours of no progress	2023-10-26 12:49:28 +02:00
Viktor Lofgren	0f637fb722	(logging) Better logging configurations	2023-10-26 12:48:10 +02:00
Viktor Lofgren	abbadc92a0	(executor) Prevent TriggerAdjacencyCalculationActor from showing up in the actions tab when it isn't running	2023-10-25 21:25:07 +02:00
Viktor Lofgren	97fcbdd6d9	(control) Move storage actions into the actions tab * Also disable annoying CSS animations	2023-10-25 21:23:56 +02:00
Viktor Lofgren	d7686b665e	Refactoring * Encyclopedia sideloader; permit providing base URL. * Storage base shows node id in GUI * ProcessLivenessMonitorActor restarts automatically * Clean-up of outbox code	2023-10-25 18:51:02 +02:00
Viktor Lofgren	5de41a3a7f	(search-service) Show node affinity in site info tab	2023-10-25 12:44:48 +02:00
Viktor Lofgren	84cdac83d6	(control) Move message queue monitor to control	2023-10-24 16:44:28 +02:00
Viktor Lofgren	436a55ee1e	(control) Render UUID tooltip with dashes.	2023-10-24 16:37:40 +02:00
Viktor Lofgren	313cc2965c	(index-creation) Print whether full or prio is created Previous state of saying reverse index for both was pretty confusing.	2023-10-24 16:23:10 +02:00
Viktor Lofgren	95f74c5ea7	(control) Filter out heartbeats that are stopped	2023-10-24 16:09:28 +02:00
Viktor Lofgren	8d1c3c754d	Testing development flow with adding a ~tilde search filter	2023-10-24 15:35:15 +02:00
Viktor Lofgren	72152f9d80	Fix bug in handling js parameters	2023-10-24 15:10:02 +02:00
Viktor Lofgren	ebd365a128	Fix exception	2023-10-24 15:04:12 +02:00
Viktor Lofgren	0406e76889	(api) Remove logging cruft	2023-10-24 13:39:05 +02:00
Viktor Lofgren	c2b28c0f8d	(api) Trial streaming API	2023-10-24 13:26:46 +02:00
Viktor Lofgren	9aa5038756	(search) Remove unnecessary filtering operation	2023-10-24 11:43:47 +02:00
Viktor Lofgren	a860f8f1a8	(index/qs) GRPC API for better query performance	2023-10-24 11:38:07 +02:00
Viktor Lofgren	487c016a32	(qs) Speed	2023-10-23 14:03:09 +02:00
Viktor Lofgren	e4bddb4993	(control) Better UUID accessibility	2023-10-23 12:53:53 +02:00
Viktor Lofgren	731afcb864	(qs) Parallel execution	2023-10-23 12:06:03 +02:00
Viktor Lofgren	efb73ff4e7	(qs) Don't blow up if an index node isn't responsive	2023-10-23 11:53:18 +02:00
Viktor Lofgren	2ed2f35a9b	(actor) Rewrite of the actor prototype class using record pattern matching	2023-10-23 10:18:20 +02:00
Viktor Lofgren	119151cad3	(converter) Separation of concerns	2023-10-22 14:35:33 +02:00
Viktor Lofgren	758f9b5aa5	(converter) Get UUID pips out of the models Rendering concerns shouldn't be in the models, it's poor separation of concerns and very difficult to follow.	2023-10-22 14:24:52 +02:00
Viktor Lofgren	e06a8c1de2	(converter) Put upper limit on number of worker threads.	2023-10-22 14:03:09 +02:00
Viktor Lofgren	29ce8ca0cf	(db) Reduce db pool size This is a temporary thing	2023-10-22 14:03:09 +02:00
Viktor Lofgren	eb4158df0b	(control) Fix start/stop FSM endpoints	2023-10-22 14:03:09 +02:00
Viktor Lofgren	12fda1a36b	(control) Temporarily re-writing the data balancer to get it to work in prod Need to clean this up later.	2023-10-22 14:03:09 +02:00
Viktor Lofgren	e927f99777	(control) JSON serializes Map<Integer> to Map<Double> and Java gets confused	2023-10-21 16:24:20 +02:00
Viktor Lofgren	044bcf55bd	(control) Fix SQL in rebalance actor	2023-10-21 16:13:37 +02:00
Viktor Lofgren	e475af9f49	(control) Initialize controlActorService	2023-10-21 16:06:53 +02:00
Viktor Lofgren	c6abcd91fa	(control) Better use of FS states, fix bug with start/stop actors	2023-10-20 16:37:49 +02:00
Viktor Lofgren	10fc489822	(converter) More robust filename resolution	2023-10-20 14:16:03 +02:00
Viktor Lofgren	d76d926c38	(control/executor) Add new configuration options for node It's now possible to configure prod instance to not retain processed data.	2023-10-20 14:05:19 +02:00
Viktor Lofgren	2b3c167845	(controller) Additional configuration options for node	2023-10-20 13:13:36 +02:00
Viktor Lofgren	1d75b974b5	(loader bugfix) Set DOMAIN_METADATA appropriately	2023-10-20 13:03:27 +02:00
Viktor Lofgren	584bb3a648	(fs) interface cleanup	2023-10-20 12:24:18 +02:00
Viktor Lofgren	7b5ec6b98f	(executor-service) Embed dist/ in executor-service's docker image	2023-10-19 17:48:34 +02:00
Viktor Lofgren	23526f6d1a	(executor) Executor service now pulls DomainType list for CRAWL on "recrawl" This is an automatic integration with the submit-site repo on github and also crawl-queue.	2023-10-19 17:48:34 +02:00
Viktor Lofgren	809b3ee023	(control) Update GUI for crawl specs. They are now less important than they were before.	2023-10-19 17:48:34 +02:00
Viktor Lofgren	23f0c79fba	(control) GUI for data sets/domain types.	2023-10-19 17:48:34 +02:00
Viktor Lofgren	81dd3809e9	(*) WIP Add node affinity to EC_DOMAIN Very messy commit due to fractalline yak shaving	2023-10-19 17:48:34 +02:00
Viktor Lofgren	2bf0c4497d	(*) Tool for unfcking old crawl data so that it aligns with the new style IDs	2023-10-19 17:48:34 +02:00
Viktor Lofgren	978550f809	(executor-service) Retire features-convert and move the corresponding packages into the executor service.	2023-10-16 15:43:46 +02:00
Viktor Lofgren	84fea0fd05	(node) Nodes auto-start their monitor actors.	2023-10-16 15:33:22 +02:00
Viktor Lofgren	2df3e0f881	(node) Nodes auto-configure on start-up instead of requiring manual configuration.	2023-10-16 14:46:35 +02:00
Viktor Lofgren	c98117f69d	(actor) FS monitor should pick up stuff in BACKUP as well.	2023-10-16 14:37:36 +02:00
Viktor Lofgren	ede5d1f890	(actor) Give process spawners more easily recognizable names.	2023-10-16 14:19:00 +02:00
Viktor Lofgren	39911e3acd	(control) Fix incorrect storage base and clean up GUI for data	2023-10-16 13:30:26 +02:00
Viktor Lofgren	3d1c15ef99	(client) Refactor liveness monitor	2023-10-16 12:34:01 +02:00
Viktor Lofgren	f718482e98	(client) Fix tests	2023-10-16 12:12:16 +02:00
Viktor Lofgren	8dafd13cd7	(client) Fix executor tests	2023-10-16 12:02:57 +02:00
Viktor Lofgren	0b19b28a64	(file-storage) Delete unused code	2023-10-16 12:02:57 +02:00
Viktor Lofgren	c245f7ce3a	(control) Bootstrapify review-domains and search-to-ban views.	2023-10-15 22:04:23 +02:00
Viktor Lofgren	607d647483	(control) Remove services listing view	2023-10-15 21:48:55 +02:00
Viktor Lofgren	9a38a455c9	(control/exec) File listings in control GUI	2023-10-15 19:15:44 +02:00
Viktor Lofgren	16e0738731	(*) Get multi-node routing working.	2023-10-15 18:38:30 +02:00
Viktor Lofgren	eacbf87979	(control) New list and form for index nodes.	2023-10-14 21:46:52 +02:00
Viktor Lofgren	108b4cb648	(service) Keep disabled multi-noded services dormant when they are configured to be disabled.	2023-10-14 20:58:55 +02:00
Viktor Lofgren	a9dff407a1	(config/db) Clean up migrations	2023-10-14 20:34:03 +02:00
Viktor Lofgren	9e26109e36	(reverse-index) Don't always POST	2023-10-14 16:48:29 +02:00
Viktor Lofgren	6308a8dfcd	(control) Node configuration	2023-10-14 16:47:52 +02:00
Viktor Lofgren	4baf9527d7	() WIP Control GUI redesign, executor-service, multi-node mq This turned out to be very difficult to do in small isolated steps. Design overhaul of the control gui using bootstrap * Move the actors out of control-service into a new executor-service, that can be run on multiple nodes * Add node-affinity to message queue	2023-10-14 12:08:43 +02:00
Viktor Lofgren	199c459697	(*) Add node-affinity to services, processes and file storage.	2023-10-10 12:32:22 +02:00
Viktor Lofgren	61288c5e68	(service, client) First steps towards multiple nodedness	2023-10-09 22:13:27 +02:00
Viktor Lofgren	8375237de5	(converter) Add special keyword for websites with a tilde url.	2023-10-09 17:02:32 +02:00
Viktor Lofgren	6319b8ef51	(api-service) Improved testability, always set content type to application/json	2023-10-09 15:39:34 +02:00
Viktor Lofgren	397a85eaa4	(query-service) Apply blacklisting to search results	2023-10-09 15:18:53 +02:00
Viktor Lofgren	3889c4bdd9	(refactor) Remove features-search and update documentation	2023-10-09 15:12:30 +02:00
Viktor Lofgren	c899f1cb85	(docs) Update documentation to reflect new query service	2023-10-09 14:56:59 +02:00
Viktor Lofgren	d8956c51d0	(refactor) Remove api:search-api Application services should not have an API, but purely act as clients to the core services (which should always have an API).	2023-10-09 14:42:33 +02:00
Viktor Lofgren	5dd55c7cad	(refactor) Rename satellite services to application services This is a better descriptor, since they now all implement different applications on top of the core services' APIs.	2023-10-09 13:45:45 +02:00
Viktor Lofgren	c0e61d4c87	(refactor) Move search service into services-satellite	2023-10-09 13:40:01 +02:00
Viktor Lofgren	97e17282ab	(query-service) Move query parsing from search-service to the new query service.	2023-10-09 13:27:44 +02:00
Viktor Lofgren	94c882af7d	(query-service) Provide delegate of IndexApi's query functionality. This is an intermediate step in the process of introducing the query-service as a proxy between search and index.	2023-10-08 22:22:26 +02:00
Viktor Lofgren	89c6d85f2f	(query-service) Create new empty 'query-service' service	2023-10-08 17:31:50 +02:00
Viktor Lofgren	cf366c602f	(search) Refactor SearchQueryIndexService in preparation for feature extraction. Prefer working on DecoratedSearchResultItem in favor of UrlDetails.	2023-10-08 17:15:41 +02:00
Viktor Lofgren	77ccab7d80	(index) Move linkdb to index from search. This makes index complete in the sense that you can deploy an index instance and build a complete separate application on top of it, without having to go through the Marginalia-laden search service.	2023-10-08 16:48:35 +02:00
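
Commit fc30da0d48 above describes recognizing academic domains by the ".edu" TLD or by names shaped like .ac.ccTld or .edu.ccTld, and tagging them with the search term "special:academia". A minimal sketch of that kind of check follows. The class name, method name, and the exact regular expression are assumptions made for illustration; the pattern actually used in DomainProcessor is not shown in the log.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not the project's DomainProcessor code.
final class AcademiaDomainSketch {
    // Keyword named in the commit message.
    static final String ACADEMIA_KEYWORD = "special:academia";

    // Assumed pattern: the name ends in "edu", or in "ac.<cc>" / "edu.<cc>"
    // where <cc> is a two-letter country code, and that suffix starts
    // either at the beginning of the name or right after a dot.
    private static final Pattern ACADEMIA =
            Pattern.compile("(?:.*\\.)?(?:edu|(?:ac|edu)\\.[a-z]{2})");

    static boolean isAcademicDomain(String domain) {
        // matches() requires the whole string to fit the pattern.
        return ACADEMIA.matcher(domain.toLowerCase(Locale.ROOT)).matches();
    }
}
```

Under these assumptions, names such as "mit.edu", "ox.ac.uk" and "unam.edu.mx" are accepted, while "example.com" and "mac.uk" are not, because the suffix has to begin at a label boundary.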

846 Commits
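
Commit f613f4f2df in the list above attributes spurious search results to a binary search that sometimes returned positive values when encoding a miss. The sketch below shows one common way to keep hits and misses apart, using the same convention as java.util.Arrays.binarySearch: a hit returns the index, a miss returns -(insertionPoint) - 1, which is always negative. That the project uses this exact encoding is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical illustration, not the project's array library.
final class BinarySearchSketch {
    // Returns the index of key in the ascending array, or
    // -(insertionPoint) - 1 if the key is absent (always < 0).
    static int search(long[] sorted, long key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // overflow-safe midpoint
            long value = sorted[mid];
            if (value < key) {
                low = mid + 1;
            } else if (value > key) {
                high = mid - 1;
            } else {
                return mid; // hit: non-negative index
            }
        }
        return -(low + 1); // miss: low is the insertion point
    }

    static boolean isHit(int result) {
        return result >= 0;
    }

    // Decodes the insertion point from a miss result.
    static int insertionPoint(int missResult) {
        return -missResult - 1;
    }
}
```

Because index 0 is a valid hit, the miss encoding subtracts one so that a miss at insertion point 0 becomes -1 rather than 0; callers only need to test the sign.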