Netty and gRPC by default spawn an incredible number of threads on high-core CPUs, which amounts to a fair bit of RAM usage.
Add custom executors that throttle this behavior.
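A minimal sketch of the idea, assuming a Netty-backed channel; the class name, pool sizes, and transport choice here are illustrative, not the project's actual configuration:
```
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedGrpcChannelFactory {
    // Small fixed pools instead of the defaults, which scale with core count
    private static final ExecutorService appExecutor = Executors.newFixedThreadPool(4);
    private static final NioEventLoopGroup eventLoop = new NioEventLoopGroup(2);

    public static ManagedChannel create(String host, int port) {
        return NettyChannelBuilder.forAddress(host, port)
                .executor(appExecutor)               // application callbacks run here
                .eventLoopGroup(eventLoop)           // Netty I/O threads
                .channelType(NioSocketChannel.class) // required alongside eventLoopGroup
                .usePlaintext()
                .build();
    }
}
```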
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.
While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's *modul*ar. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.
Cleaning out a lot of old junk from the code, and one thing led to another...
* Build is improved, now constructing docker images with 'jib'. Clean build went from 3 minutes to 50 seconds.
* The ProcessService's spawning is smarter. It will now just spawn a Java process instead of relying on the application plugin's generated outputs.
* Project is migrated to GraalVM
* gRPC clients are re-written in a neat fluent/functional style, e.g.
```
channelPool.call(grpcStub::method)
        .async(executor) // <-- optional
        .run(argument);
```
This change is primarily to allow handling ManagedChannel errors, but it turned out to be a pretty clean API overall.
* For now, the project is all-in on Zookeeper
* Service discovery is now based on APIs and not services. In theory this means we could ship the same code as either a monolith or a service mesh.
* To this end, began modularizing a few of the APIs so that they aren't strongly "living" in a service. WIP!
Still missing are documentation and testing, and some more breaking apart of code.
The change adds a hostname validation step that removes endpoints from the ZkServiceRegistry when they do not resolve. This scenario primarily happens when running in docker, as the entire system is started and stopped.
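Hypothetically, the validation step could be as simple as a DNS check; the class and method names below are illustrative, not the actual code:
```
import java.net.InetAddress;
import java.net.UnknownHostException;

class EndpointValidation {
    // An endpoint whose hostname no longer resolves is likely a stale
    // registration left behind by a docker container that's gone away
    static boolean endpointResolves(String hostname) {
        try {
            InetAddress.getByName(hostname);
            return true;
        } catch (UnknownHostException ex) {
            return false;
        }
    }
}
```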
The warmup would sometimes crash during a cold start-up, because it could not get an API. Changed the warmup to just create a GrpcSingleNodeChannelPool for the node.
Adds new ways to configure the bind and external IP addresses for a service. Notably, if the environment variable WMSA_IN_DOCKER is present, the system will grab the HOSTNAME variable and announce that as the external address in the service registry.
The default bind address is also changed to be 0.0.0.0 only if WMSA_IN_DOCKER is present, otherwise 127.0.0.1, as this is a more secure default.
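A minimal sketch of the selection logic described above (the class and method names are illustrative):
```
class BindConfiguration {
    static String bindAddress() {
        // 0.0.0.0 is needed inside docker so other containers can reach us,
        // but 127.0.0.1 is the more secure default everywhere else
        return System.getenv("WMSA_IN_DOCKER") != null ? "0.0.0.0" : "127.0.0.1";
    }

    static String externalAddress() {
        // docker sets HOSTNAME to the container's name, which other
        // containers can resolve via docker's DNS
        if (System.getenv("WMSA_IN_DOCKER") != null)
            return System.getenv("HOSTNAME");
        return "127.0.0.1";
    }
}
```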
The previous code made an incorrect assumption that all routes refer to the same node, and would overwrite the route list on each update. This led to storms of closing and opening channels whenever an update was received.
The new code is correctly aware that we may talk to multiple nodes.
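A sketch of the corrected bookkeeping under assumed types; the gist is tracking routes per node instead of one overwritten list:
```
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RouteBookkeeping {
    // Hypothetical: one route set per node, so an update for one node no
    // longer clobbers the routes of every other node
    private final Map<Integer, Set<String>> routesPerNode = new ConcurrentHashMap<>();

    void onRouteUpdate(int node, Set<String> routes) {
        routesPerNode.put(node, routes); // only this node's entry changes
    }
}
```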
To avoid having to either hard-code or manually configure service addresses (possibly several dozen), and to reduce the project's dependency on docker to deal with routing and discovery, the option to use [Zookeeper](https://zookeeper.apache.org/) to manage services and discovery has been added.
A service registry interface was added, with a Zookeeper implementation and a basic implementation that only works on docker and hard-codes everything.
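The interface might look roughly like the sketch below; the method signatures are assumptions, not the actual API:
```
import java.util.List;

// Hypothetical interface shape; one implementation backed by Zookeeper,
// and one fixed mapping that only works under docker's DNS
public interface ServiceRegistry {
    /** Announce this instance's external address for a service */
    void registerService(String serviceName, String externalAddress, int port) throws Exception;

    /** Look up all live "host:port" endpoints for a service */
    List<String> getEndpoints(String serviceName) throws Exception;
}
```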
The last remaining REST service, the assistant-service, has been migrated to gRPC.
This also proved a good time to clear out primordial technical debt from the root of the codebase. The 'service-client' library has been taken behind the barn and given a last farewell. It's replaced by a small library for managing gRPC channels.
Since it's no longer used by anything, RxJava has been removed as a dependency from the project.
Although the current state seems reasonably stable, this is a work-in-progress commit.
To help services start faster, the blacklist will no longer block until it's loaded. Where blocking behavior is desirable, a method was added to explicitly wait for the data.
The domain blacklist blocked the start-up of each process that injected it, adding like 30 seconds to the start-up time in prod.
This change moves the loading to a separate thread entirely. For threads or processes that require the blacklist to be definitely loaded, a helper method was added that blocks until that time.
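A sketch of the pattern, assuming a blacklist class along these lines (the names are illustrative):
```
import java.util.Set;
import java.util.concurrent.CountDownLatch;

public class LazyDomainBlacklist {
    private final CountDownLatch loaded = new CountDownLatch(1);
    private volatile Set<Integer> blockedDomainIds = Set.of();

    public LazyDomainBlacklist() {
        // Load on a background thread so injection no longer blocks start-up
        var thread = new Thread(() -> {
            blockedDomainIds = loadFromDatabase();
            loaded.countDown();
        });
        thread.setDaemon(true);
        thread.start();
    }

    /** For callers that need the data to be definitely loaded */
    public void waitUntilLoaded() throws InterruptedException {
        loaded.await();
    }

    public boolean isBlacklisted(int domainId) {
        return blockedDomainIds.contains(domainId);
    }

    private Set<Integer> loadFromDatabase() {
        return Set.of(); // elided: the slow database query
    }
}
```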
Refactored the gRPC stub pool for better handling of the channel SHUTDOWN state. Any disconnected channels are now re-created before returning the stub.
The class was also renamed to GrpcChannelPool, as we no longer pool the stubs.
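A hedged sketch of the re-creation check; the pool internals here are hypothetical, though ManagedChannel#getState and ConnectivityState are real gRPC APIs:
```
import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ChannelPoolSketch {
    private final Map<String, ManagedChannel> channels = new ConcurrentHashMap<>();

    // Re-create any channel that has shut down before building a stub on it
    ManagedChannel getOrReplaceChannel(String host, int port) {
        var key = host + ":" + port;
        var channel = channels.get(key);
        if (channel == null || channel.getState(false) == ConnectivityState.SHUTDOWN) {
            channel = ManagedChannelBuilder.forAddress(host, port).usePlaintext().build();
            channels.put(key, channel);
        }
        return channel;
    }
}
```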
The change deprecates the 'algorithm' field from the domain ranking set configuration. Instead, the algorithm will be chosen based on whether influence domains are provided, and whether similarity data is present.
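Illustratively, the selection rule might reduce to something like the sketch below; the enum values are hypothetical stand-ins for the actual algorithm names:
```
import java.util.List;

class AlgorithmSelection {
    // Hypothetical names throughout; only the decision rule reflects the change
    enum RankingAlgorithm { LINKS, SIMILARITY, PERSONALIZED }

    static RankingAlgorithm choose(List<String> influenceDomains, boolean hasSimilarityData) {
        if (!influenceDomains.isEmpty())
            return RankingAlgorithm.PERSONALIZED; // rank relative to the given domains
        return hasSimilarityData ? RankingAlgorithm.SIMILARITY
                                 : RankingAlgorithm.LINKS;
    }
}
```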
* (executor-api) Make executor API talk gRPC
The executor's REST API was very fragile and annoying to work with, lacking even basic type safety. Migrate to gRPC instead. gRPC is a bit of a pain with how verbose it is, but that is probably the lesser evil. This is a fairly straightforward change, but it's also large, so a solid round of testing is needed...
The change set breaks out the GrpcStubPool previously residing in the QueryService, and makes it available to all clients.
ServiceId.name was also renamed to avoid the very dangerous clash with Enum.name().
The boilerplate needed for gRPC was also extracted into a common gradle file for inclusion into the appropriate build.gradle files.
This change splits the previous 'repartition' action into two steps, one for recalculating the domain rankings, and one for recalculating the other ranking sets. Since only the first is necessary before the index construction, the rest can be delayed until after...
To avoid issues in handling the shotgun blast of MqNotifications, Service was switched over to use a synchronous message queue instead of an asynchronous one.
The change also modifies the behavior so that only node 1 will push the changes to the EC_DOMAIN database table, to avoid unnecessary db locks and contention with the loader.
Additionally, the change fixes a bug where the index construction code wasn't actually picking up the rankings data.
When the index construction was performed by the index-service, merely saving the data to memory was enough for it to be accessible within the index-construction logic; but since it's been broken out into a separate process, the new process just injected an empty DomainRankings object instead.
To fix this, DomainRankings can now be persisted to disk, and a pre-loaded version of the object is injected into the index-construction process.
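A minimal sketch of what the persistence could look like, assuming the rankings boil down to a domain-id-to-rank mapping (the real DomainRankings likely uses a primitive-specialized structure):
```
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class DomainRankingsFile {
    static void write(Map<Integer, Integer> rankings, Path file) throws IOException {
        try (var out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(rankings.size());
            for (var e : rankings.entrySet()) {
                out.writeInt(e.getKey());   // domain id
                out.writeInt(e.getValue()); // rank value
            }
        }
    }

    static Map<Integer, Integer> read(Path file) throws IOException {
        try (var in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
            int n = in.readInt();
            var rankings = new HashMap<Integer, Integer>();
            for (int i = 0; i < n; i++) {
                rankings.put(in.readInt(), in.readInt());
            }
            return rankings;
        }
    }
}
```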
This changeset adds an action for downloading a set of sample data from downloads.marginalia.nu.
It also refactors some leaky abstractions out of FileStorageService: allocateTemporaryStorage has been renamed to allocateStorage, as the storage was never temporary in any scenario...
It also no longer takes a storage base, since there was only ever one valid option for this input; the allocateStorage method finds the appropriate base itself.
This will avoid having to dig in the message queue to perform this relatively common task.
The control service was also refactored to extract common timestamp formatting logic out of the data objects and into the rendering.