A fork of MarginaliaSearch for Catgirl Intelligence Agency

Marginalia Search

This is the source code for Marginalia Search.

The aim of the project is to develop new and alternative discovery methods for the Internet. It's an experimental workshop as much as it is a public service; the overarching goal is to elevate the more human, non-commercial sides of the Internet.

A secondary goal is to do this without requiring datacenters or enterprise hardware budgets, so that the operation can run on affordable hardware with minimal operational overhead.

The long-term plan is to refine the search engine so that it provides enough public value for the project to be funded through grants, donations and commercial API licenses (non-commercial share-alike use is always free).

The system can be run either as a copy of Marginalia Search or as a white-label search engine for your own data (either crawled or side-loaded). At present the logic isn't very configurable, and many of the judgements it makes are based on the Marginalia project's goals, but additional configurability is being worked on!

Here's a demo of the set-up and operation of the self-hostable barebones mode of the search engine: 🌎 https://www.youtube.com/watch?v=PNwMkenQQ24

Set up

To set up a local test environment, follow the instructions in 📄 run/readme.md!

Further documentation is available at 🌎 https://docs.marginalia.nu/.

Before compiling, it's necessary to run ⚙️ run/setup.sh. This will download supplementary model data that is needed to run the code. The same data is also needed to run the tests.

If you wish to hack on the code, check out 📄 doc/ide-configuration.md.

Hardware Requirements

A production-like environment requires a lot of RAM and ideally enterprise SSDs for the index, as well as several additional terabytes of slower hard drives for storing crawl data. It can be made to run on smaller hardware by limiting the size of the index.

The system will definitely run on a machine with 32 GB of RAM, and possibly on a smaller one, but at that size it may not perform very well, as it relies on disk caching to be fast.

A local developer's deployment is possible with much smaller hardware (and index size).

Project Structure

📁 code/ - The Source Code. See 📄 code/readme.md for a further breakdown of the structure and architecture.

📁 run/ - Scripts and files used to run the search engine locally

📁 third-party/ - Third-party code

📁 doc/ - Supplementary documentation

📄 CONTRIBUTING.md - How to contribute

📄 LICENSE.md - License terms

Contact

You can email kontakt@marginalia.nu with any questions or feedback.

License

The bulk of the project is available under AGPL 3.0, with exceptions. Some parts are co-licensed under MIT, and third-party code may have different licenses. See the appropriate readme.md / license.md.

Versioning

The project uses a modified form of Calendar Versioning, where the first two pairs of digits are the year and month of the latest crawling operation, and the third pair is a patch number.

            version
           --
     yy.mm.VV
     -----
     crawl

For example, 23.03.02 is a release with crawl data from March 2023 (released in May 2023). It is the second patch for the 23.03 release.
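To illustrate the scheme, here is a minimal Java sketch that splits a version string into its crawl year, crawl month and patch number. The `ReleaseVersion` record and its `parse` and `crawl` methods are hypothetical names made up for this example and are not part of the codebase; the sketch also assumes that two-digit years refer to the 2000s.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the yy.mm.VV scheme; not part of the Marginalia codebase.
record ReleaseVersion(int year, int month, int patch) {

    // Exactly three dot-separated pairs of digits, e.g. "23.03.02"
    private static final Pattern FORMAT = Pattern.compile("(\\d{2})\\.(\\d{2})\\.(\\d{2})");

    static ReleaseVersion parse(String version) {
        Matcher m = FORMAT.matcher(version);
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected yy.mm.VV, got: " + version);
        }
        int month = Integer.parseInt(m.group(2));
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("Invalid crawl month in: " + version);
        }
        // Assumption: two-digit years refer to the 2000s
        return new ReleaseVersion(2000 + Integer.parseInt(m.group(1)), month, Integer.parseInt(m.group(3)));
    }

    // The "yy.mm" prefix identifying the crawl this release is built on
    String crawl() {
        return String.format("%02d.%02d", year % 100, month);
    }

    public static void main(String[] args) {
        ReleaseVersion v = parse("23.03.02");
        System.out.println(v.crawl() + " patch " + v.patch()); // prints "23.03 patch 2"
    }
}
```

Keeping the crawl prefix as its own accessor makes the "same crawl" question from the next paragraph a simple string comparison.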

Versions with the same year and month are compatible with each other, or offer an upgrade path where the same data set can be used. Across different crawl sets, however, data format changes may be introduced, and you're generally expected to re-crawl the data from scratch, since crawl data has a shelf life roughly as long as the project's major release cycle. After about 2-3 months it gets noticeably stale, with many dead links.
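The two rules of thumb above could be expressed roughly as follows. This is a sketch under stated assumptions, not project code: `CrawlDataPolicy` and its methods are hypothetical, "compatible" is reduced to "built from the same crawl month", and the three-month staleness cutoff is simply taken from the upper end of the 2-3 month estimate rather than from any setting in the project.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of the compatibility and staleness rules of thumb; not project code.
class CrawlDataPolicy {

    // Assumption: crawl data counts as stale once it is 3 or more months old
    static final long STALE_AFTER_MONTHS = 3;

    // Extract the crawl year and month from a "yy.mm.VV" version string
    static YearMonth crawlOf(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected yy.mm.VV, got: " + version);
        }
        // Assumption: two-digit years refer to the 2000s; YearMonth.of rejects invalid months
        return YearMonth.of(2000 + Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    // Simplification: releases sharing a crawl month are treated as able to reuse the same data
    static boolean sameCrawl(String a, String b) {
        return crawlOf(a).equals(crawlOf(b));
    }

    // Whole months elapsed between the crawl month and the month of the given date
    static boolean isStale(String version, LocalDate today) {
        long ageInMonths = ChronoUnit.MONTHS.between(crawlOf(version), YearMonth.from(today));
        return ageInMonths >= STALE_AFTER_MONTHS;
    }

    public static void main(String[] args) {
        System.out.println(sameCrawl("23.03.01", "23.03.02")); // true: same crawl, different patch
        System.out.println(sameCrawl("23.03.02", "23.07.00")); // false: a re-crawl is expected
        System.out.println(isStale("23.03.02", LocalDate.of(2023, 5, 15))); // false: 2 months old
    }
}
```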

For development purposes, crawling is discouraged and sample data is available. See 📄 run/readme.md for more information.

Funding

Donations

Consider donating to the project.

Grants

This project was funded through the NGI0 Entrust Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101069594.
