A fork of MarginaliaSearch for Catgirl Intelligence Agency

Marginalia Search

This is the source code for Marginalia Search.

The aim of the project is to develop new and alternative discovery methods for the Internet. It's an experimental workshop as much as it is a public service; the overarching goal is to elevate the more human, non-commercial sides of the Internet. A side-goal is to do this without requiring datacenters and expensive enterprise hardware, and to run the operation on affordable hardware.

Set up

Start by running ⚙️ run/setup.sh. This will download supplementary model data that is necessary to run the code. The same data is also necessary to run the tests.

To set up a local test environment, follow the instructions in 📄 run/readme.md!

Hardware Requirements

A production-like environment requires at least 128 GB of RAM and ideally 2 TB+ of enterprise-grade SSD storage, as well as some additional terabytes of slower hard drives for storing crawl data. It can be made to run on smaller hardware by limiting the size of the index.

A local developer's deployment is possible with much smaller hardware (and index size).

Project Structure

📁 code/ - The Source Code. See 📄 code/readme.md for a further breakdown of the structure and architecture.

📁 run/ - Scripts and files used to run the search engine locally

📁 third-party/ - Third party code

📁 doc/ - Supplementary documentation

📄 CONTRIBUTING.md - How to contribute

📄 LICENSE.md - License terms

Supporting

Consider supporting this project.

Contact

You can email kontakt@marginalia.nu with any questions or feedback.

License

The bulk of the project is available under AGPL 3.0, with exceptions. Some parts are co-licensed under MIT, and third-party code may have different licenses. See the appropriate readme.md / license.md.