ed373eef61
* Break apart CrawlerRetreiver * Break apart HttpFetcher into an interface and impl for testing sanity * Add special logic for Lemmy, Mediawiki and Discourse to not waste requests on paths that aren't interesting. |
||
---|---|---|
.github | ||
code | ||
doc | ||
gradle/wrapper | ||
run | ||
third-party | ||
tools | ||
.gitignore | ||
build.gradle | ||
CONTRIBUTING.md | ||
docker-compose.yml | ||
docker-service.gradle | ||
gradle.properties | ||
gradlew | ||
gradlew.bat | ||
LICENSE.md | ||
README.md | ||
settings.gradle |
Marginalia Search
This is the source code for Marginalia Search.
The aim of the project is to develop new and alternative discovery methods for the Internet. It's an experimental workshop as much as it is a public service, the overarching goal is to elevate the more human, non-commercial sides of the Internet. A side-goal is to do this without requiring datacenters and expensive enterprise hardware, to run this operation on affordable hardware.
Set up
Start by running ⚙️ run/setup.sh. This will download supplementary model data that is necessary to run the code. These are also necessary to run the tests.
To set up a local test environment, follow the instructions in 📄 run/readme.md!
Hardware Requirements
A production-like environment requires at least 128 Gb of RAM and ideally 2 Tb+ of enterprise grade SSD storage, as well as some additional terabytes of slower harddrives for storing crawl data. It can be made to run on smaller hardware by limiting size of the index.
A local developer's deployment is possible with much smaller hardware (and index size).
Project Structure
📁 code/ - The Source Code. See 📄 code/readme.md for a further breakdown of the structure and architecture.
📁 run/ - Scripts and files used to run the search engine locally
📁 third-party/ - Third party code
📁 doc/ - Supplementary documentation
📄 CONTRIBUTING.md - How to contribute
📄 LICENSE.md - License terms
Supporting
Consider supporting this project.
Contact
You can email kontakt@marginalia.nu with any questions or feedback.
License
The bulk of the project is available with AGPL 3.0, with exceptions. Some parts are co-licensed under MIT, third party code may have different licenses. See the appropriate readme.md / license.md.