# Run

When developing locally, this directory will contain run-time data required for the search engine. In a clean check-out, it only contains the tools required to bootstrap this directory structure.

## Requirements

While the system is designed to run bare metal in production, for local development you're strongly encouraged to use Docker or Podman. These are a bit of a pain to install, but if you follow [this guide](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) you're on the right track.

## Set up

To go from a clean check-out of the git repo to a running search engine, follow these steps. You're assumed to be in the project root the whole time.

1. Run the one-time setup; it will create the basic runtime directory structure and download some models and data that don't come with the git repo.

   ```
   $ run/setup.sh
   ```

2. Compile the project and build the docker images.

   ```
   $ ./gradlew assemble docker
   ```

3. Download a sample of crawl data, process it, and stick the metadata into the database. The data is only downloaded once. Grab a cup of coffee; this takes a few minutes. This step needs to be repeated whenever the crawler or processor has changed.

   ```
   $ docker-compose up -d mariadb
   $ run/reconvert.sh
   ```

4. Bring the system online. We'll run it in the foreground in the terminal this time because it's educational to see the logs. Add `-d` to run in the background.

   ```
   $ docker-compose up
   ```

5. Since we've just processed new crawl data, the system needs to construct static indexes. Wait for the line 'Auto-conversion finished!'

When all is done, it should be possible to visit [http://localhost:8080](http://localhost:8080) and try a few searches!
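If you did bring the system up in the background with `-d`, a quick way to watch for that line and check that the front end is answering is sketched below. This assumes the default compose setup and port mapping from the steps above; both commands are standard `docker-compose`/`curl` usage rather than anything project-specific.

```shell
# Follow the logs of the running stack and watch for 'Auto-conversion finished!'
# (Ctrl-C stops following; the containers keep running)
$ docker-compose logs -f

# Then verify the search front end responds on the default port
$ curl -I http://localhost:8080
```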
## Other Crawl Data

By default, `reconvert.sh` will load the medium dataset. This is appropriate for a demo, but other datasets also exist.

| Set | Description                                                       |
|-----|-------------------------------------------------------------------|
| s   | 1000 domains, suitable for low-end machines                       |
| m   | 2000 domains                                                      |
| l   | 5000 domains                                                      |
| xl  | 50,000 domains, basically pre-prod. Warning: 5h+ processing time  |

To switch datasets, run e.g.

```shell
$ docker-compose up -d mariadb
$ ./run/reconvert.sh l
```

## Experiment Runner

The script `experiment.sh` is a launcher for the experiment runner, which is useful when evaluating new algorithms in processing crawl data.
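As a purely hypothetical sketch (the script defines its own arguments, so consult `experiment.sh` itself for what it actually accepts), an invocation might look like:

```shell
# Hypothetical: 'my-experiment' is an illustrative placeholder, not a
# documented argument; check the script for its real usage.
$ ./run/experiment.sh my-experiment
```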