# Run
When developing locally, this directory will contain runtime data required for
the search engine. In a clean checkout, it contains only the tools required to
bootstrap this directory structure.
## Requirements
While the system is designed to run on bare metal in production,
you're strongly encouraged to use Docker or Podman for local development.
These are a bit of a pain to install, but if you follow
[this guide](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository)
for installing Docker on Ubuntu, you're on the right track.
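Before starting, it can help to confirm that one of the two runtimes is actually on your `PATH`. The following is a minimal sketch, not part of the repository; the function name `find_container_runtime` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this repo): print the name of the
# first container runtime found on PATH, preferring docker over podman.
# Returns non-zero if neither is installed.
find_container_runtime() {
    for candidate in docker podman; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, `find_container_runtime || echo "install docker or podman first"` prints the runtime name when one is found, and the reminder otherwise.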
## Set up
To go from a clean checkout of the git repo to a running search engine,
follow these steps. All commands assume you are in the project root.
1. Run the one-time setup. It creates the
basic runtime directory structure and downloads some models and data that don't
come with the git repo.
```
$ run/setup.sh
```
2. Compile the project and build the Docker images.
```
$ ./gradlew assemble docker
```
3. Download a sample of crawl data, process it, and load the metadata
into the database. The data is only downloaded once, but the processing
needs to be redone whenever the crawler or processor has changed.
Grab a cup of coffee, this takes a few minutes.
```
$ docker-compose up -d mariadb
$ run/reconvert.sh
```
4. Bring the system online. We'll run it in the foreground in the terminal this time
because it's educational to see the logs. Add `-d` to run in the background.
```
$ docker-compose up
```
5. Since we've just processed new crawl data, the system needs to construct static
indexes. Wait for the line 'Auto-conversion finished!' to appear in the logs.
When all is done, it should be possible to visit
[http://localhost:8080](http://localhost:8080) and try a few searches!
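If you started the system in the background with `-d`, the log line from step 5 won't scroll past in your terminal. One way to wait for it is to pipe the logs through `grep`. This is a rough sketch; the helper name `wait_for_line` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: read stdin until a line containing the given fixed
# string shows up. grep -q exits with status 0 on the first match, and
# with status 1 if the input ends without one.
wait_for_line() {
    grep -q -F -- "$1"
}

# Intended use (not executed here, requires the containers to be running):
#   docker-compose logs -f | wait_for_line 'Auto-conversion finished!'
```

Note that `docker-compose logs -f` may keep running until it next tries to write to the closed pipe, so the command can linger briefly after the line has been seen.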
## Other Crawl Data
By default, `reconvert.sh` loads the medium dataset (`m`). This is appropriate for a demo,
but other datasets are also available.
| Set | Description |
|-----|----------------------------------------------------------------------------|
| s | 1000 domains, suitable for low-end machines |
| m | 2000 domains |
| l | 5000 domains |
| xl | 50,000 domains, basically pre-prod.<br><b>Warning</b>: 5h+ processing time |
To switch datasets, pass the set name to `reconvert.sh`, for example:
```shell
$ docker-compose up -d mariadb
$ ./run/reconvert.sh l
```
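Since the larger sets take a long time to process, it may be worth guarding against a mistyped set name before kicking off a run. The sketch below assumes the four names in the table above are the only valid ones; `is_known_dataset` is a made-up helper, not something `reconvert.sh` provides.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is one of the
# dataset names listed in the table (s, m, l, xl).
is_known_dataset() {
    case "$1" in
        s|m|l|xl) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as `is_known_dataset "$set" && ./run/reconvert.sh "$set"`, where `$set` holds the name you intend to load.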
## Experiment Runner
The script `experiment.sh` is a launcher for the experiment runner, which is useful when
evaluating new algorithms for processing crawl data.