# Run
When developing locally, this directory will contain run-time data required for
the search engine. In a clean check-out, it only contains the tools required to
bootstrap this directory structure.

## Requirements

While the system is designed to run on bare metal in production,
for local development you're strongly encouraged to use docker
or podman. These are a bit of a pain to install, but if you follow
[this guide](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository)
you're on the right track.

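Once installed, a quick sanity check (just a suggestion, not part of the setup scripts) is to verify that both tools respond:

```shell
# Sanity check: both commands should print a version string.
# Podman users can typically substitute podman / podman-compose here.
$ docker --version
$ docker-compose --version
```
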
## Set up

To go from a clean checkout of the git repo to a running search engine,
follow these steps. You're assumed to be in the project root the whole time.

1. Run the one-time setup; it will create the
basic runtime directory structure and download some models and data that don't
come with the git repo.

```
$ run/setup.sh
```

2. Compile the project and build the docker images.

```
$ ./gradlew assemble docker
```

3. Download a sample of crawl data, process it, and load the metadata
into the database. The data is only downloaded once. Grab a cup of coffee; this takes a few minutes.
This step needs to be repeated whenever the crawler or processor has changed.

```
$ docker-compose up -d mariadb
$ run/reconvert.sh
```

4. Bring the system online. We'll run it in the foreground in the terminal this time
because it's educational to see the logs. Add `-d` to run in the background.

```
$ docker-compose up
```

5. Since we've just processed new crawl data, the system needs to construct static
indexes. Wait for the line 'Auto-conversion finished!' to appear in the log output.

When all is done, it should be possible to visit
[http://localhost:8080](http://localhost:8080) and try a few searches!
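If you'd rather check from the command line first, a simple smoke test (just a suggestion, not part of the setup scripts) is to request the front page:

```shell
# Hypothetical smoke test: the front end should answer with an HTTP response
$ curl -I http://localhost:8080
```
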
## Other Crawl Data

By default, `reconvert.sh` will load the medium dataset. This is appropriate for a demo,
but other datasets also exist.

| Set | Description                                                                 |
|-----|-----------------------------------------------------------------------------|
| s   | 1,000 domains, suitable for low-end machines                                |
| m   | 2,000 domains                                                               |
| l   | 5,000 domains                                                               |
| xl  | 50,000 domains, basically pre-prod.<br><b>Warning</b>: 5h+ processing time |

To switch datasets, run e.g.

```shell
$ docker-compose up -d mariadb
$ ./run/reconvert.sh l
```

## Experiment Runner

The script `experiment.sh` is a launcher for the experiment runner, which is useful when
evaluating new algorithms for processing crawl data.
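The arguments it accepts are defined by the script itself; as a purely hypothetical sketch, assuming the experiment name is passed as an argument, an invocation might look like:

```shell
# Hypothetical invocation; consult experiment.sh for the arguments it actually expects
$ run/experiment.sh my-experiment
```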