(doc) Update docs

Viktor Lofgren 2024-02-06 12:41:28 +01:00
parent 54330b9921
commit 92049ba8e4
4 changed files with 8 additions and 22 deletions

@@ -45,10 +45,7 @@ the executor service, which is controlled by the control service.
* * [crawling-process](processes/crawling-process)
* * [converting-process](processes/converting-process)
* * [loading-process](processes/loading-process)
#### Tools
* * [term-frequency-extractor](tools/term-frequency-extractor)
* * [index-constructor-process](processes/index-constructor-process)
### Features

@@ -1,9 +1,13 @@
# Index Service
The index service is a partitioned service that knows which document contains which keywords.
The index service is a partitioned service that knows which document contains which keywords.
![image](../../../doc/diagram/index-service-map.svg)
It is the service that most directly executes a search query. It does this by
evaluating a low-level query, and then using the index to find the documents
that match the query, finally ranking the results and picking the best matches.
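The lookup-then-rank flow described above can be illustrated with a toy inverted index. This is only a minimal sketch under stated assumptions: `ToyIndex`, its `add` and `search` methods, and the single per-document score are all hypothetical names invented for illustration, and none of this is taken from the actual `IndexService` code.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical toy model of "find matching documents, rank, pick the best".
// It is not the real index service implementation.
public class ToyIndex {
    // keyword -> ids of the documents that contain that keyword
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> assumed ranking signal (higher is better)
    private final Map<Integer, Double> scores = new HashMap<>();

    public void add(int docId, double score, String... keywords) {
        scores.put(docId, score);
        for (String keyword : keywords) {
            postings.computeIfAbsent(keyword, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents containing every keyword,
    // best score first, truncated to at most `limit` results.
    public List<Integer> search(List<String> keywords, int limit) {
        if (keywords.isEmpty()) {
            return List.of();
        }
        Set<Integer> matches =
                new TreeSet<>(postings.getOrDefault(keywords.get(0), Set.of()));
        for (String keyword : keywords.subList(1, keywords.size())) {
            matches.retainAll(postings.getOrDefault(keyword, Set.of()));
        }
        return matches.stream()
                .sorted(Comparator.comparingDouble((Integer id) -> scores.get(id)).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

In this sketch a query is just a list of keywords that must all be present; a real low-level query specification would presumably be richer than that.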
## Central Classes
* [IndexService](src/main/java/nu/marginalia/index/IndexService.java) is the REST entry point that the internal API talks to.

@@ -1,4 +1,5 @@
The query service parses search queries and delegates work to the index service.
The query service parses search queries and delegates work to the
index services.
The [index-service](../index-service) speaks a lower level query specification language
that is difficult to build an application out of. The query service exists as an interpreter

@@ -1,16 +0,0 @@
# Term Frequency Extractor
Generates a term frequency dictionary file from a batch of crawl data.
Usage:
```shell
PATH_TO_SAMPLES=run/samples/crawl-s
export JAVA_OPTS=-Dcrawl.rootDirRewrite=/crawl:${PATH_TO_SAMPLES}
term-frequency-extractor ${PATH_TO_SAMPLES}/plan.yaml out.dat
```
## See Also
* [libraries/term-frequency-dict](../../libraries/term-frequency-dict)