Clean up docs

Viktor Lofgren 2024-02-22 18:18:58 +01:00
parent f8e7f75831
commit 4740156cfa
3 changed files with 28 additions and 40 deletions

@ -8,7 +8,6 @@ import nu.marginalia.service.discovery.monitor.ServiceChangeMonitor;
import nu.marginalia.service.discovery.property.PartitionTraits;
import nu.marginalia.service.discovery.property.ServiceEndpoint.InstanceAddress;
import nu.marginalia.service.discovery.property.ServiceKey;
import org.jetbrains.annotations.NotNull;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@ -1,30 +1,41 @@
# Index
These are components that offer functionality for the [index-service](../../services-core/index-service).
This module contains the components that make up the search index.
It exposes an API for querying the index, and contains the logic
for ranking search results. It does not parse the query; that is
the responsibility of the [search-query](../functions/search-query) module.
## Indexes
There are two indexes with accompanying tools for constructing them.
* [index-reverse](index-reverse/) is code for `word->document` indexes. There are two such indexes, one containing only document-word pairs that are flagged as important, e.g. the word appears in the title or has a high TF-IDF. This allows good results to be discovered quickly without having to sift through ten thousand bad ones first.
* [index-reverse](reverse-index/) is code for `word->document` indexes. There are two such indexes, one containing only document-word pairs that are flagged as important, e.g. the word appears in the title or has a high TF-IDF. This allows good results to be discovered quickly without having to sift through ten thousand bad ones first.
* [index-forward](index-forward/) is the `document->word` index containing metadata about each word, such as its position. It is used after identifying candidate search results via the reverse index to fetch metadata and rank the results.
* [index-forward](forward-index/) is the `document->word` index containing metadata about each word, such as its position. It is used after identifying candidate search results via the reverse index to fetch metadata and rank the results.
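To make the split between the two index directions concrete, here is a toy in-memory sketch in Java. It is not the `index-reverse`/`index-forward` code, which is built on the btree and array libraries. The class `ToyIndexes`, the `WordMeta` record, and the rule that a document-word pair is "important" when the word appears in the title are hypothetical simplifications. The candidate order (priority-index hits first, then the rest of the full index) is also an assumption, chosen to show why a priority index lets good results surface early.

```java
import java.util.*;

// Toy sketch, not the actual index-reverse / index-forward implementation.
class ToyIndexes {
    // Hypothetical per-(document, word) metadata: body positions and a title flag.
    record WordMeta(List<Integer> positions, boolean inTitle) {}

    // Reverse index over every document-word pair: word -> ascending document ids.
    final Map<String, SortedSet<Integer>> fullReverse = new HashMap<>();
    // Reverse index over "important" pairs only (assumption: word appears in the title).
    final Map<String, SortedSet<Integer>> priorityReverse = new HashMap<>();
    // Forward index: document id -> (word -> metadata), consulted when ranking.
    final Map<Integer, Map<String, WordMeta>> forward = new HashMap<>();

    void addDocument(int docId, String title, String body) {
        Set<String> titleWords = new HashSet<>(Arrays.asList(title.toLowerCase().split("\\s+")));

        // Record the position of every body word.
        Map<String, List<Integer>> positions = new HashMap<>();
        String[] bodyWords = body.toLowerCase().split("\\s+");
        for (int pos = 0; pos < bodyWords.length; pos++) {
            positions.computeIfAbsent(bodyWords[pos], w -> new ArrayList<>()).add(pos);
        }
        // Title-only words are indexed too, with no body positions.
        for (String w : titleWords) {
            positions.putIfAbsent(w, new ArrayList<>());
        }

        Map<String, WordMeta> docWords = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : positions.entrySet()) {
            String word = e.getKey();
            boolean important = titleWords.contains(word);
            docWords.put(word, new WordMeta(e.getValue(), important));
            fullReverse.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
            if (important) {
                priorityReverse.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
            }
        }
        forward.put(docId, docWords);
    }

    // Candidate documents for a word: priority-index hits first, then remaining full-index hits.
    List<Integer> candidates(String word) {
        Set<Integer> ordered = new LinkedHashSet<>(priorityReverse.getOrDefault(word, new TreeSet<>()));
        ordered.addAll(fullReverse.getOrDefault(word, new TreeSet<>()));
        return new ArrayList<>(ordered);
    }

    // Forward-index metadata lookup, or null if the document does not contain the word.
    WordMeta meta(int docId, String word) {
        return forward.getOrDefault(docId, Map.of()).get(word);
    }
}
```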
These indices rely heavily on the [libraries/btree](../../libraries/btree) and [libraries/array](../../libraries/array) components.
Additionally, the [index-journal](index-journal/) contains code for constructing a journal of the index, which is used to keep the index up to date.
## Algorithms
These indices rely heavily on the [libraries/btree](../libraries/btree) and [libraries/array](../libraries/array) components.
* [domain-ranking](domain-ranking/) contains domain ranking algorithms.
* [result-ranking](result-ranking/) contains logic for ranking search results by relevance.
---
# Libraries
# Result Ranking
* [index-query](index-query/) contains structures for evaluating search queries.
* [index-journal](index-journal/) contains tools for writing and reading index data.
The module is also responsible for ranking search results and contains various heuristics
for deciding which search results are important with regard to a query. In broad strokes, [BM-25](https://nlp.stanford.edu/IR-book/html/htmledition/okapi-bm25-a-non-binary-model-1.html)
is used, with a number of additional bonuses and penalties to rank the appropriate search
results higher.
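The BM-25 core can be sketched as follows. This is plain Okapi BM-25 only, without any of the module's additional bonuses and penalties. The class name `Bm25`, the pre-tokenized documents, the typical textbook parameter values `k1 = 1.2` and `b = 0.75`, and the Lucene-style IDF `ln((N - n + 0.5) / (n + 0.5) + 1)` (which never goes negative) are assumptions, not what `ResultValuator` actually uses.

```java
import java.util.*;

// Plain Okapi BM-25 sketch; no Marginalia-specific bonuses or penalties.
class Bm25 {
    final double k1, b;
    final List<List<String>> docs;   // each document is a list of tokens
    final double avgDocLength;
    final Map<String, Integer> docFreq = new HashMap<>();

    Bm25(List<List<String>> docs, double k1, double b) {
        this.docs = docs;
        this.k1 = k1;
        this.b = b;
        double total = 0;
        for (List<String> d : docs) {
            total += d.size();
            // Count each term at most once per document for document frequency.
            for (String w : new HashSet<>(d)) {
                docFreq.merge(w, 1, Integer::sum);
            }
        }
        this.avgDocLength = docs.isEmpty() ? 0 : total / docs.size();
    }

    // Lucene-style IDF (assumption); rarer terms get a larger weight.
    double idf(String term) {
        int n = docFreq.getOrDefault(term, 0);
        int N = docs.size();
        return Math.log((N - n + 0.5) / (n + 0.5) + 1.0);
    }

    // Sum of per-term contributions, with term frequency saturated by k1
    // and normalized by document length via b.
    double score(List<String> query, int docIndex) {
        List<String> doc = docs.get(docIndex);
        double score = 0;
        for (String term : query) {
            long tf = doc.stream().filter(term::equals).count();
            if (tf == 0) continue;
            double norm = k1 * (1 - b + b * doc.size() / avgDocLength);
            score += idf(term) * tf * (k1 + 1) / (tf + norm);
        }
        return score;
    }
}
```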
## Central Classes
* [ResultValuator](src/main/java/nu/marginalia/ranking/results/ResultValuator.java)
---
# Domain Ranking
Contains domain ranking algorithms. The domain ranking algorithms are based on
The module contains domain ranking algorithms. The domain ranking algorithms are based on
the JGraphT library.
Two principal algorithms are available: the standard PageRank algorithm,
@ -42,14 +53,14 @@ for creating a ranking algorithm that is focused on a particular segment of the
## Central Classes
* [PageRankDomainRanker](src/main/java/nu/marginalia/ranking/PageRankDomainRanker.java) - Ranks domains using the
* [PageRankDomainRanker](src/main/java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the
PageRank or Personalized PageRank algorithm depending on whether a list of influence domains is provided.
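The PageRank and Personalized PageRank variants named above can be illustrated with a plain power iteration. This sketch does not use JGraphT, and `ToyPageRank.rank` with its adjacency-list input is hypothetical. Personalization is modeled, as an assumption, by restricting the teleport (random-jump) distribution to the influence domains; dangling nodes hand their rank back through that same distribution, so the scores keep summing to 1.

```java
import java.util.*;

// Power-iteration sketch of PageRank / Personalized PageRank (not the JGraphT-based code).
class ToyPageRank {
    /**
     * outLinks.get(u) lists the nodes u links to. An empty influence set gives
     * standard PageRank (uniform teleport); otherwise teleports land only on
     * the influence nodes (Personalized PageRank).
     */
    static double[] rank(List<List<Integer>> outLinks, Set<Integer> influence,
                         double damping, int iterations) {
        int n = outLinks.size();
        double[] teleport = new double[n];
        if (influence.isEmpty()) {
            Arrays.fill(teleport, 1.0 / n);
        } else {
            for (int v : influence) teleport[v] = 1.0 / influence.size();
        }

        double[] rank = teleport.clone();
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0;
            // Each node splits its rank evenly across its out-links.
            for (int u = 0; u < n; u++) {
                List<Integer> outs = outLinks.get(u);
                if (outs.isEmpty()) {
                    danglingMass += rank[u];
                    continue;
                }
                double share = rank[u] / outs.size();
                for (int v : outs) next[v] += share;
            }
            // Follow a link with probability `damping`, otherwise jump via the teleport vector.
            for (int v = 0; v < n; v++) {
                next[v] = damping * (next[v] + danglingMass * teleport[v])
                        + (1 - damping) * teleport[v];
            }
            rank = next;
        }
        return rank;
    }
}
```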
### Data sources
* [LinkGraphSource](src/main/java/nu/marginalia/ranking/data/LinkGraphSource.java) - fetches the link graph
* [InvertedLinkGraphSource](src/main/java/nu/marginalia/ranking/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
* [SimilarityGraphSource](src/main/java/nu/marginalia/ranking/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
* [LinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph
* [InvertedLinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
* [SimilarityGraphSource](src/main/java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
Note that the similarity graph needs to be precomputed and stored in the database for
the similarity graph source to be available.
@ -57,14 +68,3 @@ the similarity graph source to be available.
## Useful Resources
* [The PageRank Citation Ranking: Bringing Order to the Web](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)
# Result Ranking
Contains various heuristics for deciding which search results are important
with regard to a query. In broad strokes, [BM-25](https://nlp.stanford.edu/IR-book/html/htmledition/okapi-bm25-a-non-binary-model-1.html)
is used, with a number of additional bonuses and penalties to rank the appropriate search
results higher.
## Central Classes
* [ResultValuator](src/main/java/nu/marginalia/ranking/ResultValuator.java)

@ -6,17 +6,6 @@ It is the service that most directly executes a search query. It does this by
evaluating a low-level query, using the index to find the documents
that match it, and finally ranking the results to pick the best matches.
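The evaluate, match, and rank sequence described above can be sketched like this. It is a hypothetical simplification, not `IndexQueryService`: the low-level query is assumed to be a plain conjunction of terms, posting lists are assumed to be ascending arrays of document ids, and the scoring function is supplied by the caller. `ToyQueryExecutor` and its methods are illustrative names only.

```java
import java.util.*;
import java.util.function.IntToDoubleFunction;

// Hypothetical query pipeline: intersect posting lists, then rank and truncate.
class ToyQueryExecutor {
    // Linear merge of two ascending, duplicate-free arrays of document ids.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out[k++] = a[i]; i++; j++; }
        }
        return Arrays.copyOf(out, k);
    }

    static List<Integer> execute(Map<String, int[]> postings, List<String> terms,
                                 IntToDoubleFunction score, int limit) {
        if (terms.isEmpty()) return List.of();

        // Start from the shortest posting list so the candidate set shrinks quickly.
        List<int[]> lists = new ArrayList<>();
        for (String t : terms) lists.add(postings.getOrDefault(t, new int[0]));
        lists.sort(Comparator.comparingInt((int[] l) -> l.length));

        int[] matches = lists.get(0);
        for (int i = 1; i < lists.size() && matches.length > 0; i++) {
            matches = intersect(matches, lists.get(i));
        }

        // Rank the matching documents by descending score and keep the best `limit`.
        List<Integer> ranked = new ArrayList<>();
        for (int doc : matches) ranked.add(doc);
        ranked.sort(Comparator.comparingDouble((Integer d) -> score.applyAsDouble(d)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```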
## Central Classes
This module only contains service boilerplate. The guts of this service are
in the [index](../../index) module.
* [IndexService](src/main/java/nu/marginalia/index/IndexService.java) is the REST entry point that the internal API talks to.
* [IndexQueryService](src/main/java/nu/marginalia/index/svc/IndexQueryService.java) executes queries.
* [SearchIndex](src/main/java/nu/marginalia/index/index/SearchIndex.java) owns the state of the index and helps with building a query strategy from parameters.
* [IndexResultValuator](src/main/java/nu/marginalia/index/results/IndexResultValuator.java) determines the best results.
## See Also
The index service relies heavily on the primitives in [features-index](../../features-index):
* [features-index/index-forward](../../features-index/index-forward/)
* [features-index/index-reverse](../../features-index/index-reverse/)
* [features-index/index-query](../../features-index/index-query)