(docs) Begin un-fucking the docs after refactoring

Viktor Lofgren 2024-02-27 21:15:49 +01:00
parent c943954bb4
commit e696fd9e92
39 changed files with 107 additions and 107 deletions

@ -17,14 +17,14 @@ It's well documented and these are probably the only four tasks you'll ever need
If you are not running the system via docker, you need to provide connection details other than
the defaults (TODO: how?).
The migration files are in [resources/db/migration](src/main/resources/db/migration). The file name convention
The migration files are in [resources/db/migration](resources/db/migration). The file name convention
incorporates the project's cal-ver versioning, and the files are applied in lexicographical order.
VYY_MM_v_nnn__description.sql
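As a rough illustration of the ordering rule: if the fields in `VYY_MM_v_nnn__description.sql` are zero-padded (an assumption), a plain string sort yields the order in which the files are applied. The class name and the file names below are made up for the example and do not refer to actual migrations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MigrationOrderSketch {
    /** Returns the given migration file names in lexicographical order. */
    static List<String> applicationOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted); // String's natural ordering is lexicographical
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical file names following VYY_MM_v_nnn__description.sql
        List<String> files = List.of(
                "V24_02_0_001__add_column.sql",
                "V23_11_0_002__create_index.sql",
                "V24_02_0_000__create_table.sql");
        applicationOrder(files).forEach(System.out::println);
    }
}
```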
## Central Paths
* [migrations](src/main/resources/db/migration) - Flyway migrations
* [migrations](resources/db/migration) - Flyway migrations
## See Also

@ -4,11 +4,11 @@ The domain link database contains information about links
between domains. It is a static in-memory database loaded
from a binary file.
* [DomainLinkDb](src/main/java/nu/marginalia/linkdb/DomainLinkDb.java)
* * [FileDomainLinkDb](src/main/java/nu/marginalia/linkdb/FileDomainLinkDb.java)
* * [SqlDomainLinkDb](src/main/java/nu/marginalia/linkdb/SqlDomainLinkDb.java)
* [DomainLinkDbWriter](src/main/java/nu/marginalia/linkdb/DomainLinkDbWriter.java)
* [DomainLinkDbLoader](src/main/java/nu/marginalia/linkdb/DomainLinkDbLoader.java)
* [DomainLinkDb](java/nu/marginalia/linkdb/DomainLinkDb.java)
* * [FileDomainLinkDb](java/nu/marginalia/linkdb/FileDomainLinkDb.java)
* * [SqlDomainLinkDb](java/nu/marginalia/linkdb/SqlDomainLinkDb.java)
* [DomainLinkDbWriter](java/nu/marginalia/linkdb/DomainLinkDbWriter.java)
* [DomainLinkDbLoader](java/nu/marginalia/linkdb/DomainLinkDbLoader.java)
## Document Database
@ -21,8 +21,8 @@ is not in the MariaDB database is that this would make updates to
this information take effect in production immediately, even before
the information was searchable.
* [DocumentLinkDbWriter](src/main/java/nu/marginalia/linkdb/DocumentDbWriter.java)
* [DocumentLinkDbLoader](src/main/java/nu/marginalia/linkdb/DocumentDbReader.java)
* [DocumentLinkDbWriter](java/nu/marginalia/linkdb/DocumentDbWriter.java)
* [DocumentLinkDbLoader](java/nu/marginalia/linkdb/DocumentDbReader.java)
## See Also

@ -4,9 +4,9 @@ This package contains common models to the search engine
## Central Classes
* [EdgeDomain](src/main/java/nu/marginalia/model/EdgeDomain.java)
* [EdgeUrl](src/main/java/nu/marginalia/model/EdgeUrl.java)
* [DocumentMetadata](src/main/java/nu/marginalia/model/idx/DocumentMetadata.java)
* [DocumentFlags](src/main/java/nu/marginalia/model/idx/DocumentFlags.java)
* [WordMetadata](src/main/java/nu/marginalia/model/idx/WordMetadata.java)
* [WordFlags](src/main/java/nu/marginalia/model/idx/WordFlags.java)
* [EdgeDomain](java/nu/marginalia/model/EdgeDomain.java)
* [EdgeUrl](java/nu/marginalia/model/EdgeUrl.java)
* [DocumentMetadata](java/nu/marginalia/model/idx/DocumentMetadata.java)
* [DocumentFlags](java/nu/marginalia/model/idx/DocumentFlags.java)
* [WordMetadata](java/nu/marginalia/model/idx/WordMetadata.java)
* [WordFlags](java/nu/marginalia/model/idx/WordFlags.java)

@ -4,4 +4,4 @@ Renders handlebar-style templates for the user-facing services.
## Central Classes
* [Mustache Renderer](src/main/java/nu/marginalia/renderer/MustacheRenderer.java)
* [Mustache Renderer](java/nu/marginalia/renderer/MustacheRenderer.java)

@ -71,11 +71,11 @@ lifecycle, listen to lifecycle notifications and so on.
## gRPC Channel Pool
From the [GrpcChannelPoolFactory](src/main/java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created
From the [GrpcChannelPoolFactory](java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created
that are aware of the service registry:
* [GrpcMultiNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services
* [GrpcSingleNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services.
* [GrpcMultiNodeChannelPool](java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services
* [GrpcSingleNodeChannelPool](java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services.
if multiple instances are running, it will use one of them and fall back
to another if the first is not available.
@ -145,5 +145,5 @@ Future<List<Response>> response = channelPool
### Central Classes
* [ServiceRegistryIf](src/main/java/nu/marginalia/service/discovery/ServiceRegistryIf.java)
* [ZkServiceRegistry](src/main/java/nu/marginalia/service/discovery/ZkServiceRegistry.java)
* [ServiceRegistryIf](java/nu/marginalia/service/discovery/ServiceRegistryIf.java)
* [ZkServiceRegistry](java/nu/marginalia/service/discovery/ZkServiceRegistry.java)

@ -50,5 +50,5 @@ Further the new service needs to be added to the `ServiceId` enum in [service-di
## Central Classes
* [MainClass](src/main/java/nu/marginalia/service/MainClass.java) bootstraps all executables
* [Service](src/main/java/nu/marginalia/service/server/Service.java) base class for all services.
* [MainClass](java/nu/marginalia/service/MainClass.java) bootstraps all executables
* [Service](java/nu/marginalia/service/server/Service.java) base class for all services.

@ -5,4 +5,4 @@ uses it to identify if a document has ads.
## Central Classes
* [AdblockSimulator](src/main/java/nu/marginalia/adblock/AdblockSimulator.java)
* [AdblockSimulator](java/nu/marginalia/adblock/AdblockSimulator.java)

@ -2,6 +2,6 @@ Contains converter-*like* extraction jobs that operate on crawled data to produc
## Important classes
* [AtagExporter](src/main/java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data.
* [FeedExporter](src/main/java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data.
* [TermFrequencyExporter](src/main/java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF.
* [AtagExporter](java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data.
* [FeedExporter](java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data.
* [TermFrequencyExporter](java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF.

@ -6,8 +6,8 @@ functions based on [POS tags](https://www.ling.upenn.edu/courses/Fall_2003/ling0
## Central Classes
* [DocumentKeywordExtractor](src/main/java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
* [KeywordMetadata](src/main/java/nu/marginalia/keyword/KeywordMetadata.java)
* [DocumentKeywordExtractor](java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
* [KeywordMetadata](java/nu/marginalia/keyword/KeywordMetadata.java)
## See Also

@ -4,4 +4,4 @@ Contains advanced haruspicy for figuring out when a document was published.
## Central Classes
* [PubDateSniffer](src/main/java/nu/marginalia/pubdate/PubDateSniffer.java)
* [PubDateSniffer](java/nu/marginalia/pubdate/PubDateSniffer.java)

@ -21,5 +21,5 @@ order of a 100,000,000 documents with a time budget of a couple of hours.
## Central Classes
* [SummaryExtractor](src/main/java/nu/marginalia/summary/SummaryExtractor.java)
* [SummaryExtractor](java/nu/marginalia/summary/SummaryExtractor.java)

@ -4,6 +4,6 @@ Contains tools for blocking links from crawling.
## Central Classes
* [GeoIpBlocklist](src/main/java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking
* [IpBlocklist](src/main/java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking
* [UrlBlocklist](src/main/java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking
* [GeoIpBlocklist](java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking
* [IpBlocklist](java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking
* [UrlBlocklist](java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking

@ -5,4 +5,4 @@ pathological links, etc.
## Central Classes
* [LinkParser](src/main/java/nu/marginalia/link_parser/LinkParser.java)
* [LinkParser](java/nu/marginalia/link_parser/LinkParser.java)

@ -8,8 +8,8 @@ The `id` file contains a list of sorted document ids, and the `data` file contai
metadata for each document id, in the same order as the `id` file, with a fixed
size record containing data associated with each document id.
Each record contains a binary encoded [DocumentMetadata](../../common/model/src/main/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
as well as a [HtmlFeatures](../../common/model/src/main/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask.
Each record contains a binary encoded [DocumentMetadata](../../common/model/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
as well as a [HtmlFeatures](../../common/model/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask.
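The lookup this layout permits can be sketched as follows. This is a minimal in-memory illustration, not the actual `ForwardIndexReader`: the class name, the plain `long[]` arrays standing in for the two files, and the record size of two longs are assumptions made for the example.

```java
import java.util.Arrays;

class ForwardIndexSketch {
    // Assumed record size for the example: two longs per document
    static final int RECORD_SIZE = 2;

    private final long[] ids;   // sorted document ids (stands in for the `id` file)
    private final long[] data;  // fixed-size records in the same order (the `data` file)

    ForwardIndexSketch(long[] sortedIds, long[] data) {
        this.ids = sortedIds;
        this.data = data;
    }

    /** Returns the record for docId, or null if the id is not present. */
    long[] lookup(long docId) {
        int pos = Arrays.binarySearch(ids, docId);
        if (pos < 0) return null;
        int offset = pos * RECORD_SIZE; // the id's position selects the record
        return Arrays.copyOfRange(data, offset, offset + RECORD_SIZE);
    }

    public static void main(String[] args) {
        var index = new ForwardIndexSketch(
                new long[] { 10, 20, 30 },
                new long[] { 1, 2,  3, 4,  5, 6 });
        System.out.println(Arrays.toString(index.lookup(20))); // prints [3, 4]
    }
}
```

Because the records are fixed-size and stored in id order, a binary search over the ids is enough to locate the data; no separate offset table is needed.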
Unlike the reverse index, the forward index is not split into two tiers, and the data is in the same
order as it is in the source data, and the cardinality of the document IDs is assumed to fit in memory,
@ -17,5 +17,5 @@ so it's relatively easy to construct.
## Central Classes
* [ForwardIndexConverter](src/main/java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
* [ForwardIndexReader](src/main/java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.
* [ForwardIndexConverter](java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
* [ForwardIndexReader](java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.

@ -16,9 +16,9 @@ are designed to handle this transparently via their *Paging* implementation.
## Central Classes
### Model
* [IndexJournalEntry](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntry.java)
* [IndexJournalEntryHeader](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java)
* [IndexJournalEntryData](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryData.java)
* [IndexJournalEntry](java/nu/marginalia/index/journal/model/IndexJournalEntry.java)
* [IndexJournalEntryHeader](java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java)
* [IndexJournalEntryData](java/nu/marginalia/index/journal/model/IndexJournalEntryData.java)
### I/O
* [IndexJournalReader](src/main/java/nu/marginalia/index/journal/reader/IndexJournalReader.java)
* [IndexJournalWriter](src/main/java/nu/marginalia/index/journal/writer/IndexJournalWriter.java)
* [IndexJournalReader](java/nu/marginalia/index/journal/reader/IndexJournalReader.java)
* [IndexJournalWriter](java/nu/marginalia/index/journal/writer/IndexJournalWriter.java)

@ -34,9 +34,9 @@ to form a finalized reverse index.
![Illustration of the data layout of the finalized index](index.svg)
## Central Classes
* [ReversePreindex](src/main/java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state.
* [ReverseIndexConstructor](src/main/java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index.
* [ReverseIndexReader](src/main/java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index.
* [ReversePreindex](java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state.
* [ReverseIndexConstructor](java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index.
* [ReverseIndexReader](java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index.
## See Also

@ -12,11 +12,11 @@ interfaces are implemented within the index-service module.
## Central Classes
* [IndexQuery](src/main/java/nu/marginalia/index/query/IndexQuery.java)
* [query/filter](src/main/java/nu/marginalia/index/query/filter/)
* [IndexQuery](java/nu/marginalia/index/query/IndexQuery.java)
* [query/filter](java/nu/marginalia/index/query/filter/)
## See Also
* [index/index-reverse](../index-reverse) implements many of these interfaces.
* [libraries/array](../../libraries/array)
* [libraries/array/.../LongQueryBuffer](../../libraries/array/src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java)
* [libraries/array/.../LongQueryBuffer](../../libraries/array/java/nu/marginalia/array/buffer/LongQueryBuffer.java)

@ -29,7 +29,7 @@ results higher.
## Central Classes
* [ResultValuator](src/main/java/nu/marginalia/ranking/results/ResultValuator.java)
* [ResultValuator](java/nu/marginalia/ranking/results/ResultValuator.java)
---
@ -53,14 +53,14 @@ for creating a ranking algorithm that is focused on a particular segment of the
## Central Classes
* [PageRankDomainRanker](src/main/java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the
* [PageRankDomainRanker](java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the
PageRank or Personalized PageRank algorithm depending on whether a list of influence domains is provided.
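For orientation, the difference between the two modes can be sketched with a small power iteration. This is a generic textbook formulation under assumed parameters (a caller-supplied damping factor, a fixed iteration count, adjacency lists as `int[][]`), not the code in `PageRankDomainRanker`. With no influence nodes the teleport vector is uniform (PageRank); otherwise it is concentrated on the influence nodes (Personalized PageRank).

```java
import java.util.Arrays;

class PageRankSketch {
    /**
     * @param links     links[i] lists the nodes that node i links to
     * @param influence indices of influence nodes; empty means plain PageRank
     */
    static double[] rank(int[][] links, int[] influence, double damping, int iterations) {
        int n = links.length;

        // Teleport vector: uniform for PageRank, concentrated for the personalized variant
        double[] teleport = new double[n];
        if (influence.length == 0) {
            Arrays.fill(teleport, 1.0 / n);
        } else {
            for (int i : influence) teleport[i] += 1.0 / influence.length;
        }

        double[] rank = teleport.clone();
        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            double dangling = 0; // rank held by nodes without outgoing links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                    continue;
                }
                double share = rank[i] / links[i].length;
                for (int j : links[i]) next[j] += share;
            }
            for (int i = 0; i < n; i++) {
                // Dangling mass and the (1 - damping) fraction both follow the teleport vector
                next[i] = damping * (next[i] + dangling * teleport[i])
                        + (1 - damping) * teleport[i];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] links = { {1}, {2}, {0, 1} };
        System.out.println(Arrays.toString(rank(links, new int[0], 0.85, 50)));
        System.out.println(Arrays.toString(rank(links, new int[] {0}, 0.85, 50)));
    }
}
```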
### Data sources
* [LinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph
* [InvertedLinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
* [SimilarityGraphSource](src/main/java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
* [LinkGraphSource](java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph
* [InvertedLinkGraphSource](java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
* [SimilarityGraphSource](java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
Note that the similarity graph needs to be precomputed and stored in the database for
the similarity graph source to be available.

@ -32,8 +32,8 @@ try (var array = LongArrayFactory.mmapForWritingConfined(Path.of("/tmp/test"), 1
## Query Buffers
The classes [IntQueryBuffer](src/main/java/nu/marginalia/array/buffer/IntQueryBuffer.java)
and [LongQueryBuffer](src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used
The classes [IntQueryBuffer](java/nu/marginalia/array/buffer/IntQueryBuffer.java)
and [LongQueryBuffer](java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used
heavily in the search engine's query processing.
They are dual-pointer buffers that offer tools for filtering data.
@ -75,7 +75,7 @@ buffer.finalizeFiltering();
Especially noteworthy are the operations `retain()` and `reject()` in
[IntArraySearch](src/main/java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](src/main/java/nu/marginalia/array/algo/LongArraySearch.java).
[IntArraySearch](java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](java/nu/marginalia/array/algo/LongArraySearch.java).
They keep or remove all items in the buffer that exist in the referenced range of the array,
which must be sorted.
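To make the dual-pointer idea concrete, here is a minimal sketch of `retain()`- and `reject()`-style filtering. It is not the `LongQueryBuffer` API: the class, its fields, and its method signatures are invented for illustration. A read pointer walks the buffer while a write pointer compacts the surviving items to the front.

```java
import java.util.Arrays;

class DualPointerFilterSketch {
    final long[] data;
    int end; // number of live items in the buffer

    DualPointerFilterSketch(long... items) {
        this.data = items.clone();
        this.end = items.length;
    }

    /** Keeps only the items that occur in sorted[from, to). */
    void retain(long[] sorted, int from, int to) {
        filter(sorted, from, to, true);
    }

    /** Removes the items that occur in sorted[from, to). */
    void reject(long[] sorted, int from, int to) {
        filter(sorted, from, to, false);
    }

    private void filter(long[] sorted, int from, int to, boolean keepIfPresent) {
        int write = 0;                       // write pointer trails the read pointer
        for (int read = 0; read < end; read++) {
            boolean present = Arrays.binarySearch(sorted, from, to, data[read]) >= 0;
            if (present == keepIfPresent) {
                data[write++] = data[read];  // compact surviving items to the front
            }
        }
        end = write;                         // everything past `write` is discarded
    }

    long[] contents() {
        return Arrays.copyOf(data, end);
    }
}
```

The binary search is the reason the referenced range must be sorted; the buffer itself is filtered in place without extra allocation.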

@ -6,4 +6,4 @@ This is The Way when it comes to representing bit masks to humans.
## Central Classes
* [BrailleBlockPunchCards](src/main/java/nu/marginalia/bbpc/BrailleBlockPunchCards.java)
* [BrailleBlockPunchCards](java/nu/marginalia/bbpc/BrailleBlockPunchCards.java)

@ -4,11 +4,11 @@ This package contains a small library for creating and reading a static b-tree i
Both binary indices (i.e. sets) are supported, as well as arbitrary multiple-of-keysize key-value mappings where the data is
interlaced with the keys in the leaf nodes. This is a fairly low-level data structure.
The b-trees are specified through a [BTreeContext](src/main/java/nu/marginalia/btree/model/BTreeContext.java)
The b-trees are specified through a [BTreeContext](java/nu/marginalia/btree/model/BTreeContext.java)
which contains information about the data and index layout.
The b-trees are written through a [BTreeWriter](src/main/java/nu/marginalia/btree/BTreeWriter.java) and
read with a [BTreeReader](src/main/java/nu/marginalia/btree/BTreeReader.java).
The b-trees are written through a [BTreeWriter](java/nu/marginalia/btree/BTreeWriter.java) and
read with a [BTreeReader](java/nu/marginalia/btree/BTreeReader.java).
## Demo

@ -5,7 +5,7 @@ for document deduplication. Hashes are compared using their hamming distance.
## Central Classes
* [EasyLSH](src/main/java/nu/marginalia/lsh/EasyLSH.java)
* [EasyLSH](java/nu/marginalia/lsh/EasyLSH.java)
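Comparing two hashes by Hamming distance amounts to counting the bits in which they differ. A minimal sketch, assuming 64-bit hashes stored as `long` (the class and method names are illustrative, not taken from `EasyLSH`):

```java
class HammingSketch {
    /** Number of bit positions in which the two hashes differ. */
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b); // XOR leaves a 1 wherever the inputs disagree
    }

    public static void main(String[] args) {
        System.out.println(hammingDistance(0b1011L, 0b0010L)); // prints 2
    }
}
```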
## Demo

@ -34,4 +34,4 @@ void ifTheThingDoTheThing(String str) {
## Central Classes
* [GuardedRegexFactory](src/main/java/nu/marginalia/gregex/GuardedRegexFactory.java)
* [GuardedRegexFactory](java/nu/marginalia/gregex/GuardedRegexFactory.java)

@ -4,8 +4,8 @@ This library contains various tools used in language processing.
## Central Classes
* [SentenceExtractor](src/main/java/nu/marginalia/language/sentence/SentenceExtractor.java) -
Creates a [DocumentLanguageData](src/main/java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing
* [SentenceExtractor](java/nu/marginalia/language/sentence/SentenceExtractor.java) -
Creates a [DocumentLanguageData](java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing
its words, how they stem, POS tags, and so on.
## See Also

@ -2,12 +2,12 @@ This micro-library with strategies for solving the problem of [write amplificati
writing large files out of order to disk. It offers a simple API to write data to a file in a
random order, while localizing the writes.
Several strategies are available from the [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java)
Several strategies are available from the [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java)
interface.
* Writing to a memory mapped file (non-solution, for small files)
* Writing to a memory buffer (for systems with enough memory)
* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory.
* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory.
The data is written in a native byte order.
@ -41,5 +41,5 @@ catch (IOException ex) {
## Central Classes
* [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java)
* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java)
* [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java)
* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java)

@ -5,7 +5,7 @@ the TF-IDF score of a keyword.
## Central Classes
* [TermFrequencyDict](src/main/java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java)
* [TermFrequencyDict](java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java)
## See Also

@ -8,9 +8,9 @@ A crawl spec is a list of domains to be crawled. It is a parquet file with the
Crawl specs are used to define the scope of a crawl in the absence of known domains.
The [CrawlSpecRecord](src/main/java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is
The [CrawlSpecRecord](java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is
used to represent a record in the crawl spec.
The [CrawlSpecRecordParquetFileReader](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java)
and [CrawlSpecRecordParquetFileWriter](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java)
The [CrawlSpecRecordParquetFileReader](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java)
and [CrawlSpecRecordParquetFileWriter](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java)
classes are used to read and write the crawl spec parquet files.

@ -15,27 +15,27 @@ removed in the future.
## Central Classes
* [CrawledDocument](src/main/java/nu/marginalia/crawling/model/CrawledDocument.java)
* [CrawledDomain](src/main/java/nu/marginalia/crawling/model/CrawledDomain.java)
* [CrawledDocument](java/nu/marginalia/crawling/model/CrawledDocument.java)
* [CrawledDomain](java/nu/marginalia/crawling/model/CrawledDomain.java)
### Serialization
These serialization classes automatically negotiate the serialization format based on the
file extension.
Data is accessed through a [SerializableCrawlDataStream](src/main/java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java),
Data is accessed through a [SerializableCrawlDataStream](java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java),
which is a somewhat enhanced Iterator that can be used to read data.
* [CrawledDomainReader](src/main/java/nu/marginalia/crawling/io/CrawledDomainReader.java)
* [CrawledDomainWriter](src/main/java/nu/marginalia/crawling/io/CrawledDomainWriter.java)
* [CrawledDomainReader](java/nu/marginalia/crawling/io/CrawledDomainReader.java)
* [CrawledDomainWriter](java/nu/marginalia/crawling/io/CrawledDomainWriter.java)
### Parquet Serialization
The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java)
and [CrawledDocumentParquetRecordFileWriter](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes,
The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java)
and [CrawledDocumentParquetRecordFileWriter](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes,
which read and write parquet files respectively.
The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java)
The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java)
The record has the following fields:

@ -4,11 +4,11 @@ reading and writing parquet files with the output from the
Main models:
* [DocumentRecord](src/main/java/nu/marginalia/model/processed/DocumentRecord.java)
* * [DocumentRecordKeywordsProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
* * [DocumentRecordMetadataProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
* [DomainLinkRecord](src/main/java/nu/marginalia/model/processed/DomainLinkRecord.java)
* [DomainRecord](src/main/java/nu/marginalia/model/processed/DomainRecord.java)
* [DocumentRecord](java/nu/marginalia/model/processed/DocumentRecord.java)
* * [DocumentRecordKeywordsProjection](java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
* * [DocumentRecordMetadataProjection](java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
* [DomainLinkRecord](java/nu/marginalia/model/processed/DomainLinkRecord.java)
* [DomainRecord](java/nu/marginalia/model/processed/DomainRecord.java)
Since parquet is a column based format, some of the readable models are projections
that only read parts of the input file.

@ -38,16 +38,16 @@ https://www.marginalia.nu/log/93_atags/
## Central Classes
* [ConverterMain](src/main/java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
* [DocumentProcessor](src/main/java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
* - [HtmlDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
* [ConverterMain](java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
* [DocumentProcessor](java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
* - [HtmlDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
has HTML-specific logic for a document and its keywords, and identifies features such as whether it has JavaScript.
* * - [HtmlProcessorSpecializations](src/main/java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
* * - [XenForoSpecialization](src/main/java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
* - [PlainTextDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
* * - [HtmlProcessorSpecializations](java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
* * - [XenForoSpecialization](java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
* - [PlainTextDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
has plain text-specific logic related to a document...
* [DomainProcessor](src/main/java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
* [DomainProcessor](java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
generates domain-wide metadata such as link graphs.
## See Also

@ -31,10 +31,10 @@ On top of organic links, the crawler can use sitemaps and rss-feeds to discover
## Central Classes
* [CrawlerMain](src/main/java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
* [CrawlerRetreiver](src/main/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
* [CrawlerMain](java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
* [CrawlerRetreiver](java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
visits known addresses from a domain and downloads each document.
* [HttpFetcher](src/main/java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
* [HttpFetcher](java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
fetches URLs.
## See Also

@ -16,5 +16,5 @@ This is a very light-weight module that delegates the actual work to the modules
Their respective readme files contain more information about the indexes themselves
and how they are constructed.
The process is glued together within [IndexConstructorMain](src/main/java/nu/marginalia/index/IndexConstructorMain.java),
The process is glued together within [IndexConstructorMain](java/nu/marginalia/index/IndexConstructorMain.java),
which is the only class of interest in this module.

@ -6,4 +6,4 @@ the index-service.
## Central Classes
* [LoaderMain](src/main/java/nu/marginalia/loading/LoaderMain.java) main class.
* [LoaderMain](java/nu/marginalia/loading/LoaderMain.java) main class.

@ -4,4 +4,4 @@ The API service acts as a gateway for public API requests, it deals with API key
## Central Classes
* [ApiService](src/main/java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.
* [ApiService](java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.

@ -14,13 +14,13 @@ to the user.
## Central classes
* [SearchService](src/main/java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
* [SearchService](java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
appropriate services.
* [CommandEvaluator](src/main/java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
* [CommandEvaluator](java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
upon it, dealing with special operations like `browse:` or `site:`.
* [SearchQueryIndexService](src/main/java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
* [SearchQueryIndexService](java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
then decorates the search results so that they can be rendered.
## See Also

@ -4,4 +4,4 @@ The assistant service helps the search service by offering various peripheral fu
## Central Classes
* [AssistantService](src/main/java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.
* [AssistantService](java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.

@ -15,7 +15,7 @@ Conceptually the application is broken into three parts:
## Central Classes
* [ControlService](src/main/java/nu/marginalia/control/ControlService.java)
* [ControlService](java/nu/marginalia/control/ControlService.java)
## See Also

@ -9,7 +9,7 @@ much of the executor's functionality.
## Central Classes
* [ExecutorActorControlService](src/main/java/nu/marginalia/actor/ExecutorActorControlService.java)
* [ExecutorActorControlService](java/nu/marginalia/actor/ExecutorActorControlService.java)
## See Also

@ -15,7 +15,7 @@ The web interface also offers a JSON API for machine-based queries.
## Central Classes
This module is almost entirely boilerplate, except the [QueryBasicInterface](src/main/java/nu/marginalia/query/QueryBasicInterface.java)
This module is almost entirely boilerplate, except the [QueryBasicInterface](java/nu/marginalia/query/QueryBasicInterface.java)
class, which offers a REST API for querying the index.
Much of the guts of the query service are in the [query-service](../../functions/search-query)