(docs) Begin un-fucking the docs after refactoring
parent
c943954bb4
commit
e696fd9e92
@ -17,14 +17,14 @@ It's well documented and these are probably the only four tasks you'll ever need
|
||||
If you are not running the system via docker, you need to provide connection details other than
|
||||
the defaults (TODO: how?).
|
||||
|
||||
The migration files are in [resources/db/migration](src/main/resources/db/migration). The file name convention
|
||||
The migration files are in [resources/db/migration](resources/db/migration). The file name convention
|
||||
incorporates the project's cal-ver versioning, and the files are applied in lexicographical order.
|
||||
|
||||
VYY_MM_v_nnn__description.sql
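
As a rough illustration of the ordering described above, the sketch below sorts a few migration file names lexicographically. The class name `MigrationOrder` and the file names are hypothetical, and this only shows what the naming convention implies; it is not how Flyway itself resolves versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MigrationOrder {
    /** Returns the file names in plain lexicographical (String natural) order. */
    static List<String> applyOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical names following the VYY_MM_v_nnn__description.sql pattern
        List<String> names = List.of(
                "V24_01_0_002__add_index.sql",
                "V23_11_0_007__create_table.sql",
                "V24_01_0_001__add_column.sql");
        applyOrder(names).forEach(System.out::println);
    }
}
```

Because the year, month and sequence number are zero-padded, string order and chronological order coincide under this convention.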
|
||||
|
||||
## Central Paths
|
||||
|
||||
* [migrations](src/main/resources/db/migration) - Flyway migrations
|
||||
* [migrations](resources/db/migration) - Flyway migrations
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -4,11 +4,11 @@ The domain link database contains information about links
|
||||
between domains. It is a static in-memory database loaded
|
||||
from a binary file.
|
||||
|
||||
* [DomainLinkDb](src/main/java/nu/marginalia/linkdb/DomainLinkDb.java)
|
||||
* * [FileDomainLinkDb](src/main/java/nu/marginalia/linkdb/FileDomainLinkDb.java)
|
||||
* * [SqlDomainLinkDb](src/main/java/nu/marginalia/linkdb/SqlDomainLinkDb.java)
|
||||
* [DomainLinkDbWriter](src/main/java/nu/marginalia/linkdb/DomainLinkDbWriter.java)
|
||||
* [DomainLinkDbLoader](src/main/java/nu/marginalia/linkdb/DomainLinkDbLoader.java)
|
||||
* [DomainLinkDb](java/nu/marginalia/linkdb/DomainLinkDb.java)
|
||||
* * [FileDomainLinkDb](java/nu/marginalia/linkdb/FileDomainLinkDb.java)
|
||||
* * [SqlDomainLinkDb](java/nu/marginalia/linkdb/SqlDomainLinkDb.java)
|
||||
* [DomainLinkDbWriter](java/nu/marginalia/linkdb/DomainLinkDbWriter.java)
|
||||
* [DomainLinkDbLoader](java/nu/marginalia/linkdb/DomainLinkDbLoader.java)
|
||||
|
||||
## Document Database
|
||||
|
||||
@ -21,8 +21,8 @@ is not in the MariaDB database is that this would make updates to
|
||||
this information take effect in production immediately, even before
|
||||
the information was searchable.
|
||||
|
||||
* [DocumentLinkDbWriter](src/main/java/nu/marginalia/linkdb/DocumentDbWriter.java)
|
||||
* [DocumentLinkDbLoader](src/main/java/nu/marginalia/linkdb/DocumentDbReader.java)
|
||||
* [DocumentLinkDbWriter](java/nu/marginalia/linkdb/DocumentDbWriter.java)
|
||||
* [DocumentLinkDbLoader](java/nu/marginalia/linkdb/DocumentDbReader.java)
|
||||
|
||||
|
||||
## See Also
|
||||
|
@ -4,9 +4,9 @@ This package contains common models to the search engine
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [EdgeDomain](src/main/java/nu/marginalia/model/EdgeDomain.java)
|
||||
* [EdgeUrl](src/main/java/nu/marginalia/model/EdgeUrl.java)
|
||||
* [DocumentMetadata](src/main/java/nu/marginalia/model/idx/DocumentMetadata.java)
|
||||
* [DocumentFlags](src/main/java/nu/marginalia/model/idx/DocumentFlags.java)
|
||||
* [WordMetadata](src/main/java/nu/marginalia/model/idx/WordMetadata.java)
|
||||
* [WordFlags](src/main/java/nu/marginalia/model/idx/WordFlags.java)
|
||||
* [EdgeDomain](java/nu/marginalia/model/EdgeDomain.java)
|
||||
* [EdgeUrl](java/nu/marginalia/model/EdgeUrl.java)
|
||||
* [DocumentMetadata](java/nu/marginalia/model/idx/DocumentMetadata.java)
|
||||
* [DocumentFlags](java/nu/marginalia/model/idx/DocumentFlags.java)
|
||||
* [WordMetadata](java/nu/marginalia/model/idx/WordMetadata.java)
|
||||
* [WordFlags](java/nu/marginalia/model/idx/WordFlags.java)
|
@ -4,4 +4,4 @@ Renders handlebar-style templates for the user-facing services.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [Mustache Renderer](src/main/java/nu/marginalia/renderer/MustacheRenderer.java)
|
||||
* [Mustache Renderer](java/nu/marginalia/renderer/MustacheRenderer.java)
|
@ -71,11 +71,11 @@ lifecycle, listen to lifecycle notifications and so on.
|
||||
|
||||
## gRPC Channel Pool
|
||||
|
||||
From the [GrpcChannelPoolFactory](src/main/java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created
|
||||
From the [GrpcChannelPoolFactory](java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created
|
||||
that are aware of the service registry:
|
||||
|
||||
* [GrpcMultiNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services
|
||||
* [GrpcSingleNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services.
|
||||
* [GrpcMultiNodeChannelPool](java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services
|
||||
* [GrpcSingleNodeChannelPool](java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services.
|
||||
If multiple instances are running, it will use one of them and fall back
|
||||
to another if the first is not available.
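
The fallback behaviour described for the single-node pool can be sketched in plain Java. This is a minimal sketch under stated assumptions: `FallbackSketch` and `callFirstAvailable` are hypothetical names, an unavailable instance is modelled as a thrown `RuntimeException`, and none of this reflects the actual `GrpcSingleNodeChannelPool` API.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class FallbackSketch {
    /**
     * Tries each instance in order and returns the first successful result.
     * An instance that throws is treated as unavailable and skipped.
     */
    static <I, R> Optional<R> callFirstAvailable(List<I> instances, Function<I, R> call) {
        for (I instance : instances) {
            R result;
            try {
                result = call.apply(instance);
            } catch (RuntimeException ex) {
                continue; // this instance is not available, fall back to the next one
            }
            return Optional.ofNullable(result);
        }
        return Optional.empty(); // no instance could serve the call
    }
}
```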
|
||||
|
||||
@ -145,5 +145,5 @@ Future<List<Response>> response = channelPool
|
||||
|
||||
### Central Classes
|
||||
|
||||
* [ServiceRegistryIf](src/main/java/nu/marginalia/service/discovery/ServiceRegistryIf.java)
|
||||
* [ZkServiceRegistry](src/main/java/nu/marginalia/service/discovery/ZkServiceRegistry.java)
|
||||
* [ServiceRegistryIf](java/nu/marginalia/service/discovery/ServiceRegistryIf.java)
|
||||
* [ZkServiceRegistry](java/nu/marginalia/service/discovery/ZkServiceRegistry.java)
|
@ -50,5 +50,5 @@ Further the new service needs to be added to the `ServiceId` enum in [service-di
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [MainClass](src/main/java/nu/marginalia/service/MainClass.java) bootstraps all executables
|
||||
* [Service](src/main/java/nu/marginalia/service/server/Service.java) base class for all services.
|
||||
* [MainClass](java/nu/marginalia/service/MainClass.java) bootstraps all executables
|
||||
* [Service](java/nu/marginalia/service/server/Service.java) base class for all services.
|
@ -5,4 +5,4 @@ uses it to identify if a document has ads.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [AdblockSimulator](src/main/java/nu/marginalia/adblock/AdblockSimulator.java)
|
||||
* [AdblockSimulator](java/nu/marginalia/adblock/AdblockSimulator.java)
|
@ -2,6 +2,6 @@ Contains converter-*like* extraction jobs that operate on crawled data to produc
|
||||
|
||||
## Important classes
|
||||
|
||||
* [AtagExporter](src/main/java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data.
|
||||
* [FeedExporter](src/main/java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data.
|
||||
* [TermFrequencyExporter](src/main/java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF.
|
||||
* [AtagExporter](java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data.
|
||||
* [FeedExporter](java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data.
|
||||
* [TermFrequencyExporter](java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF.
|
@ -6,8 +6,8 @@ functions based on [POS tags](https://www.ling.upenn.edu/courses/Fall_2003/ling0
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [DocumentKeywordExtractor](src/main/java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
|
||||
* [KeywordMetadata](src/main/java/nu/marginalia/keyword/KeywordMetadata.java)
|
||||
* [DocumentKeywordExtractor](java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
|
||||
* [KeywordMetadata](java/nu/marginalia/keyword/KeywordMetadata.java)
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -4,4 +4,4 @@ Contains advanced haruspicy for figuring out when a document was published.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [PubDateSniffer](src/main/java/nu/marginalia/pubdate/PubDateSniffer.java)
|
||||
* [PubDateSniffer](java/nu/marginalia/pubdate/PubDateSniffer.java)
|
@ -21,5 +21,5 @@ order of a 100,000,000 documents with a time budget of a couple of hours.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [SummaryExtractor](src/main/java/nu/marginalia/summary/SummaryExtractor.java)
|
||||
* [SummaryExtractor](java/nu/marginalia/summary/SummaryExtractor.java)
|
||||
|
||||
|
@ -4,6 +4,6 @@ Contains tools for blocking links from crawling.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [GeoIpBlocklist](src/main/java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking
|
||||
* [IpBlocklist](src/main/java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking
|
||||
* [UrlBlocklist](src/main/java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking
|
||||
* [GeoIpBlocklist](java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking
|
||||
* [IpBlocklist](java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking
|
||||
* [UrlBlocklist](java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking
|
@ -5,4 +5,4 @@ pathological links, etc.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [LinkParser](src/main/java/nu/marginalia/link_parser/LinkParser.java)
|
||||
* [LinkParser](java/nu/marginalia/link_parser/LinkParser.java)
|
@ -8,8 +8,8 @@ The `id` file contains a list of sorted document ids, and the `data` file contai
|
||||
metadata for each document id, in the same order as the `id` file, with a fixed
|
||||
size record containing data associated with each document id.
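
To make the layout concrete, here is a minimal sketch of a lookup against such a pair of files, modelled with in-memory arrays. The class `ForwardIndexSketch`, the method `recordOffset` and the record size used are illustrative assumptions, not the actual forward index code.

```java
import java.util.Arrays;

public class ForwardIndexSketch {
    /**
     * Finds docId in the sorted id list and returns the offset of its
     * fixed-size record in the data file, or -1 if the id is not present.
     */
    static long recordOffset(long[] sortedIds, long docId, int recordSize) {
        int idx = Arrays.binarySearch(sortedIds, docId);
        // Records are stored in the same order as the ids, so position i maps to i * recordSize
        return idx < 0 ? -1 : (long) idx * recordSize;
    }
}
```

Since the ids are sorted and every record has the same size, a binary search over the `id` data is enough to locate a record without any additional index structure.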
|
||||
|
||||
Each record contains a binary encoded [DocumentMetadata](../../common/model/src/main/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
|
||||
as well as a [HtmlFeatures](../../common/model/src/main/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask.
|
||||
Each record contains a binary encoded [DocumentMetadata](../../common/model/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
|
||||
as well as a [HtmlFeatures](../../common/model/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask.
|
||||
|
||||
Unlike the reverse index, the forward index is not split into two tiers, and the data is in the same
|
||||
order as it is in the source data, and the cardinality of the document IDs is assumed to fit in memory,
|
||||
@ -17,5 +17,5 @@ so it's relatively easy to construct.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [ForwardIndexConverter](src/main/java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
|
||||
* [ForwardIndexReader](src/main/java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.
|
||||
* [ForwardIndexConverter](java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
|
||||
* [ForwardIndexReader](java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.
|
@ -16,9 +16,9 @@ are designed to handle this transparently via their *Paging* implementation.
|
||||
## Central Classes
|
||||
|
||||
### Model
|
||||
* [IndexJournalEntry](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntry.java)
|
||||
* [IndexJournalEntryHeader](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java)
|
||||
* [IndexJournalEntryData](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryData.java)
|
||||
* [IndexJournalEntry](java/nu/marginalia/index/journal/model/IndexJournalEntry.java)
|
||||
* [IndexJournalEntryHeader](java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java)
|
||||
* [IndexJournalEntryData](java/nu/marginalia/index/journal/model/IndexJournalEntryData.java)
|
||||
### I/O
|
||||
* [IndexJournalReader](src/main/java/nu/marginalia/index/journal/reader/IndexJournalReader.java)
|
||||
* [IndexJournalWriter](src/main/java/nu/marginalia/index/journal/writer/IndexJournalWriter.java)
|
||||
* [IndexJournalReader](java/nu/marginalia/index/journal/reader/IndexJournalReader.java)
|
||||
* [IndexJournalWriter](java/nu/marginalia/index/journal/writer/IndexJournalWriter.java)
|
@ -34,9 +34,9 @@ to form a finalized reverse index.
|
||||
![Illustration of the data layout of the finalized index](index.svg)
|
||||
## Central Classes
|
||||
|
||||
* [ReversePreindex](src/main/java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state.
|
||||
* [ReverseIndexConstructor](src/main/java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index.
|
||||
* [ReverseIndexReader](src/main/java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index.
|
||||
* [ReversePreindex](java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state.
|
||||
* [ReverseIndexConstructor](java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index.
|
||||
* [ReverseIndexReader](java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index.
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -12,11 +12,11 @@ interfaces are implemented within the index-service module.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [IndexQuery](src/main/java/nu/marginalia/index/query/IndexQuery.java)
|
||||
* [query/filter](src/main/java/nu/marginalia/index/query/filter/)
|
||||
* [IndexQuery](java/nu/marginalia/index/query/IndexQuery.java)
|
||||
* [query/filter](java/nu/marginalia/index/query/filter/)
|
||||
|
||||
## See Also
|
||||
|
||||
* [index/index-reverse](../index-reverse) implements many of these interfaces.
|
||||
* [libraries/array](../../libraries/array)
|
||||
* [libraries/array/.../LongQueryBuffer](../../libraries/array/src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java)
|
||||
* [libraries/array/.../LongQueryBuffer](../../libraries/array/java/nu/marginalia/array/buffer/LongQueryBuffer.java)
|
@ -29,7 +29,7 @@ results higher.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [ResultValuator](src/main/java/nu/marginalia/ranking/results/ResultValuator.java)
|
||||
* [ResultValuator](java/nu/marginalia/ranking/results/ResultValuator.java)
|
||||
|
||||
---
|
||||
|
||||
@ -53,14 +53,14 @@ for creating a ranking algorithm that is focused on a particular segment of the
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [PageRankDomainRanker](src/main/java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the
|
||||
* [PageRankDomainRanker](java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the
|
||||
PageRank or Personalized PageRank algorithm depending on whether a list of influence domains is provided.
|
||||
|
||||
### Data sources
|
||||
|
||||
* [LinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph
|
||||
* [InvertedLinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
|
||||
* [SimilarityGraphSource](src/main/java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
|
||||
* [LinkGraphSource](java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph
|
||||
* [InvertedLinkGraphSource](java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
|
||||
* [SimilarityGraphSource](java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
|
||||
|
||||
Note that the similarity graph needs to be precomputed and stored in the database for
|
||||
the similarity graph source to be available.
|
||||
|
@ -32,8 +32,8 @@ try (var array = LongArrayFactory.mmapForWritingConfined(Path.of("/tmp/test"), 1
|
||||
|
||||
## Query Buffers
|
||||
|
||||
The classes [IntQueryBuffer](src/main/java/nu/marginalia/array/buffer/IntQueryBuffer.java)
|
||||
and [LongQueryBuffer](src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used
|
||||
The classes [IntQueryBuffer](java/nu/marginalia/array/buffer/IntQueryBuffer.java)
|
||||
and [LongQueryBuffer](java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used
|
||||
heavily in the search engine's query processing.
|
||||
|
||||
They are dual-pointer buffers that offer tools for filtering data.
|
||||
@ -75,7 +75,7 @@ buffer.finalizeFiltering();
|
||||
|
||||
|
||||
Especially noteworthy are the operations `retain()` and `reject()` in
|
||||
[IntArraySearch](src/main/java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](src/main/java/nu/marginalia/array/algo/LongArraySearch.java).
|
||||
[IntArraySearch](java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](java/nu/marginalia/array/algo/LongArraySearch.java).
|
||||
They keep or remove all items in the buffer that exist in the referenced range of the array,
|
||||
which must be sorted.
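
The semantics of these two operations can be sketched with plain arrays. This is only an illustration of the idea: `RetainRejectSketch` and its method signatures are hypothetical and much simpler than the real `LongArraySearch` and `LongQueryBuffer` classes, which filter in place.

```java
import java.util.Arrays;

public class RetainRejectSketch {
    /** Keeps the buffer values that occur in sorted[from, to). */
    static long[] retain(long[] buffer, long[] sorted, int from, int to) {
        return Arrays.stream(buffer)
                .filter(v -> Arrays.binarySearch(sorted, from, to, v) >= 0)
                .toArray();
    }

    /** Drops the buffer values that occur in sorted[from, to). */
    static long[] reject(long[] buffer, long[] sorted, int from, int to) {
        return Arrays.stream(buffer)
                .filter(v -> Arrays.binarySearch(sorted, from, to, v) < 0)
                .toArray();
    }
}
```

The binary search is why the referenced range must be sorted; on unsorted data the membership test would give undefined results.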
|
||||
|
||||
|
@ -6,4 +6,4 @@ This is The Way when it comes to representing bit masks to humans.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [BrailleBlockPunchCards](src/main/java/nu/marginalia/bbpc/BrailleBlockPunchCards.java)
|
||||
* [BrailleBlockPunchCards](java/nu/marginalia/bbpc/BrailleBlockPunchCards.java)
|
@ -4,11 +4,11 @@ This package contains a small library for creating and reading a static b-tree i
|
||||
Both binary indices (i.e. sets) are supported, as well as arbitrary multiple-of-keysize key-value mappings where the data is
|
||||
interlaced with the keys in the leaf nodes. This is a fairly low-level data structure.
|
||||
|
||||
The b-trees are specified through a [BTreeContext](src/main/java/nu/marginalia/btree/model/BTreeContext.java)
|
||||
The b-trees are specified through a [BTreeContext](java/nu/marginalia/btree/model/BTreeContext.java)
|
||||
which contains information about the data and index layout.
|
||||
|
||||
The b-trees are written through a [BTreeWriter](src/main/java/nu/marginalia/btree/BTreeWriter.java) and
|
||||
read with a [BTreeReader](src/main/java/nu/marginalia/btree/BTreeReader.java).
|
||||
The b-trees are written through a [BTreeWriter](java/nu/marginalia/btree/BTreeWriter.java) and
|
||||
read with a [BTreeReader](java/nu/marginalia/btree/BTreeReader.java).
|
||||
|
||||
## Demo
|
||||
|
||||
|
@ -5,7 +5,7 @@ for document deduplication. Hashes are compared using their hamming distance.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [EasyLSH](src/main/java/nu/marginalia/lsh/EasyLSH.java)
|
||||
* [EasyLSH](java/nu/marginalia/lsh/EasyLSH.java)
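
For readers unfamiliar with the comparison step, the Hamming distance between two 64-bit hashes is the number of bit positions in which they differ. The sketch below shows one common way to compute it; `HammingSketch` is a hypothetical name and this is not taken from the `EasyLSH` class.

```java
public class HammingSketch {
    /** Number of bit positions in which the two hashes differ. */
    static int hammingDistance(long a, long b) {
        // XOR leaves a 1 in every differing position; bitCount counts them
        return Long.bitCount(a ^ b);
    }
}
```

A small distance suggests near-duplicate documents; the threshold to use is a tuning decision that this sketch does not cover.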
|
||||
|
||||
## Demo
|
||||
|
||||
|
@ -34,4 +34,4 @@ void ifTheThingDoTheThing(String str) {
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [GuardedRegexFactory](src/main/java/nu/marginalia/gregex/GuardedRegexFactory.java)
|
||||
* [GuardedRegexFactory](java/nu/marginalia/gregex/GuardedRegexFactory.java)
|
@ -4,8 +4,8 @@ This library contains various tools used in language processing.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [SentenceExtractor](src/main/java/nu/marginalia/language/sentence/SentenceExtractor.java) -
|
||||
Creates a [DocumentLanguageData](src/main/java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing
|
||||
* [SentenceExtractor](java/nu/marginalia/language/sentence/SentenceExtractor.java) -
|
||||
Creates a [DocumentLanguageData](java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing
|
||||
its words, how they stem, POS tags, and so on.
|
||||
|
||||
## See Also
|
||||
|
@ -2,12 +2,12 @@ This micro-library with strategies for solving the problem of [write amplificati
|
||||
writing large files out of order to disk. It offers a simple API to write data to a file in a
|
||||
random order, while localizing the writes.
|
||||
|
||||
Several strategies are available from the [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java)
|
||||
Several strategies are available from the [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java)
|
||||
interface.
|
||||
|
||||
* Writing to a memory mapped file (non-solution, for small files)
|
||||
* Writing to a memory buffer (for systems with enough memory)
|
||||
* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory.
|
||||
* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory.
|
||||
|
||||
The data is written in a native byte order.
|
||||
|
||||
@ -41,5 +41,5 @@ catch (IOException ex) {
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java)
|
||||
* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java)
|
||||
* [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java)
|
||||
* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java)
|
@ -5,7 +5,7 @@ the TF-IDF score of a keyword.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [TermFrequencyDict](src/main/java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java)
|
||||
* [TermFrequencyDict](java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java)
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -8,9 +8,9 @@ A crawl spec is a list of domains to be crawled. It is a parquet file with the
|
||||
|
||||
Crawl specs are used to define the scope of a crawl in the absence of known domains.
|
||||
|
||||
The [CrawlSpecRecord](src/main/java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is
|
||||
The [CrawlSpecRecord](java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is
|
||||
used to represent a record in the crawl spec.
|
||||
|
||||
The [CrawlSpecRecordParquetFileReader](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java)
|
||||
and [CrawlSpecRecordParquetFileWriter](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java)
|
||||
The [CrawlSpecRecordParquetFileReader](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java)
|
||||
and [CrawlSpecRecordParquetFileWriter](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java)
|
||||
classes are used to read and write the crawl spec parquet files.
|
||||
|
@ -15,27 +15,27 @@ removed in the future.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [CrawledDocument](src/main/java/nu/marginalia/crawling/model/CrawledDocument.java)
|
||||
* [CrawledDomain](src/main/java/nu/marginalia/crawling/model/CrawledDomain.java)
|
||||
* [CrawledDocument](java/nu/marginalia/crawling/model/CrawledDocument.java)
|
||||
* [CrawledDomain](java/nu/marginalia/crawling/model/CrawledDomain.java)
|
||||
|
||||
### Serialization
|
||||
|
||||
These serialization classes automatically negotiate the serialization format based on the
|
||||
file extension.
|
||||
|
||||
Data is accessed through a [SerializableCrawlDataStream](src/main/java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java),
|
||||
Data is accessed through a [SerializableCrawlDataStream](java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java),
|
||||
which is a somewhat enhanced Iterator that can be used to read data.
|
||||
|
||||
* [CrawledDomainReader](src/main/java/nu/marginalia/crawling/io/CrawledDomainReader.java)
|
||||
* [CrawledDomainWriter](src/main/java/nu/marginalia/crawling/io/CrawledDomainWriter.java)
|
||||
* [CrawledDomainReader](java/nu/marginalia/crawling/io/CrawledDomainReader.java)
|
||||
* [CrawledDomainWriter](java/nu/marginalia/crawling/io/CrawledDomainWriter.java)
|
||||
|
||||
### Parquet Serialization
|
||||
|
||||
The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java)
|
||||
and [CrawledDocumentParquetRecordFileWriter](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes,
|
||||
The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java)
|
||||
and [CrawledDocumentParquetRecordFileWriter](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes,
|
||||
which read and write parquet files respectively.
|
||||
|
||||
The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java)
|
||||
The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java)
|
||||
|
||||
The record has the following fields:
|
||||
|
||||
|
@ -4,11 +4,11 @@ reading and writing parquet files with the output from the
|
||||
|
||||
Main models:
|
||||
|
||||
* [DocumentRecord](src/main/java/nu/marginalia/model/processed/DocumentRecord.java)
|
||||
* * [DocumentRecordKeywordsProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
|
||||
* * [DocumentRecordMetadataProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
|
||||
* [DomainLinkRecord](src/main/java/nu/marginalia/model/processed/DomainLinkRecord.java)
|
||||
* [DomainRecord](src/main/java/nu/marginalia/model/processed/DomainRecord.java)
|
||||
* [DocumentRecord](java/nu/marginalia/model/processed/DocumentRecord.java)
|
||||
* * [DocumentRecordKeywordsProjection](java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
|
||||
* * [DocumentRecordMetadataProjection](java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
|
||||
* [DomainLinkRecord](java/nu/marginalia/model/processed/DomainLinkRecord.java)
|
||||
* [DomainRecord](java/nu/marginalia/model/processed/DomainRecord.java)
|
||||
|
||||
Since parquet is a column based format, some of the readable models are projections
|
||||
that only read parts of the input file.
|
||||
|
@ -38,16 +38,16 @@ https://www.marginalia.nu/log/93_atags/
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [ConverterMain](src/main/java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
|
||||
* [DocumentProcessor](src/main/java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
|
||||
* - [HtmlDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
|
||||
* [ConverterMain](java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
|
||||
* [DocumentProcessor](java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
|
||||
* - [HtmlDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
|
||||
has HTML-specific logic for a document and its keywords, and identifies features such as whether it has JavaScript.
|
||||
* * - [HtmlProcessorSpecializations](src/main/java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
|
||||
* * - [XenForoSpecialization](src/main/java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
|
||||
* - [PlainTextDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
|
||||
* * - [HtmlProcessorSpecializations](java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
|
||||
* * - [XenForoSpecialization](java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
|
||||
* - [PlainTextDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
|
||||
has plain text-specific logic related to a document...
|
||||
|
||||
* [DomainProcessor](src/main/java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
|
||||
* [DomainProcessor](java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
|
||||
generates domain-wide metadata such as link graphs.
|
||||
|
||||
## See Also
|
||||
|
@ -31,10 +31,10 @@ On top of organic links, the crawler can use sitemaps and rss-feeds to discover
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [CrawlerMain](src/main/java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
|
||||
* [CrawlerRetreiver](src/main/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
|
||||
* [CrawlerMain](java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
|
||||
* [CrawlerRetreiver](java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
|
||||
visits known addresses from a domain and downloads each document.
|
||||
* [HttpFetcher](src/main/java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
|
||||
* [HttpFetcher](java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
|
||||
fetches URLs.
|
||||
|
||||
## See Also
|
||||
|
@ -16,5 +16,5 @@ This is a very light-weight module that delegates the actual work to the modules
|
||||
Their respective readme files contain more information about the indexes themselves
|
||||
and how they are constructed.
|
||||
|
||||
The process is glued together within [IndexConstructorMain](src/main/java/nu/marginalia/index/IndexConstructorMain.java),
|
||||
The process is glued together within [IndexConstructorMain](java/nu/marginalia/index/IndexConstructorMain.java),
|
||||
which is the only class of interest in this module.
|
||||
|
@ -6,4 +6,4 @@ the index-service.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [LoaderMain](src/main/java/nu/marginalia/loading/LoaderMain.java) main class.
|
||||
* [LoaderMain](java/nu/marginalia/loading/LoaderMain.java) main class.
|
@ -4,4 +4,4 @@ The API service acts as a gateway for public API requests, it deals with API key
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [ApiService](src/main/java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.
|
||||
* [ApiService](java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.
|
@ -14,13 +14,13 @@ to the user.
|
||||
|
||||
## Central classes
|
||||
|
||||
* [SearchService](src/main/java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
|
||||
* [SearchService](java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
|
||||
appropriate services.
|
||||
|
||||
* [CommandEvaluator](src/main/java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
|
||||
* [CommandEvaluator](java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
|
||||
upon it, dealing with special operations like `browse:` or `site:`.
|
||||
|
||||
* [SearchQueryIndexService](src/main/java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
|
||||
* [SearchQueryIndexService](java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
|
||||
then decorates the search results so that they can be rendered.
|
||||
|
||||
## See Also
|
||||
|
@ -4,4 +4,4 @@ The assistant service helps the search service by offering various peripheral fu
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [AssistantService](src/main/java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.
|
||||
* [AssistantService](java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.
|
@ -15,7 +15,7 @@ Conceptually the application is broken into three parts:
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [ControlService](src/main/java/nu/marginalia/control/ControlService.java)
|
||||
* [ControlService](java/nu/marginalia/control/ControlService.java)
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -9,7 +9,7 @@ much of the executor's functionality.
|
||||
|
||||
## Central Classes
|
||||
|
||||
* [ExecutorActorControlService](src/main/java/nu/marginalia/actor/ExecutorActorControlService.java)
|
||||
* [ExecutorActorControlService](java/nu/marginalia/actor/ExecutorActorControlService.java)
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -15,7 +15,7 @@ The web interface also offers a JSON API for machine-based queries.
|
||||
|
||||
## Central Classes
|
||||
|
||||
This module is almost entirely boilerplate, except the [QueryBasicInterface](src/main/java/nu/marginalia/query/QueryBasicInterface.java)
|
||||
This module is almost entirely boilerplate, except the [QueryBasicInterface](java/nu/marginalia/query/QueryBasicInterface.java)
|
||||
class, which offers a REST API for querying the index.
|
||||
|
||||
Much of the guts of the query service are in the [query-service](../../functions/search-query)
|
||||
|