# Code

This is a pretty large and diverse project with many moving parts.

You'll find a short description in each module of what it does and how it relates to other modules.

The modules each have names like "library", "process", or "feature". These have specific meanings.
See [doc/module-taxonomy.md](../doc/module-taxonomy.md).

## Overview

A map of the most important components and how they relate can be found below.

![Conceptual overview of the main components](../doc/diagram/conceptual-overview.svg)

The core part of the search engine is the index service, which is responsible for storing and retrieving
the document data. The index service is partitioned, as is the executor service, which is responsible for
executing processes. At least one instance of each service must be run, but more can be run alongside.
Multiple partitions are desirable in production to distribute load across multiple physical drives,
as well as to reduce the impact of downtime.

Search queries are delegated via the query service, which is a proxy that fans out each query to all
eligible index services. The control service is responsible for distributing commands to the executor
service and for monitoring the health of the system. It also offers a web interface for operating the system.

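
As a rough illustration of the fan-out performed by the query service, the following Java sketch sends one query to several index partitions in parallel and merges the answers. It is a minimal sketch under stated assumptions, not the query service's actual code: the names `QueryFanOutSketch`, `IndexPartition`, `Result` and `fanOut` are hypothetical, partitions are modeled as in-process objects rather than remote services, and ranking the merged hits by a single numeric score is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of query fan-out; none of these names come from the project's code.
public class QueryFanOutSketch {

    // Hypothetical stand-in for one index service partition.
    public interface IndexPartition {
        List<Result> search(String query);
    }

    // Hypothetical result type; ranking purely by score is an assumption.
    public record Result(String url, double score) {}

    // Sends the query to every partition concurrently, waits for all of them,
    // then merges the hits and keeps the top `limit` by descending score.
    public static List<Result> fanOut(String query, List<IndexPartition> partitions, int limit) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<CompletableFuture<List<Result>>> pending = new ArrayList<>();
            for (IndexPartition partition : partitions) {
                pending.add(CompletableFuture.supplyAsync(() -> partition.search(query), pool));
            }

            List<Result> merged = new ArrayList<>();
            for (CompletableFuture<List<Result>> future : pending) {
                merged.addAll(future.join()); // blocks until this partition has answered
            }

            merged.sort(Comparator.comparingDouble(Result::score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        IndexPartition a = q -> List.of(new Result("a.example/1", 0.9), new Result("a.example/2", 0.2));
        IndexPartition b = q -> List.of(new Result("b.example/1", 0.5));

        System.out.println(fanOut("hello", List.of(a, b), 2));
        // prints [Result[url=a.example/1, score=0.9], Result[url=b.example/1, score=0.5]]
    }
}
```

Querying all partitions concurrently means the overall latency is roughly that of the slowest partition rather than the sum of all of them. A real proxy would additionally need timeouts and some policy for partitions that are unavailable, which this sketch leaves out.
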
### Services

* [core services](services-core/) - Most of these services are stateful, memory-hungry, and do the heavy lifting.
* * [control](services-core/control-service)
* * [query](services-core/query-service)
* * [index](services-core/index-service)
* * [executor](services-core/executor-service)
* * [assistant](services-core/assistant-service)
* [application services](services-application/) - Mostly stateless gateways providing access to the core services.
* * [api](services-application/api-service) - public API
* * [search](services-application/search-service) - marginalia search application
* * [dating](services-application/dating-service) - [https://explore.marginalia.nu/](https://explore.marginalia.nu/)
* * [explorer](services-application/explorer-service) - [https://explore2.marginalia.nu/](https://explore2.marginalia.nu/)
* an [internal API](api/)

### Processes

Processes are batch jobs that deal with data retrieval, processing, and loading. They are spawned and orchestrated by
the executor service, which is in turn controlled by the control service.

* [processes](processes/)
* * [crawling-process](processes/crawling-process)
* * [converting-process](processes/converting-process)
* * [loading-process](processes/loading-process)

#### Tools

* [term-frequency-extractor](tools/term-frequency-extractor)

### Features

Features are relatively stand-alone components that serve some part of the domain. They aren't domain-independent,
but they are isolated.

* [features-search](features-search)
* [features-crawl](features-crawl)
* [features-convert](features-convert)
* [features-index](features-index)

### Libraries and primitives

Libraries are stand-alone code that is independent of the domain logic.

* [common](common/) - elements for creating a service, a client, etc.
* [libraries](libraries/) - code that is not specific to search.
* * [array](libraries/array/) - library for large memory-mapped areas
* * [btree](libraries/btree/) - static B-tree library