Commit Graph

95 Commits

Author SHA1 Message Date
Viktor
5bd3934d22
Merge pull request #64 from dreimolo/macos_AS_fix
Macos apple silicon fix, and slight improvements to sample downloader
2023-12-18 18:29:14 +01:00
Viktor Lofgren
128f550ee5 (run) Download to a temporary file to avoid corruption from aborted downloads 2023-12-18 18:28:17 +01:00
Viktor Lofgren
c92f1b8df8 (geo-ip) Revert removal of ip2location logic
We do both ip2location and ASN data.

The change also adds some keywords based on autonomous system information, on a somewhat experimental basis.  It would be neat to be able to e.g. exclude cloud services or just e.g. cloudflare from the search results.
2023-12-17 15:03:00 +01:00
Viktor Lofgren
d7bd540683 (*) Replace the ip2location IP geolocation data with ASN information from apnic.net.
Doesn't really make sense to use ip2location as a middle man for information that is already freely available...
2023-12-16 21:55:04 +01:00
dreimolo
62954f98de adds xl to help output 2023-12-16 19:41:41 +01:00
dreimolo
6c7d7427bf Adds check for wget and curl, and valid sample archives 2023-12-16 14:14:58 +01:00
Viktor Lofgren
2f4500be5a (search) New frontend look 2023-12-02 17:06:40 +01:00
Viktor Lofgren
21d6aa421c (docs) Update setup instructions 2023-11-30 21:44:29 +01:00
Viktor Lofgren
8a5b853fae Fix experiment runner 2023-11-15 14:03:17 +01:00
Viktor Lofgren
ef16502159 (doc) Update readme 2023-11-07 16:00:18 +01:00
Viktor Lofgren
3047e2dd7c (screenshot-capture-tool) Make screenshot-capture-tool cooperate with docker 2023-11-01 16:38:55 +01:00
Viktor Lofgren
5d6e0e3790 (log) Clean up logging
Don't log the PROCESS stream to executor's logs, as it will also be logged in the spawned process' log files.

Also tell the spawned process which "service" it is so that it gets a log file with a name that makes sense.
2023-10-29 15:52:17 +01:00
Viktor Lofgren
f6fcb04817 (experiment) Repair the experiment runner 2023-10-27 16:16:50 +02:00
Viktor Lofgren
b8796d825d (docs) Update documentation 2023-10-27 13:24:49 +02:00
Viktor Lofgren
0f637fb722 (logging) Better logging configurations 2023-10-26 12:48:10 +02:00
Viktor Lofgren
c130d7cf5f (*) Use trafeik instead of nginx for reverse proxy 2023-10-24 14:44:19 +02:00
Viktor Lofgren
81dd3809e9 (*) WIP Add node affinity to EC_DOMAIN
Very messy commit due to fractalline yak shaving
2023-10-19 17:48:34 +02:00
Viktor Lofgren
93122bdd18 (run) Add two nodes to the demo setup 2023-10-16 17:37:26 +02:00
Viktor Lofgren
16e0738731 (*) Get multi-node routing working. 2023-10-15 18:38:30 +02:00
Viktor Lofgren
4baf9527d7 (*) WIP Control GUI redesign, executor-service, multi-node mq
This turned out to be very difficult to do in small isolated steps.

* Design overhaul of the control gui using bootstrap
* Move the actors out of control-service into to a new executor-service, that can be run on multiple nodes
* Add node-affinity to message queue
2023-10-14 12:08:43 +02:00
Viktor Lofgren
6319b8ef51 (api-service) Improved testability, always set content type to application/json 2023-10-09 15:39:34 +02:00
Viktor
8e1abc3f10
(index-reverse) Parallel construction of the reverse indexes. (#52)
* (index-reverse) Parallel construction of the reverse indexes.

* (array) Remove wasteful calculation of numDistinct before merging two sorted arrays.

* (index-reverse)  Force changes to disk on close, reduce logging.

* (index-reverse)  Clean up merging process and add back logging

* (run)  Add a conservative default for INDEX_CONSTRUCTION_PROCESS_OPTS's parallelism as it eats a lot of RAM

* (index-reverse)  Better logging during processing

* (array) 2GB+ compatible write() function

* (array) 2GB+ compatible write() function

* (index-reverse) We are logging like Bolsonaro and I will not have it.

* (reverse-index) Self-diagnostics

* (btree) Fix bug in btree reader to do with large data sizes
2023-10-07 10:00:00 +02:00
Viktor Lofgren
4c26674ff4 (setup) Use mirrored lid.176.ftz file that is of a compatible version 2023-10-03 10:29:44 +02:00
Viktor Lofgren
23be648456 (setup) use curl instead of wget for setup.sh 2023-10-02 16:38:23 +02:00
Viktor Lofgren
d160954080 (index) Two useful debug endpoints 2023-09-24 19:39:48 +02:00
Viktor Lofgren
dbe9235f3a (*) Upgrade to JDK21 with preview enabled.
... also move some common configuration into the root build.gradle-file.

Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work.  This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.
2023-09-24 10:38:59 +02:00
Viktor Lofgren
c68d17d482 (keyword-extraction) Fix bug leading to position data missing on some keywords.
This was due to a discrepancy between the KeywordPositionBitmask and WordsTfIdfCounts' concept of a keyword.
2023-09-02 14:48:55 +02:00
Viktor Lofgren
2b00cd632d (process) Propagate environment JVM params to the index constructor 2023-09-01 15:39:42 +02:00
Viktor
bdcbfb11a8
Merge pull request #42 from MarginaliaSearch/no-downtime-upgrades
Zero downtime upgrades, merge-based index construction
2023-08-29 17:05:48 +02:00
Viktor Lofgren
fa87c7e1b7 (process) Automatic flightrecorder runs for processes when run in docker. 2023-08-29 14:12:51 +02:00
Viktor Lofgren
194a6057dd (index,control) Recoverable index backups 2023-08-25 14:57:43 +02:00
Viktor
229c63c46d
Update readme.md 2023-08-24 13:27:24 +02:00
Viktor Lofgren
b958acb76a (file-storage) New File Storage type for linkdb 2023-08-24 09:06:13 +02:00
Viktor Lofgren
e8c0648e04 Fix missing vol/ss dir in setup.sh 2023-08-23 17:59:40 +02:00
Viktor Lofgren
8bd9a00c38 Amend setup instructions with command 2023-08-23 14:02:21 +00:00
Viktor Lofgren
972d03efdf Fix error in run/readme where it suggested local dev environment uses HTTPS 2023-08-23 13:47:39 +00:00
Viktor Lofgren
2656fcfe2c (conf) Remove unnecessary JVM flags for processes 2023-08-17 17:42:47 +02:00
Viktor Lofgren
46d761f34f (language) fasttext based language filter 2023-08-16 15:48:12 +02:00
Viktor Lofgren
c56ee10185 (control) Separate [Process] and [Process and Load] actions for crawl data; all SLOW data is deletable. 2023-08-13 13:39:59 +02:00
Viktor
69b28fd07d
Update readme.md 2023-08-12 18:58:21 +02:00
Viktor
99884c2c7e
Update readme.md 2023-08-12 15:39:28 +02:00
Viktor Lofgren
a42f707b2d (docs) Update readme with up to date instructions 2023-08-11 13:43:00 +02:00
Viktor Lofgren
eef37927ba (docs) Update readme with up to date instructions 2023-08-11 13:42:14 +02:00
Viktor Lofgren
cdfe284f9a (file storage) File Storage Type for EXPORT data
(file storage) File Storage Type for EXPORT data
2023-08-05 14:45:03 +02:00
Viktor Lofgren
ba724bc1b2 (scripts|docs) Update scripts and documentations for the new operator's gui and file storage workflows. 2023-08-01 22:47:37 +02:00
Viktor Lofgren
483c2dbb44 (conf) Change default user-agent to not associate it with the project; remove unused disks.properties file. 2023-08-01 17:34:25 +02:00
Viktor Lofgren
58556af6c7 (db) Use flwyay for database migrations. 2023-08-01 17:08:42 +02:00
Viktor Lofgren
9786f82220 Fix environment variables to processes so jmc works 2023-07-31 10:32:23 +02:00
Viktor Lofgren
6ff7e9648f (crawler) Use and pass the proper environment variables to the processes. 2023-07-30 16:54:02 +02:00
Viktor Lofgren
a56953c798 (converter, WIP) Refactor converter to not have to load everything into RAM. 2023-07-24 15:25:09 +02:00