Commit Graph

1160 Commits

Author SHA1 Message Date
Viktor Lofgren
49344d7ea8 (control) Administrative QOL improvement, GUI for banning spam 2023-10-07 15:43:18 +02:00
Viktor Lofgren
1b418d77ff (search) We got some new IP ranges to work with for the crawler 2023-10-07 13:41:55 +02:00
Viktor Lofgren
80cc302627 (search) We can't in claim to be on PC hardware anymore... 2023-10-07 11:49:29 +02:00
Viktor
8e1abc3f10
(index-reverse) Parallel construction of the reverse indexes. (#52)
* (index-reverse) Parallel construction of the reverse indexes.

* (array) Remove wasteful calculation of numDistinct before merging two sorted arrays.

* (index-reverse)  Force changes to disk on close, reduce logging.

* (index-reverse)  Clean up merging process and add back logging

* (run)  Add a conservative default for INDEX_CONSTRUCTION_PROCESS_OPTS's parallelism as it eats a lot of RAM

* (index-reverse)  Better logging during processing

* (array) 2GB+ compatible write() function

* (array) 2GB+ compatible write() function

* (index-reverse) We are logging like Bolsonaro and I will not have it.

* (reverse-index) Self-diagnostics

* (btree) Fix bug in btree reader to do with large data sizes
2023-10-07 10:00:00 +02:00
Viktor Lofgren
e498c6907a (forward-index) Don't leak off heap memory 2023-10-05 21:22:13 +02:00
Viktor Lofgren
08e8fc6736 (index-journal) Thread safe IndexJournalReadEntry 2023-10-05 19:39:09 +02:00
Viktor Lofgren
f6e9ef6de9 (array) Fix transferFrom() so it survives larger than 2 GB transfers 2023-10-04 13:57:36 +02:00
Viktor Lofgren
c51159672e (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00
Viktor Lofgren
233b51e29e (test) flag DomainTypesTest as Slow to exclude from regular CI 2023-10-04 12:23:10 +02:00
Viktor Lofgren
54c8e13a68 (term-frequency-dict) Fix memory leak in TermFrequencyDict 2023-10-04 11:55:11 +02:00
Viktor Lofgren
405300b4b2 (control) Fix bug where finishing one process ad hoc task would remove all other tasks from the db 2023-10-04 11:44:31 +02:00
Viktor Lofgren
a6abd31ead (setup) Upgrade mariadb image to one that exists in real life 2023-10-03 11:07:58 +02:00
Viktor Lofgren
4c26674ff4 (setup) Use mirrored lid.176.ftz file that is of a compatible version 2023-10-03 10:29:44 +02:00
Viktor Lofgren
40768e935b (test) Removing /tmp-guardrails as it doesn't hold in CI 2023-10-02 16:52:59 +02:00
Viktor Lofgren
23be648456 (setup) use curl instead of wget for setup.sh 2023-10-02 16:38:23 +02:00
Viktor Lofgren
13ee31770a (file storage) Make it possible to override the value returned by getFileStorage(type) with a JVM property. 2023-10-01 12:57:53 +02:00
Viktor Lofgren
93dc80000c (bugfix) Fix NPE in KeywordExtractor due to bad SoftReference handling 2023-09-26 17:16:41 +02:00
Viktor Lofgren
e0cd3cd991 (converter) Alter StackexchangeSideloader's summary length to align with the rest of the system. 2023-09-26 12:19:43 +02:00
Viktor Lofgren
81ae501e73 (converter) Use ThreadLocalSentenceExtractorProvider for PlainText plugin as well 2023-09-25 18:28:34 +02:00
Viktor Lofgren
9b781f8404 (keyoword-extractor) Address very rare race condition in memoization logic 2023-09-25 18:28:04 +02:00
Viktor Lofgren
f797a92f87 (converter, minor) Use domain name in task heartbeat progress 2023-09-25 18:27:04 +02:00
Viktor Lofgren
0a579814a2 (docs) Parquet How-to 2023-09-24 19:40:45 +02:00
Viktor Lofgren
ec6c9bca62 (common) Fix factual error in comments 2023-09-24 19:40:19 +02:00
Viktor Lofgren
a433bbbe45 (converter) Fix rare sentence extractor bug
It was caused by non-thread safe concurrent memory access in SentenceExtractor.
2023-09-24 19:39:48 +02:00
Viktor Lofgren
8ca20f184d (keyword-extraction) Chasing my tail looking for a bug 2023-09-24 19:39:48 +02:00
Viktor Lofgren
d160954080 (index) Two useful debug endpoints 2023-09-24 19:39:48 +02:00
Viktor Lofgren
14372e0ef0 (index) Slightly reduce alloc churn 2023-09-24 19:36:14 +02:00
Viktor Lofgren
03bffa27ac (search) Add combined id to the search result HTML 2023-09-24 19:34:35 +02:00
Viktor Lofgren
028b5a4f0d (minor performance) Reduce GC churn in index 2023-09-24 12:12:08 +02:00
Viktor Lofgren
cd12f49fc0 (long-array) Return slices SegmentLongArray of itself for range() &c 2023-09-24 11:31:54 +02:00
Viktor Lofgren
a144749a8d (jdk21) Add --enable-preview to Java tasks 2023-09-24 11:31:17 +02:00
Viktor Lofgren
1bd146fb8e (minor) Remove dead code 2023-09-24 10:55:20 +02:00
Viktor Lofgren
5f6c3da7a4 (index) Add close methods on the index readers so they clean up their mmaps 2023-09-24 10:54:23 +02:00
Viktor Lofgren
d0aa754252 (long-array) Implement java.lang.foreign.Arena based lifecycle control for LongArray.
Further de-ByteBuffer:ing of these classes is to be done, but this is the smallest most urgently needed benefit.

This commit is a WIP but in a fully working state, pushing due to the importance of the changes to offer lifecycle control over mmaps.
2023-09-24 10:40:06 +02:00
Viktor Lofgren
dbe9235f3a (*) Upgrade to JDK21 with preview enabled.
... also move some common configuration into the root build.gradle-file.

Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work.  This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.
2023-09-24 10:38:59 +02:00
Viktor Lofgren
d78569986b (backups) Fix bug where backup service would zero the linkdb when restoring. 2023-09-22 18:34:34 +02:00
Viktor Lofgren
95323e6caa (backups) Support restore multi-source load data 2023-09-22 18:34:17 +02:00
Viktor Lofgren
f809d22fc6 (loader) Support simultaneous loading of multiple processed data sets 2023-09-22 13:14:58 +02:00
Viktor
763d61db8d
Create Additional Contributors.md 2023-09-21 15:38:19 +02:00
Viktor Lofgren
10cad3abb2 (dating) Implementing @samstorment's fantastic design polish 2023-09-21 15:19:50 +02:00
Viktor Lofgren
9338f35cd8 (doc) Remove confusingly outdated ER-diagrams 2023-09-21 15:08:27 +02:00
Viktor Lofgren
ead6fa9daa (doc) Update conceptual-overview.svg to reflect the removal of the lexicon 2023-09-21 13:47:05 +02:00
Viktor Lofgren
ad660cf420 (converter) Bugfix: Don't try to Path.of() on optional field 2023-09-21 13:27:09 +02:00
Viktor Lofgren
75f8ae2815 (file-storage) Use human-readable timestamps in the names of file storage directories 2023-09-21 13:22:53 +02:00
Viktor Lofgren
70aa04c047 (converter, stackexchange-xml) Add the ability to sideload stackexchange data 2023-09-21 12:48:33 +02:00
Viktor Lofgren
4aa47e87f2 (blocking-thread-pool) Add isTerminated convenience function 2023-09-21 12:47:41 +02:00
Viktor Lofgren
f8050816ac (search) Don't run LSH deduplication on details with zero lsh to support not calculating this hash. 2023-09-21 12:47:02 +02:00
Viktor Lofgren
5b0a6d7ec1 (stackexchange-converter) Create tool for converting stackexchange 7z-files to digestible sqlite db:s 2023-09-20 15:15:13 +02:00
Viktor Lofgren
3b4d08f52b (stackexchange-integration) Add better comments 2023-09-20 14:43:06 +02:00
Viktor Lofgren
6bbf40d7d2 (stackexchange-integration) Tools for reading stackexchange xml files 2023-09-20 14:17:33 +02:00