Viktor Lofgren
787a20cbaa
(crawling-model) Implement a parquet format for crawl data
...
This is not hooked into anything yet. The change also modifies the parquet-floor library to support reading and writing byte[] values. This is desirable since we may want to support inputs that are not text-based in the future, and codifying the assumption that each document is a string would cause us grief down the line.
2023-12-13 16:22:19 +01:00
Viktor Lofgren
dbe9235f3a
(*) Upgrade to JDK21 with preview enabled.
...
... also move some common configuration into the root build.gradle file.
Support for JDK21 in Lombok is a bit sketchy at the moment, but it seems to work. This upgrade is important because the new index construction really benefits from Arena-based lifecycle control over off-heap memory.
2023-09-24 10:38:59 +02:00
Viktor Lofgren
35996d0adb
(docs) Update the documentation with up-to-date information
2023-09-14 11:33:36 +02:00
Viktor Lofgren
9f672a0cf4
(parquet-floor) Modify the parquet library to permit list-fields.
2023-09-13 15:56:35 +02:00
Viktor Lofgren
a00cabe223
(parquet-floor) Patch in support for writing and reading repeated values
2023-09-11 14:06:43 +02:00
Viktor Lofgren
dbe974f510
(parquet) Use ZSTD compression by default.
2023-09-11 09:02:58 +02:00
Viktor Lofgren
a284682deb
(parquet) Add parquet library
...
This small library, while great, will require some modifications to fit the project's needs, so it goes into third-party directly.
2023-09-05 10:38:51 +02:00
Viktor Lofgren
aa0d256d6a
Upgrade code to Java 20.
...
* Change language version
* Upgrade Lombok to a JDK20 compatible version
2023-08-23 13:37:49 +00:00
Viktor Lofgren
86a5cc5c5f
(hash) Modified version of Commons Codec's Murmur3 hash
2023-08-01 14:57:40 +02:00
Viktor Lofgren
186a02acfd
Optimize RDRPosTagger to use integer comparisons instead of string comparisons.
...
Also reduce cache thrashing by deconstructing the tree's nodes into arrays.
2023-06-19 17:58:19 +02:00
Viktor Lofgren
6f2a7977c1
(Minor) Remove character debris in build.gradle
2023-06-19 17:58:19 +02:00
Viktor Lofgren
266ad2e4de
Re-introduce monkey patched GSON to make converter run better.
...
fixup! Re-introduce monkey patched GSON to make converter run better.
2023-06-19 17:58:19 +02:00
Viktor Lofgren
449471a076
Yet more restructuring. Improved search result ranking.
2023-03-16 21:35:54 +01:00
Viktor Lofgren
616effdb3c
The refactoring will continue until morale improves.
2023-03-12 10:04:48 +01:00
Viktor Lofgren
b945fd7f39
A lot of readmes, some refactoring.
2023-03-06 18:32:13 +01:00
Viktor Lofgren
4fdaaa16ba
Restructuring the git repo
2023-03-04 13:19:01 +01:00