This tool converts Stackexchange's 7z-compressed XML data dumps into a sqlite database that the search engine can digest.
See features-convert/stackexchange-xml for an explanation of why this is necessary.
Stackexchange's data dumps can be downloaded from archive.org here: https://archive.org/details/stackexchange
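As a rough illustration of the XML side of this conversion, here is a minimal sketch that streams `<row>` elements out of a Posts.xml-style file using the JDK's StAX parser. It is not the tool's actual implementation. The class name `StackExchangeRowSketch`, the `Post` record and the `forEachPost` method are hypothetical, and the attribute names (`Id`, `PostTypeId`, `Title`, `Body`) are an assumption based on the public dump layout. 7z decompression and the sqlite write are left out, since both need third-party libraries.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StackExchangeRowSketch {
    /** Hypothetical holder for the attributes kept from each <row>; title is null when absent. */
    public record Post(int id, int postTypeId, String title, String body) {}

    /** Streams the document and hands each <row> to the sink without buffering the whole file. */
    public static void forEachPost(Reader xml, Consumer<Post> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The input is plain data, so DTDs and external entities are switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    // Assumed attribute names; a null namespace means "do not check the namespace".
                    sink.accept(new Post(
                            Integer.parseInt(reader.getAttributeValue(null, "Id")),
                            Integer.parseInt(reader.getAttributeValue(null, "PostTypeId")),
                            reader.getAttributeValue(null, "Title"),
                            reader.getAttributeValue(null, "Body")));
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny made-up sample standing in for a decompressed dump file.
        String sample = "<posts><row Id=\"1\" PostTypeId=\"1\" Title=\"Hello\" "
                + "Body=\"&lt;p&gt;Hi&lt;/p&gt;\" /></posts>";
        forEachPost(new StringReader(sample), System.out::println);
    }
}
```

A real converter would replace the `System.out::println` sink with something that batches inserts into the output database, which keeps memory use flat regardless of dump size.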
Usage
$ stackexchange-converter domain-name input.7z output.db
Stackexchange is relatively conservative about allowing new questions, so this is a job that doesn't need to run more than once.
Note: Reading and writing these db files is absurdly slow on a mechanical hard drive.