Viktor Lofgren dbe9235f3a (*) Upgrade to JDK21 with preview enabled.
... also move some common configuration into the root build.gradle-file.

Support for JDK21 in Lombok is a bit sketchy at the moment, but it seems to work. This upgrade is kind of important, as the new index construction really benefits from Arena-based lifecycle control over off-heap memory.
2023-09-24 10:38:59 +02:00

This tool converts Stack Exchange's 7z-compressed XML data dumps into a SQLite database that the search engine can digest.

See features-convert/stackexchange-xml for an explanation of why this is necessary.
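To give a rough idea of what the input looks like, here is a minimal sketch (not this tool's actual code) that streams a dump's Posts.xml with the JDK's StAX API and groups answers under their questions. It assumes the usual dump layout, where each post is a `<row>` element with `Id`, `PostTypeId` (1 = question, 2 = answer), `Title` and `ParentId` attributes. The class and record names are hypothetical, and the sketch takes an already-decompressed `Reader`, since unpacking the 7z archive itself needs a third-party library.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the converter's real implementation
final class PostsXmlSketch {
    // A question plus the ids of the answers whose ParentId points at it
    record QuestionThread(int id, String title, List<Integer> answerIds) {}

    static Map<Integer, QuestionThread> groupThreads(Reader postsXml) {
        Map<Integer, String> titles = new LinkedHashMap<>();   // question id -> title, in file order
        Map<Integer, List<Integer>> answers = new HashMap<>(); // parent id -> answer ids
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(postsXml);
            try {
                // Stream the rows one at a time rather than building a DOM; dumps can be large
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT
                            || !"row".equals(reader.getLocalName())) {
                        continue;
                    }
                    int id = Integer.parseInt(reader.getAttributeValue(null, "Id"));
                    String type = reader.getAttributeValue(null, "PostTypeId");
                    if ("1".equals(type)) {        // question
                        titles.put(id, reader.getAttributeValue(null, "Title"));
                    } else if ("2".equals(type)) { // answer
                        String parent = reader.getAttributeValue(null, "ParentId");
                        if (parent != null) {
                            answers.computeIfAbsent(Integer.parseInt(parent), k -> new ArrayList<>()).add(id);
                        }
                    }
                    // Any other post types are ignored in this sketch
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed Posts.xml", e);
        }
        // Attach answers only after the whole file is read, so row order does not matter
        Map<Integer, QuestionThread> threads = new LinkedHashMap<>();
        titles.forEach((id, title) ->
                threads.put(id, new QuestionThread(id, title, answers.getOrDefault(id, List.of()))));
        return threads;
    }
}
```

Collecting answers separately and joining at the end avoids depending on questions appearing before their answers in the file.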

Stack Exchange's data dumps can be downloaded from archive.org here: https://archive.org/details/stackexchange

Usage

$ stackexchange-converter domain-name input.7z output.db
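The command takes three positional arguments. As a hypothetical illustration only (the `ConverterArgs` record is invented here and is not the tool's entry point), they could be unpacked like this, where the domain name is presumably that of the Stack Exchange site the dump belongs to:

```java
import java.nio.file.Path;

// Hypothetical holder for the three positional arguments in the usage line
record ConverterArgs(String domainName, Path input, Path output) {
    static ConverterArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "Usage: stackexchange-converter domain-name input.7z output.db");
        }
        // Order matches the usage line: domain, input archive, output database
        return new ConverterArgs(args[0], Path.of(args[1]), Path.of(args[2]));
    }
}
```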

Stack Exchange is relatively conservative about allowing new questions, so the data changes slowly and this is a one-off conversion job rather than a recurring one.

Note: Reading and writing these db files is absurdly slow on a mechanical hard drive.

See Also