Viktor Lofgren
46d761f34f
(language) fasttext based language filter
2023-08-16 15:48:12 +02:00
Viktor Lofgren
e7192a9cad
(mq) Refactor mq and actor library and move it to libraries out of common
2023-08-15 10:53:23 +02:00
Viktor Lofgren
4ab1cd9502
(*) last touches
2023-08-07 12:57:44 +02:00
Viktor Lofgren
6f4e767a04
(minor) Re-enable monkey-patch-json for converter
2023-07-31 10:31:46 +02:00
Viktor Lofgren
d3f01bd171
(crawler, converter) Remove monkey patched gson from dependencies
2023-07-29 19:18:12 +02:00
Viktor Lofgren
f11103d31d
(WIP) Make it possible to sideload encyclopedia data.
...
This is mostly a pilot track for sideloading other large websites.
Also change coverter to produce a more compact output (java serialization instead of json).
2023-07-28 18:14:43 +02:00
Viktor Lofgren
bca4bbb6c8
(*) Refactor MQ and MQSM
2023-07-17 13:57:32 +02:00
Viktor Lofgren
8b74e3aa0d
(*) File Storage WIP
2023-07-14 17:08:10 +02:00
Viktor
cbbf60a599
Better fingerprinting ( #35 )
...
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
2023-07-10 18:58:43 +02:00
Viktor Lofgren
d71124961e
Better tests for crawling and processing.
2023-06-27 16:11:27 +02:00
Viktor Lofgren
f8f9f04158
Specialized logic for processing Lemmy-based websites.
2023-06-27 10:57:54 +02:00
Viktor Lofgren
266ad2e4de
Re-introduce monkey patched GSON to make converter run better.
...
fixup! Re-introduce monkey patched GSON to make converter run better.
fixup! Re-introduce monkey patched GSON to make converter run better.
2023-06-19 17:58:19 +02:00
Viktor Lofgren
449471a076
Yet more restructuring. Improved search result ranking.
2023-03-16 21:35:54 +01:00
Viktor Lofgren
d82532b7f1
More restructuring, big bug fixes in keyword extraction.
2023-03-13 17:39:53 +01:00