Viktor Lofgren
d71124961e
Better tests for crawling and processing.
2023-06-27 16:11:27 +02:00

Viktor Lofgren
f8f9f04158
Specialized logic for processing Lemmy-based websites.
2023-06-27 10:57:54 +02:00

Viktor Lofgren
bd2c3855ed
Add bits and keywords for generator classes (docs, forum, wiki).
2023-06-23 21:35:28 +02:00

Viktor Lofgren
b5ef67ed28
Categorize generators by type
This is a great quality signal!
Add the type as document bitflags by category.
2023-06-22 16:04:37 +02:00

Viktor Lofgren
f140e7d7c7
Use a default tag for unset or invalid generators.
2023-06-21 17:30:14 +02:00

Viktor Lofgren
a9a2960e86
New synthetic keyword for document generator meta tag.
2023-06-20 16:25:49 +02:00

Viktor Lofgren
7326ba74fe
Tweak the pub date heuristics so they mostly get the 'historyofphilosophy.net' case right.
Use the HTML standard for plausibility checks in the more guesswork-like heuristics. Add more class names to look for date strings.
2023-06-20 14:15:05 +02:00

Viktor Lofgren
67c15a34e6
Reduce the amount of expensive operations in HtmlDocumentProcessorPlugin.
2023-06-19 17:58:19 +02:00

Viktor Lofgren
266ad2e4de
Re-introduce monkey-patched GSON to make the converter run better.
2023-06-19 17:58:19 +02:00

Viktor Lofgren
44b1fe0e6d
Move list-conversion into getDescription method.
2023-06-19 17:58:19 +02:00

Viktor Lofgren
88399e30e2
Consider keyword relevance signals when creating the document summary using the DOM walker.
2023-06-19 17:58:19 +02:00

Viktor Lofgren
a9f7b4c457
Add synthetic keywords for same-site files linked from a document (e.g. file:png). Also add category keywords, like file:image or file:document.
2023-04-30 19:29:13 +02:00

Viktor Lofgren
2ab26f37b8
Bug fix for document metadata encoding that breaks year-based queries.
2023-04-14 16:56:49 +02:00

Viktor Lofgren
cc4e089a5d
Consider average sentence length when selecting search results. This promotes prose over code listings, tabular data, etc.
2023-03-30 15:46:15 +02:00

Viktor Lofgren
4d05be4095
Refactor InternalLinkGraph
2023-03-30 15:44:23 +02:00

Viktor Lofgren
03bd892b95
Improve document processing in conversion.
* Add flags for long and short documents.
* Break out common length logic from plugins.
* Clean up related code.
2023-03-28 16:38:00 +02:00

Viktor Lofgren
ca22c287a5
Make use of DocumentFlags' flags
2023-03-21 16:03:15 +01:00

Viktor Lofgren
2eb972dea1
Remove unrelated code, break tools into their own directory.
2023-03-17 16:03:11 +01:00

Viktor Lofgren
449471a076
Yet more restructuring. Improved search result ranking.
2023-03-16 21:35:54 +01:00

Viktor Lofgren
d82532b7f1
More restructuring, big bug fixes in keyword extraction.
2023-03-13 17:39:53 +01:00