Commit Graph

20 Commits

Author SHA1 Message Date
Viktor Lofgren
d82a858491 Don't consider slash to be a sentence separator. 2023-05-31 16:54:30 +02:00
Viktor
7694a15f62 Fix kale's unreasonably high weighting factor 2023-04-22 20:55:09 +02:00
Viktor Lofgren
619fb8ba80 (converter) Adjust the pub-date sniffing heuristics' order. Doing HTML5 tags too early puts some sites too early. Also expanded support for JSON+LD. 2023-04-19 15:28:50 +02:00
Viktor Lofgren
810515c08d Clean up artifact extractor. 2023-04-10 13:07:54 +02:00
Viktor
a278fc6296 Increase search result relevance (#8) 2023-04-07 20:18:08 +02:00
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
Viktor Lofgren
716ab35b4e Search ranking debuggability improvements. 2023-04-02 13:43:24 +02:00
Viktor Lofgren
137adb9c3c Bitmask calculation improvement. Take sentence length into consideration, not all lines are equal. 2023-03-30 15:42:06 +02:00
Viktor Lofgren
0fcb2b534c Polish Names 2023-03-29 16:51:47 +02:00
Viktor
0b505939ed Update features-convert/readme.md 2023-03-25 12:43:58 +01:00
Viktor Lofgren
46f81aca2f Break apart reverse index into a separate full index and priority index. It did this before using the same code. This will make the priority index about half as big since it no longer needs to keep metadata. 2023-03-21 16:12:31 +01:00
Viktor Lofgren
624e8acd41 Remove copy-pasted application plugin from subprojects that define features. 2023-03-20 17:25:58 +01:00
Viktor Lofgren
0682550bd2 Clean up summary extractor module. 2023-03-18 10:33:58 +01:00
Viktor Lofgren
6e89377dea Clean up summary extractor module. 2023-03-18 10:29:25 +01:00
Viktor Lofgren
950c49d80f Clean up summary extractor module. 2023-03-18 10:28:48 +01:00
Viktor Lofgren
8def95e849 Clean up summary extractor module. 2023-03-18 10:24:12 +01:00
Viktor Lofgren
43430728aa Clean up summary extractor module. 2023-03-18 10:21:41 +01:00
Viktor Lofgren
2eb972dea1 Remove unrelated code, break tools into their own directory. 2023-03-17 16:03:11 +01:00
Viktor Lofgren
449471a076 Yet more restructuring. Improved search result ranking. 2023-03-16 21:35:54 +01:00
Viktor Lofgren
0ecab53635 Yet more restructuring. 2023-03-13 23:40:26 +01:00
Viktor Lofgren
d82532b7f1 More restructuring, big bug fixes in keyword extraction. 2023-03-13 17:39:53 +01:00