Commit Graph

16 Commits

Author SHA1 Message Date
Viktor Lofgren
78c00ad512 (valuation) Tweaking penalties a bit 2024-01-03 15:52:57 +01:00
Viktor Lofgren
a19879d494 (valuation) Tweaking penalties a bit 2024-01-03 15:32:33 +01:00
Viktor Lofgren
ac1aca36b0 (valuation) Increase the penalty for adtech a bit 2024-01-03 15:20:38 +01:00
Viktor Lofgren
1f3b89cf28 (index) Reduce the value of site and site-adjacent in BM25P calculations 2024-01-03 15:20:18 +01:00
Viktor Lofgren
f732f6ae6f (index) Tweak result valuation renormalization 2024-01-03 14:53:53 +01:00
Viktor Lofgren
0b9f3d1751 (*) Remove accidental commit of debug logging 2024-01-03 14:32:00 +01:00
Viktor Lofgren
4ce692ccaf (converter) Use SimpleBlockingThreadPool in ProcessingIterator 2024-01-03 14:27:47 +01:00
Viktor Lofgren
310a880fa8 (index) Further ranking adjustments 2024-01-02 12:24:52 +01:00
Viktor Lofgren
fc6e3b6da0 (index) Further ranking adjustments 2024-01-01 18:51:03 +01:00
Viktor Lofgren
50771045d0 (index) Further ranking adjustments 2024-01-01 18:43:17 +01:00
Viktor Lofgren
8f522470ed (index) Adjust rank weightings to fix bad wikipedia results
There was as bug where if the input of ResultValuator.normalize() was negative, it was truncated to zero.  This meant that "bad" results always rank the same.  The penalty factor "overallPart" was moved outside of the function and was re-weighted to accomplish a better normalization.

Some of the weights were also re-adjusted based on what appears to produce better results.  Needs evaluation.
2024-01-01 17:16:29 +01:00
Viktor Lofgren
0b8dc02eba (result-ranking) Nudge up results with ngram matches a tiny bit 2023-11-06 13:14:22 +01:00
Viktor Lofgren
48986574ae (result-ranking) Use a weighted calculation of priority term importance 2023-11-06 12:56:21 +01:00
Viktor Lofgren
c7a6a71d07 (result-ranking) Use a weighted calculation of priority term importance 2023-11-06 12:48:23 +01:00
Viktor Lofgren
0152004c42 Initial Commit Anchor Tags
* Added new (optional) model file in $WMSA_HOME/data/atags.parquet
* Converter gets a component for creating a projection of its domains onto the full atags parquet file
* New WordFlag ExternalLink
* These terms are also for now flagged as title words
* Fixed a bug where Title words aliased with UrlDomain words
* Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking
2023-11-04 14:24:17 +01:00
Viktor Lofgren
3889c4bdd9 (refactor) Remove features-search and update documentation 2023-10-09 15:12:30 +02:00