Commit Graph

12 Commits

Author SHA1 Message Date
Viktor Lofgren
109bec372c (index) Adjust BM25 parameters 2024-01-05 13:21:52 +01:00
Viktor Lofgren
0e970b8037 (valuation) Tweaking penalties a bit 2024-01-05 13:21:52 +01:00
Viktor Lofgren
1694b4d6ef (valuation) Increase the penalty for adtech a bit 2024-01-05 13:21:34 +01:00
Viktor Lofgren
396299c1db (index) Reduce the value of site and site-adjacent in BM25P calculations 2024-01-05 13:21:33 +01:00
Viktor Lofgren
71d789aab0 (index) Tweak result valuation renormalization 2024-01-05 13:21:33 +01:00
Viktor Lofgren
d2418521a7 (index) Further ranking adjustments 2024-01-02 12:35:59 +01:00
Viktor Lofgren
9330b5b1d9 (index) Adjust rank weightings to fix bad wikipedia results
There was as bug where if the input of ResultValuator.normalize() was negative, it was truncated to zero.  This meant that "bad" results always rank the same.  The penalty factor "overallPart" was moved outside of the function and was re-weighted to accomplish a better normalization.

Some of the weights were also re-adjusted based on what appears to produce better results.  Needs evaluation.
2024-01-02 12:35:44 +01:00
Viktor Lofgren
0b8dc02eba (result-ranking) Nudge up results with ngram matches a tiny bit 2023-11-06 13:14:22 +01:00
Viktor Lofgren
48986574ae (result-ranking) Use a weighted calculation of priority term importance 2023-11-06 12:56:21 +01:00
Viktor Lofgren
c7a6a71d07 (result-ranking) Use a weighted calculation of priority term importance 2023-11-06 12:48:23 +01:00
Viktor Lofgren
0152004c42 Initial Commit Anchor Tags
* Added new (optional) model file in $WMSA_HOME/data/atags.parquet
* Converter gets a component for creating a projection of its domains onto the full atags parquet file
* New WordFlag ExternalLink
* These terms are also for now flagged as title words
* Fixed a bug where Title words aliased with UrlDomain words
* Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking
2023-11-04 14:24:17 +01:00
Viktor Lofgren
3889c4bdd9 (refactor) Remove features-search and update documentation 2023-10-09 15:12:30 +02:00