255 Commits

Author SHA1 Message Date
Viktor Lofgren
c057ce74a8 Bugfix for rare bug where some queries may miss hits due to BTreeReader's retain function giving up too fast. 2022-11-22 16:33:29 +01:00
Viktor Lofgren
baaf21911a Reduce resource usage waste in edge-search by recycling QueryVariants 2022-11-18 17:12:34 +01:00
Viktor Lofgren
e86f52d7d8 Reduce resource usage waste in edge-search by recycling QueryVariants 2022-11-18 17:09:07 +01:00
Viktor Lofgren
655504c1f0 Hotfix for NaN-serialization bug in API service. 2022-11-06 12:12:10 +01:00
vlofgren
27893b414b Merge branch 'release'
# Conflicts:
#	marginalia_nu/src/main/java/nu/marginalia/wmsa/edge/search/command/commands/BrowseCommand.java
2022-10-30 11:33:06 +01:00
vlofgren
e7623010db Fetch more browse:domain-results. 2022-10-30 11:30:11 +01:00
Viktor Lofgren
395da07abe Sort browse:-results by relatedness if possible (#125)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/125
2022-10-30 10:56:01 +01:00
vlofgren
b97f425f7e Sort results by relatedness where possible. 2022-10-30 10:49:41 +01:00
Viktor Lofgren
c559611185 Prefer cosine similarity relatedness for browse:-queries. (#123)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/123
2022-10-30 10:32:33 +01:00
vlofgren
6231f525fd Prefer cosine similarity relatedness for browse:-queries. 2022-10-30 10:31:37 +01:00
Viktor Lofgren
e676d8729e GUI fixes and cleanups (#122)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/122
2022-10-30 10:08:19 +01:00
vlofgren
61a80b417b Fix for explore2.marginalia.nu where it wouldn't find some websites that were flagged as redirects. 2022-10-30 10:05:52 +01:00
vlofgren
cc5b425661 Add another w3m-helper bar to make the UI cleaner on terminal. 2022-10-30 09:56:37 +01:00
vlofgren
217584126c Improved publishing date heuristics 2022-10-29 11:20:01 +02:00
vlofgren
68ec3304a3 Update index 2022-10-27 19:16:35 +02:00
vlofgren
af8001d41e Less janky summary extraction 2022-10-27 19:16:35 +02:00
vlofgren
94c157c5c3 Publish-date guesser 2022-10-27 19:16:35 +02:00
Viktor Lofgren
c6abbc12f6 fix serialization issue (#121)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/121
2022-10-22 15:01:41 +02:00
vlofgren
8f8e6e147f Fix JSON serialization error 2022-10-22 14:42:37 +02:00
vlofgren
e6da7c1a29 Tweaks for new release. 2022-10-21 17:44:29 +02:00
Viktor Lofgren
0a35a7c1d0 master (#119)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/119
2022-10-20 21:57:08 +02:00
vlofgren
5393167bf8 Fixes in sorting logic, and optimized update domain statistics to not take 4+ hours. 2022-10-20 21:55:51 +02:00
vlofgren
05762fe200 Index update. 2022-10-19 16:35:50 +02:00
Viktor Lofgren
df49ccbe59 October Release (#118)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/118
2022-10-19 15:00:04 +02:00
vlofgren
9a7d052c43 Adjustments to anchor tag extraction. 2022-09-18 10:59:16 +02:00
vlofgren
179d54d50a Processor fixes: Excluding phpinfo()-pages, mastodon feeds. 2022-09-16 18:05:54 +02:00
vlofgren
13c8305dc2 Exclude some guaranteed-to-be-noncanonical forum URLs. 2022-09-16 17:12:07 +02:00
vlofgren
324c05fc42 Exclude some guaranteed-to-be-noncanonical forum URLs. 2022-09-16 17:01:06 +02:00
vlofgren
123603b0a3 Some small crawler tweaks, plus a test for examining crawler behavior through a simulated server. 2022-09-16 16:59:06 +02:00
vlofgren
5e67391829 Some small crawler tweaks, plus a test for examining crawler behavior through a simulated server. 2022-09-16 16:52:33 +02:00
vlofgren
23a7d91d5b Better index metrics, fix bug where domain results show up with advisory search terms. 2022-09-15 17:04:15 +02:00
vlofgren
9558077808 UX improvements for "show more results". 2022-09-15 15:56:20 +02:00
vlofgren
2e740bb7bd Add advisory search terms that do not affect ranking. 2022-09-14 16:31:37 +02:00
vlofgren
680693b6db Fix old broken domain search. 2022-09-13 20:57:04 +02:00
vlofgren
8d15ddbab0 Tune query timeouts and fetch window to speed up queries a bit. 2022-09-13 18:50:04 +02:00
vlofgren
6df02f7528 HyperLogLog-tool for figuring out how big the index is. 2022-09-13 18:27:36 +02:00
vlofgren
10d1307dd6 Fix a query variant creation bug that caused the search engine to sometimes drop important words from a query. 2022-09-12 23:32:49 +02:00
vlofgren
297f8e4cd7 Fixing a bug where search terms would sometimes be ignored, tweaking timeouts, adding debug feature for the search service. 2022-09-12 21:08:53 +02:00
vlofgren
7749ce645a Further cleaning 2022-09-12 10:39:02 +02:00
vlofgren
971089bad3 Cleaning up. 2022-09-11 11:58:39 +02:00
vlofgren
eaef93f4ae Cleaning up and adding better error messages. 2022-09-11 11:31:22 +02:00
vlofgren
fbe17b62ed Giga-refactor of the index query logic 2022-09-10 20:28:45 +02:00
vlofgren
c6976acdfc WIP Loading 2022-09-05 17:51:49 +02:00
vlofgren
c912d3127d Better hints. 2022-09-03 18:35:04 +02:00
vlofgren
2e3d95bcb1 Refactoring and cleanup 2022-09-03 17:32:53 +02:00
vlofgren
5a4d41d414 Refactoring and cleanup, WIP 2022-09-03 15:20:26 +02:00
vlofgren
26e0cfec3a Preparation for conversion 2022-09-02 17:45:03 +02:00
vlofgren
ccf79f47b0 Preparation for conversion 2022-09-02 14:51:11 +02:00
vlofgren
a04d27692e Merge branch 'master' into experimental-22-08 2022-09-02 11:29:30 +02:00
vlofgren
578ecfb27d CSS tweaks for search. 2022-09-02 10:58:07 +02:00