vlofgren
|
e7623010db
|
Fetch more browse:domain-results.
|
2022-10-30 11:30:11 +01:00 |
|
vlofgren
|
b97f425f7e
|
Sort results by relatedness where possible.
|
2022-10-30 10:49:41 +01:00 |
|
vlofgren
|
6231f525fd
|
Prefer cosine similarity relatedness for browse:-queries.
|
2022-10-30 10:31:37 +01:00 |
|
vlofgren
|
61a80b417b
|
Fix for explore2.marginalia.nu where it wouldn't find some websites that were flagged as redirects.
|
2022-10-30 10:05:52 +01:00 |
|
vlofgren
|
cc5b425661
|
Add another w3m-helper bar to make the UI cleaner on terminal.
|
2022-10-30 09:56:37 +01:00 |
|
vlofgren
|
217584126c
|
Improved publishing date heuristics
|
2022-10-29 11:20:01 +02:00 |
|
vlofgren
|
68ec3304a3
|
Update index
|
2022-10-27 19:16:35 +02:00 |
|
vlofgren
|
af8001d41e
|
Less janky summary extraction
|
2022-10-27 19:16:35 +02:00 |
|
vlofgren
|
94c157c5c3
|
Publish-date guesser
|
2022-10-27 19:16:35 +02:00 |
|
vlofgren
|
8f8e6e147f
|
Fix JSON serialization error
|
2022-10-22 14:42:37 +02:00 |
|
vlofgren
|
e6da7c1a29
|
Tweaks for new release.
|
2022-10-21 17:44:29 +02:00 |
|
vlofgren
|
5393167bf8
|
Fixes in sorting logic, and optimized the domain statistics update so it no longer takes 4+ hours.
|
2022-10-20 21:55:51 +02:00 |
|
vlofgren
|
05762fe200
|
Index update.
|
2022-10-19 16:35:50 +02:00 |
|
Viktor Lofgren
|
df49ccbe59
|
October Release (#118)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/118
|
2022-10-19 15:00:04 +02:00 |
|
vlofgren
|
9a7d052c43
|
Adjustments to anchor tag extraction.
|
2022-09-18 10:59:16 +02:00 |
|
vlofgren
|
179d54d50a
|
Processor fixes: Excluding phpinfo()-pages, mastodon feeds.
|
2022-09-16 18:05:54 +02:00 |
|
vlofgren
|
13c8305dc2
|
Exclude some guaranteed-to-be-noncanonical forum URLs.
|
2022-09-16 17:12:07 +02:00 |
|
vlofgren
|
324c05fc42
|
Exclude some guaranteed-to-be-noncanonical forum URLs.
|
2022-09-16 17:01:06 +02:00 |
|
vlofgren
|
123603b0a3
|
Some small crawler tweaks, plus a test for examining crawler behavior through a simulated server.
|
2022-09-16 16:59:06 +02:00 |
|
vlofgren
|
5e67391829
|
Some small crawler tweaks, plus a test for examining crawler behavior through a simulated server.
|
2022-09-16 16:52:33 +02:00 |
|
vlofgren
|
23a7d91d5b
|
Better index metrics, fix bug where domain results show up with advisory search terms.
|
2022-09-15 17:04:15 +02:00 |
|
vlofgren
|
9558077808
|
UX improvements for "show more results".
|
2022-09-15 15:56:20 +02:00 |
|
vlofgren
|
2e740bb7bd
|
Add advisory search terms that do not affect ranking.
|
2022-09-14 16:31:37 +02:00 |
|
vlofgren
|
680693b6db
|
Fix old broken domain search.
|
2022-09-13 20:57:04 +02:00 |
|
vlofgren
|
8d15ddbab0
|
Tune query timeouts and fetch window to speed up queries a bit.
|
2022-09-13 18:50:04 +02:00 |
|
vlofgren
|
6df02f7528
|
HyperLogLog-tool for figuring out how big the index is.
|
2022-09-13 18:27:36 +02:00 |
|
vlofgren
|
10d1307dd6
|
Fix a query variant creation bug that caused the search engine to sometimes drop important words from a query.
|
2022-09-12 23:32:49 +02:00 |
|
vlofgren
|
297f8e4cd7
|
Fixing a bug where search terms would sometimes be ignored, tweaking timeouts, adding debug feature for the search service.
|
2022-09-12 21:08:53 +02:00 |
|
vlofgren
|
7749ce645a
|
Further cleaning
|
2022-09-12 10:39:02 +02:00 |
|
vlofgren
|
971089bad3
|
Cleaning up.
|
2022-09-11 11:58:39 +02:00 |
|
vlofgren
|
eaef93f4ae
|
Cleaning up and adding better error messages.
|
2022-09-11 11:31:22 +02:00 |
|
vlofgren
|
fbe17b62ed
|
Giga-refactor of the index query logic
|
2022-09-10 20:28:45 +02:00 |
|
vlofgren
|
c6976acdfc
|
WIP Loading
|
2022-09-05 17:51:49 +02:00 |
|
vlofgren
|
c912d3127d
|
Better hints.
|
2022-09-03 18:35:04 +02:00 |
|
vlofgren
|
2e3d95bcb1
|
Refactoring and cleanup
|
2022-09-03 17:32:53 +02:00 |
|
vlofgren
|
5a4d41d414
|
Refactoring and cleanup, WIP
|
2022-09-03 15:20:26 +02:00 |
|
vlofgren
|
26e0cfec3a
|
Preparation for conversion
|
2022-09-02 17:45:03 +02:00 |
|
vlofgren
|
ccf79f47b0
|
Preparation for conversion
|
2022-09-02 14:51:11 +02:00 |
|
vlofgren
|
a04d27692e
|
Merge branch 'master' into experimental-22-08
|
2022-09-02 11:29:30 +02:00 |
|
vlofgren
|
578ecfb27d
|
CSS tweaks for search.
|
2022-09-02 10:58:07 +02:00 |
|
vlofgren
|
3fd48e0e53
|
Cleaning the code a bit, fix URL loading bug with multiple fragments in URL
|
2022-09-02 10:41:02 +02:00 |
|
vlofgren
|
5dd61387bf
|
Merge branch 'master' into experimental-22-08
|
2022-09-02 09:39:20 +02:00 |
|
vlofgren
|
5b8dc18d81
|
Fix copy errors in index.hdb
|
2022-09-02 09:35:19 +02:00 |
|
vlofgren
|
9270230065
|
WIP logic for detecting significant images in the body of a website.
|
2022-09-02 09:35:19 +02:00 |
|
vlofgren
|
5f993c72dd
|
Tweaks for search result relevance
|
2022-09-02 09:34:20 +02:00 |
|
vlofgren
|
813399401e
|
Tweaks for search result relevance
|
2022-08-29 18:01:07 +02:00 |
|
vlofgren
|
3f2854a5e9
|
WIP n-gram loader
|
2022-08-27 20:30:18 +02:00 |
|
vlofgren
|
0282156979
|
WIP n-gram loader
|
2022-08-27 19:19:16 +02:00 |
|
vlofgren
|
c865d6c6b2
|
Change TF-IDF normalization to reduce the amount of not-so-relevant matches.
|
2022-08-27 11:38:29 +02:00 |
|
vlofgren
|
f4ad7aaf33
|
Remove accidental import of an unused library, fix build on jdk18-systems.
|
2022-08-26 20:48:44 +02:00 |
|