Viktor Lofgren
|
50862a2081
|
Refactor sentence extractor to break it apart into more readable chunks
|
2023-01-30 09:36:11 +01:00 |
|
Viktor Lofgren
|
ed728b2680
|
Compressed string component
|
2023-01-30 09:33:04 +01:00 |
|
Viktor Lofgren
|
728931c135
|
Compressed string component
|
2023-01-30 09:29:14 +01:00 |
|
Viktor Lofgren
|
618582dc74
|
Performance optimizations in EdgeDomain's parsing, reduce the number of unguarded regular expressions
|
2023-01-30 09:23:11 +01:00 |
|
Viktor Lofgren
|
4854f40447
|
Array library optimizations for sortLargeSpan
|
2023-01-30 09:22:10 +01:00 |
|
Viktor Lofgren
|
c8f7a8cb69
|
Fix bug in dealing with scheme-relative URLs
|
2023-01-19 15:46:32 +01:00 |
|
Viktor Lofgren
|
5851e91424
|
Clean-up and fix for feature regression in site:-terms
|
2023-01-11 19:33:32 +01:00 |
|
Viktor Lofgren
|
fb2797a8ef
|
Tweaking search result valuation
|
2023-01-11 19:33:05 +01:00 |
|
Viktor Lofgren
|
085d985e61
|
Result selection algorithm tweaks
|
2023-01-11 17:19:57 +01:00 |
|
Viktor Lofgren
|
69ccf143ac
|
New search profile for hardcore web 1.0 content.
|
2023-01-11 16:11:51 +01:00 |
|
Viktor Lofgren
|
4d3ef0e3b3
|
Tool for cleaning raw index files based on a predicate.
|
2023-01-11 16:11:29 +01:00 |
|
Viktor Lofgren
|
cb408dd737
|
Fixes
|
2023-01-09 22:06:15 +01:00 |
|
Viktor Lofgren
|
11b0d61efc
|
Fixes
|
2023-01-09 18:45:04 +01:00 |
|
Viktor Lofgren
|
0b6200705e
|
Bugfix in forward converter, should force both files before exiting. Also don't need to create an intermediate file.
|
2023-01-09 16:57:58 +01:00 |
|
Viktor Lofgren
|
58cae7d963
|
Bugfix for logs.
|
2023-01-09 15:46:11 +01:00 |
|
Viktor Lofgren
|
6d33c386fc
|
Merge changes from experimental branch (#132)
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/132
|
2023-01-08 11:11:44 +01:00 |
|
Viktor Lofgren
|
c057ce74a8
|
Bugfix for rare bug where some queries may miss hits due to BTreeReader's retain function giving up too fast.
|
2022-11-22 16:33:29 +01:00 |
|
Viktor Lofgren
|
baaf21911a
|
Reduce resource usage waste in edge-search by recycling QueryVariants
|
2022-11-18 17:12:34 +01:00 |
|
Viktor Lofgren
|
e86f52d7d8
|
Reduce resource usage waste in edge-search by recycling QueryVariants
|
2022-11-18 17:09:07 +01:00 |
|
Viktor Lofgren
|
655504c1f0
|
Hotfix for NaN-serialization bug in API service.
|
2022-11-06 12:12:10 +01:00 |
|
vlofgren
|
27893b414b
|
Merge branch 'release'
# Conflicts:
# marginalia_nu/src/main/java/nu/marginalia/wmsa/edge/search/command/commands/BrowseCommand.java
|
2022-10-30 11:33:06 +01:00 |
|
vlofgren
|
e7623010db
|
Fetch more browse:domain-results.
|
2022-10-30 11:30:11 +01:00 |
|
Viktor Lofgren
|
395da07abe
|
Sort browse:-results by relatedness if possible (#125)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/125
|
2022-10-30 10:56:01 +01:00 |
|
vlofgren
|
b97f425f7e
|
Sort results by relatedness where possible.
|
2022-10-30 10:49:41 +01:00 |
|
Viktor Lofgren
|
c559611185
|
Prefer cosine similarity relatedness for browse:-queries. (#123)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/123
|
2022-10-30 10:32:33 +01:00 |
|
vlofgren
|
6231f525fd
|
Prefer cosine similarity relatedness for browse:-queries.
|
2022-10-30 10:31:37 +01:00 |
|
Viktor Lofgren
|
e676d8729e
|
GUI fixes and cleanups (#122)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/122
|
2022-10-30 10:08:19 +01:00 |
|
vlofgren
|
61a80b417b
|
Fix for explore2.marginalia.nu where it wouldn't find some websites that were flagged as redirects.
|
2022-10-30 10:05:52 +01:00 |
|
vlofgren
|
cc5b425661
|
Add another w3m-helper bar to make the UI cleaner on terminal.
|
2022-10-30 09:56:37 +01:00 |
|
vlofgren
|
217584126c
|
Improved publishing date heuristics
|
2022-10-29 11:20:01 +02:00 |
|
vlofgren
|
68ec3304a3
|
Update index
|
2022-10-27 19:16:35 +02:00 |
|
vlofgren
|
af8001d41e
|
Less janky summary extraction
|
2022-10-27 19:16:35 +02:00 |
|
vlofgren
|
94c157c5c3
|
Publish-date guesser
|
2022-10-27 19:16:35 +02:00 |
|
Viktor Lofgren
|
c6abbc12f6
|
fix serialization issue (#121)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/121
|
2022-10-22 15:01:41 +02:00 |
|
vlofgren
|
8f8e6e147f
|
Fix JSON serialization error
|
2022-10-22 14:42:37 +02:00 |
|
vlofgren
|
e6da7c1a29
|
Tweaks for new release.
|
2022-10-21 17:44:29 +02:00 |
|
Viktor Lofgren
|
0a35a7c1d0
|
master (#119)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/119
|
2022-10-20 21:57:08 +02:00 |
|
vlofgren
|
5393167bf8
|
Fixes in sorting logic, and optimized update domain statistics to not take 4+ hours.
|
2022-10-20 21:55:51 +02:00 |
|
vlofgren
|
05762fe200
|
Index update.
|
2022-10-19 16:35:50 +02:00 |
|
Viktor Lofgren
|
df49ccbe59
|
October Release (#118)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/118
|
2022-10-19 15:00:04 +02:00 |
|
vlofgren
|
9a7d052c43
|
Adjustments to anchor tag extraction.
|
2022-09-18 10:59:16 +02:00 |
|
vlofgren
|
179d54d50a
|
Processor fixes: Excluding phpinfo()-pages, mastodon feeds.
|
2022-09-16 18:05:54 +02:00 |
|
vlofgren
|
13c8305dc2
|
Exclude some guaranteed-to-be-noncanonical forum URLs.
|
2022-09-16 17:12:07 +02:00 |
|
vlofgren
|
324c05fc42
|
Exclude some guaranteed-to-be-noncanonical forum URLs.
|
2022-09-16 17:01:06 +02:00 |
|
vlofgren
|
123603b0a3
|
Some small crawler tweaks, plus a test for examining crawler behavior through a simulated server.
|
2022-09-16 16:59:06 +02:00 |
|
vlofgren
|
5e67391829
|
Some small crawler tweaks, plus a test for examining crawler behavior through a simulated server.
|
2022-09-16 16:52:33 +02:00 |
|
vlofgren
|
23a7d91d5b
|
Better index metrics, fix bug where domain results show up with advisory search terms.
|
2022-09-15 17:04:15 +02:00 |
|
vlofgren
|
9558077808
|
UX improvements for "show more results".
|
2022-09-15 15:56:20 +02:00 |
|
vlofgren
|
2e740bb7bd
|
Add advisory search terms that do not affect ranking.
|
2022-09-14 16:31:37 +02:00 |
|
vlofgren
|
680693b6db
|
Fix old broken domain search.
|
2022-09-13 20:57:04 +02:00 |
|