Viktor Lofgren | 2e4532ca90 | Clean up KeywordMetadata | 2023-01-30 10:22:43 +01:00
Viktor Lofgren | 9320a457a5 | Misc tweaks and cleanups | 2023-01-30 09:44:09 +01:00
Viktor Lofgren | 65b0ff26fc | Better SiteWords extraction | 2023-01-30 09:42:46 +01:00
Viktor Lofgren | 5558af148e | Reduce memory churn in KeywordCounter | 2023-01-30 09:42:27 +01:00
Viktor Lofgren | 8349435ef4 | Better subject extraction and remove unnecessary calculation from DocumentKeywordExtractor | 2023-01-30 09:41:54 +01:00
Viktor Lofgren | 4d0b444703 | String deduplication | 2023-01-30 09:40:29 +01:00
Viktor Lofgren | 0fd21b9cbf | Reduce memory churn through BufferedReader via CrawledDomainReader | 2023-01-30 09:39:16 +01:00
Viktor Lofgren | 1b53a5389d | Remove poorly guarded regex in UrlBlocklist | 2023-01-30 09:37:37 +01:00
Viktor Lofgren | 28214ad770 | Remove unnecessary toLowerCase in isStopWord | 2023-01-30 09:37:15 +01:00
Viktor Lofgren | dfd652a8d5 | Make WordRep behave consistently across compareTo/equals | 2023-01-30 09:36:47 +01:00
Viktor Lofgren | 50862a2081 | Refactor sentence extractor to break it apart into more readable chunks | 2023-01-30 09:36:11 +01:00
Viktor Lofgren | ed728b2680 | Compressed string component | 2023-01-30 09:33:04 +01:00
Viktor Lofgren | 728931c135 | Compressed string component | 2023-01-30 09:29:14 +01:00
Viktor Lofgren | 618582dc74 | Performance optimizations in EdgeDomain's parsing, reduce the number of unguarded regular expressions | 2023-01-30 09:23:11 +01:00
Viktor Lofgren | 4854f40447 | Array library optimizations for sortLargeSpan | 2023-01-30 09:22:10 +01:00
Viktor Lofgren | c8f7a8cb69 | Fix bug in dealing with scheme-relative URLs | 2023-01-19 15:46:32 +01:00
Viktor Lofgren | 5851e91424 | Clean-up and fix for feature regression in site:-terms | 2023-01-11 19:33:32 +01:00
Viktor Lofgren | fb2797a8ef | Tweaking search result valuation | 2023-01-11 19:33:05 +01:00
Viktor Lofgren | 085d985e61 | Result selection algorithm tweaks | 2023-01-11 17:19:57 +01:00
Viktor Lofgren | 69ccf143ac | New search profile for hardcore web 1.0 content. | 2023-01-11 16:11:51 +01:00
Viktor Lofgren | 4d3ef0e3b3 | Tool for cleaning raw index files based on a predicate. | 2023-01-11 16:11:29 +01:00
Viktor Lofgren | cb408dd737 | Fixes | 2023-01-09 22:06:15 +01:00
Viktor Lofgren | 11b0d61efc | Fixes | 2023-01-09 18:45:04 +01:00
Viktor Lofgren | 0b6200705e | Bugfix in forward converter, should force both files before exiting. Also don't need to create an intermediate file. | 2023-01-09 16:57:58 +01:00
Viktor Lofgren | 58cae7d963 | Bugfix for logs. | 2023-01-09 15:46:11 +01:00
Viktor Lofgren | 6d33c386fc | Merge changes from experimental branch (#132) | 2023-01-08 11:11:44 +01:00
  Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
  Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/132
Viktor Lofgren | c057ce74a8 | Bugfix for rare bug where some queries may miss hits due to BTreeReader's retain function giving up too fast. | 2022-11-22 16:33:29 +01:00
Viktor Lofgren | baaf21911a | Reduce resource usage waste in edge-search by recycling QueryVariants | 2022-11-18 17:12:34 +01:00
Viktor Lofgren | e86f52d7d8 | Reduce resource usage waste in edge-search by recycling QueryVariants | 2022-11-18 17:09:07 +01:00
Viktor Lofgren | 655504c1f0 | Hotfix for NaN-serialization bug in API service. | 2022-11-06 12:12:10 +01:00
vlofgren | 27893b414b | Merge branch 'release' | 2022-10-30 11:33:06 +01:00
  # Conflicts:
  # marginalia_nu/src/main/java/nu/marginalia/wmsa/edge/search/command/commands/BrowseCommand.java
vlofgren | e7623010db | Fetch more browse:domain-results. | 2022-10-30 11:30:11 +01:00
Viktor Lofgren | 395da07abe | Sort browse:-results by relatedness if possible (#125) | 2022-10-30 10:56:01 +01:00
  Co-authored-by: vlofgren <vlofgren@gmail.com>
  Co-authored-by: vlofgren <vlofgren@marginalia.nu>
  Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/125
vlofgren | b97f425f7e | Sort results by relatedness where possible. | 2022-10-30 10:49:41 +01:00
Viktor Lofgren | c559611185 | Prefer cosine similarity relatedness for browse:-queries. (#123) | 2022-10-30 10:32:33 +01:00
  Co-authored-by: vlofgren <vlofgren@gmail.com>
  Co-authored-by: vlofgren <vlofgren@marginalia.nu>
  Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/123
vlofgren | 6231f525fd | Prefer cosine similarity relatedness for browse:-queries. | 2022-10-30 10:31:37 +01:00
Viktor Lofgren | e676d8729e | GUI fixes and cleanups (#122) | 2022-10-30 10:08:19 +01:00
  Co-authored-by: vlofgren <vlofgren@gmail.com>
  Co-authored-by: vlofgren <vlofgren@marginalia.nu>
  Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/122
vlofgren | 61a80b417b | Fix for explore2.marginalia.nu where it wouldn't find some websites that were flagged as redirects. | 2022-10-30 10:05:52 +01:00
vlofgren | cc5b425661 | Add another w3m-helper bar to make the UI cleaner on terminal. | 2022-10-30 09:56:37 +01:00
vlofgren | 217584126c | Improved publishing date heuristics | 2022-10-29 11:20:01 +02:00
vlofgren | 68ec3304a3 | Update index | 2022-10-27 19:16:35 +02:00
vlofgren | af8001d41e | Less janky summary extraction | 2022-10-27 19:16:35 +02:00
vlofgren | 94c157c5c3 | Publish-date guesser | 2022-10-27 19:16:35 +02:00
Viktor Lofgren | c6abbc12f6 | fix serialization issue (#121) | 2022-10-22 15:01:41 +02:00
  Co-authored-by: vlofgren <vlofgren@gmail.com>
  Co-authored-by: vlofgren <vlofgren@marginalia.nu>
  Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/121
vlofgren | 8f8e6e147f | Fix JSON serialization error | 2022-10-22 14:42:37 +02:00
vlofgren | e6da7c1a29 | Tweaks for new release. | 2022-10-21 17:44:29 +02:00
Viktor Lofgren | 0a35a7c1d0 | master (#119) | 2022-10-20 21:57:08 +02:00
  Co-authored-by: vlofgren <vlofgren@gmail.com>
  Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/119
vlofgren | 5393167bf8 | Fixes in sorting logic, and optimized update domain statistics to not take 4+ hours. | 2022-10-20 21:55:51 +02:00
vlofgren | 05762fe200 | Index update. | 2022-10-19 16:35:50 +02:00
Viktor Lofgren | df49ccbe59 | October Release (#118) | 2022-10-19 15:00:04 +02:00
  Co-authored-by: vlofgren <vlofgren@gmail.com>
  Co-authored-by: vlofgren <vlofgren@marginalia.nu>
  Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/118