Viktor Lofgren | fae3a025b8 | Merge branch 'release' | 2023-02-12 19:09:30 +01:00
Viktor Lofgren | 61f7ce8ca5 | Bug fix site listing | 2023-02-12 15:28:50 +01:00
Viktor Lofgren | fa9b4e4352 | A tiny release between crawls (#138) | 2023-02-12 10:57:07 +01:00
    Bringing online new ranking changes
    Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
    Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/138
Viktor Lofgren | 6ef9f13c68 | merge release into master | 2023-02-12 10:53:51 +01:00
Viktor Lofgren | db50ca2231 | Tidy up RankingSearchSet | 2023-02-12 10:47:46 +01:00
Viktor Lofgren | 4d9dce5733 | Tidy up RankingSearchSet | 2023-02-12 10:45:35 +01:00
Viktor Lofgren | bcadfc965d | Use new cosine-similarity ranking algorithm | 2023-02-12 10:28:53 +01:00
Viktor Lofgren | 3e1297064c | Tidy up code | 2023-02-11 13:06:40 +01:00
Viktor Lofgren | 06df8e9a28 | Sort the index on rank to, like the previous design, prioritize the discovery of high ranking items. | 2023-02-11 12:17:30 +01:00
Viktor Lofgren | e963ecb4ae | Modified the ranking algorithm to be able to pagerank with similarity data instead of the link graph. | 2023-02-07 22:13:25 +01:00
Viktor Lofgren | 04f905f3a1 | Reintroduce the ability to filter search results by their ranking. | 2023-02-04 12:59:24 +01:00
Viktor Lofgren | 4a07eda61c | Debug query strategy options | 2023-02-02 10:35:55 +01:00
Viktor Lofgren | b18cd0bc36 | Improvements to array library and conversion | 2023-02-02 10:35:14 +01:00
Viktor Lofgren | cdaeb7724a | Clean up braille punch cards | 2023-02-02 10:34:17 +01:00
Viktor Lofgren | e3bea19d4d | Improvements to array library | 2023-02-02 10:33:16 +01:00
Viktor Lofgren | 8168d512b8 | Retire defunct SMHI weather forecast integration. | 2023-01-30 13:25:41 +01:00
Viktor Lofgren | 4c2f54593e | Use on-heap dictionary for small data. | 2023-01-30 13:10:56 +01:00
Viktor Lofgren | 4a6a1308b0 | Remove min length regex, the guard is too weak to be meaningful | 2023-01-30 10:43:53 +01:00
Viktor Lofgren | 2e4532ca90 | Clean up KeywordMetadata | 2023-01-30 10:22:43 +01:00
Viktor Lofgren | d5df3268b3 | Update 3rd party readme | 2023-01-30 10:22:28 +01:00
Viktor Lofgren | 9320a457a5 | Misc tweaks and cleanups | 2023-01-30 09:44:09 +01:00
Viktor Lofgren | 1dac4e7e67 | Override defaults in GSON | 2023-01-30 09:43:21 +01:00
Viktor Lofgren | 65b0ff26fc | Better SiteWords extraction | 2023-01-30 09:42:46 +01:00
Viktor Lofgren | 5558af148e | Reduce memory churn in KeywordCounter | 2023-01-30 09:42:27 +01:00
Viktor Lofgren | 8349435ef4 | Better subject extraction and remove unnecessary calculation from DocumentKeywordExtractor | 2023-01-30 09:41:54 +01:00
Viktor Lofgren | 4d0b444703 | String deduplication | 2023-01-30 09:40:29 +01:00
Viktor Lofgren | 0fd21b9cbf | Reduce memory churn through BufferedReader via CrawledDomainReader | 2023-01-30 09:39:16 +01:00
Viktor Lofgren | 1b53a5389d | Remove poorly guarded regex in UrlBlocklist | 2023-01-30 09:37:37 +01:00
Viktor Lofgren | 28214ad770 | Remove unnecessary toLowerCase in isStopWord | 2023-01-30 09:37:15 +01:00
Viktor Lofgren | dfd652a8d5 | Make WordRep behave consistently across compareTo/equals | 2023-01-30 09:36:47 +01:00
Viktor Lofgren | 50862a2081 | Refactor sentence extractor to break it apart into more readable chunks | 2023-01-30 09:36:11 +01:00
Viktor Lofgren | ed728b2680 | Compressed string component | 2023-01-30 09:33:04 +01:00
Viktor Lofgren | 728931c135 | Compressed string component | 2023-01-30 09:29:14 +01:00
Viktor Lofgren | 1f646e4f68 | Reduce memory churn in RDRPOSTagger | 2023-01-30 09:25:57 +01:00
Viktor Lofgren | 618582dc74 | Performance optimizations in EdgeDomain's parsing, reduce the number of unguarded regular expressions | 2023-01-30 09:23:11 +01:00
Viktor Lofgren | 4854f40447 | Array library optimizations for sortLargeSpan | 2023-01-30 09:22:10 +01:00
Viktor Lofgren | c8f7a8cb69 | Fix bug in dealing with scheme-relative URLs | 2023-01-19 15:46:32 +01:00
Viktor Lofgren | 467bf566a9 | Hotfixes for 2023-01 release (#137) | 2023-01-11 19:48:03 +01:00
    Co-authored-by: vlofgren <vlofgren@gmail.com>
    Co-authored-by: vlofgren <vlofgren@marginalia.nu>
    Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
    Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/137
Viktor Lofgren | 321a9028c7 | Merge branch 'release' | 2023-01-11 19:46:33 +01:00
Viktor Lofgren | 5851e91424 | Clean-up and fix for feature regression in site:-terms | 2023-01-11 19:33:32 +01:00
Viktor Lofgren | fb2797a8ef | Tweaking search result valuation | 2023-01-11 19:33:05 +01:00
Viktor Lofgren | 085d985e61 | Result selection algorithm tweaks | 2023-01-11 17:19:57 +01:00
Viktor Lofgren | 69ccf143ac | New search profile for hardcore web 1.0 content. | 2023-01-11 16:11:51 +01:00
Viktor Lofgren | 4d3ef0e3b3 | Tool for cleaning raw index files based on a predicate. | 2023-01-11 16:11:29 +01:00
Viktor Lofgren | 4928b2e00e | Use a mapped file instead of allocating to save memory (#136) | 2023-01-09 22:06:58 +01:00
    Co-authored-by: vlofgren <vlofgren@gmail.com>
    Co-authored-by: vlofgren <vlofgren@marginalia.nu>
    Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
    Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/136
Viktor Lofgren | cb408dd737 | Fixes | 2023-01-09 22:06:15 +01:00
Viktor Lofgren | 4ec338d218 | Merge branch 'release' | 2023-01-09 20:21:33 +01:00
Viktor Lofgren | a9ddc328a6 | Fixes from master (#135) | 2023-01-09 18:47:04 +01:00
    Co-authored-by: vlofgren <vlofgren@gmail.com>
    Co-authored-by: vlofgren <vlofgren@marginalia.nu>
    Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
    Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/135
Viktor Lofgren | 4ff68e8807 | Merge branch 'release' | 2023-01-09 18:46:48 +01:00
    # Conflicts:
    # marginalia_nu/src/main/java/nu/marginalia/wmsa/edge/index/postings/forward/ForwardIndexConverter.java
Viktor Lofgren | 11b0d61efc | Fixes | 2023-01-09 18:45:04 +01:00