Commit Graph

494 Commits

Author SHA1 Message Date
Viktor Lofgren
2e4532ca90 Clean up KeywordMetadata 2023-01-30 10:22:43 +01:00
Viktor Lofgren
d5df3268b3 Update 3rd party readme 2023-01-30 10:22:28 +01:00
Viktor Lofgren
9320a457a5 Misc tweaks and cleanups 2023-01-30 09:44:09 +01:00
Viktor Lofgren
1dac4e7e67 Override defaults in GSON 2023-01-30 09:43:21 +01:00
Viktor Lofgren
65b0ff26fc Better SiteWords extraction 2023-01-30 09:42:46 +01:00
Viktor Lofgren
5558af148e Reduce memory churn in KeywordCounter 2023-01-30 09:42:27 +01:00
Viktor Lofgren
8349435ef4 Better subject extraction and remove unnecessary calculation from DocumentKeywordExtractor 2023-01-30 09:41:54 +01:00
Viktor Lofgren
4d0b444703 String deduplication 2023-01-30 09:40:29 +01:00
Viktor Lofgren
0fd21b9cbf Reduce memory churn through BufferedReader via CrawledDomainReader 2023-01-30 09:39:16 +01:00
Viktor Lofgren
1b53a5389d Remove poorly guarded regex in UrlBlocklist 2023-01-30 09:37:37 +01:00
Viktor Lofgren
28214ad770 Remove unnecessary toLowerCase in isStopWord 2023-01-30 09:37:15 +01:00
Viktor Lofgren
dfd652a8d5 Make WordRep behave consistently across compareTo/equals 2023-01-30 09:36:47 +01:00
Viktor Lofgren
50862a2081 Refactor sentence extractor to break it apart into more readable chunks 2023-01-30 09:36:11 +01:00
Viktor Lofgren
ed728b2680 Compressed string component 2023-01-30 09:33:04 +01:00
Viktor Lofgren
728931c135 Compressed string component 2023-01-30 09:29:14 +01:00
Viktor Lofgren
1f646e4f68 Reduce memory churn in RDRPOSTagger 2023-01-30 09:25:57 +01:00
Viktor Lofgren
618582dc74 Performance optimizations in EdgeDomain's parsing, reduce the number of unguarded regular expressions 2023-01-30 09:23:11 +01:00
Viktor Lofgren
4854f40447 Array library optimizations for sortLargeSpan 2023-01-30 09:22:10 +01:00
Viktor Lofgren
c8f7a8cb69 Fix bug in dealing with scheme-relative URLs 2023-01-19 15:46:32 +01:00
Viktor Lofgren
321a9028c7 Merge branch 'release' 2023-01-11 19:46:33 +01:00
Viktor Lofgren
5851e91424 Clean-up and fix for feature regression in site:-terms 2023-01-11 19:33:32 +01:00
Viktor Lofgren
fb2797a8ef Tweaking search result valuation 2023-01-11 19:33:05 +01:00
Viktor Lofgren
085d985e61 Result selection algorithm tweaks 2023-01-11 17:19:57 +01:00
Viktor Lofgren
69ccf143ac New search profile for hardcore web 1.0 content. 2023-01-11 16:11:51 +01:00
Viktor Lofgren
4d3ef0e3b3 Tool for cleaning raw index files based on a predicate. 2023-01-11 16:11:29 +01:00
Viktor Lofgren
4928b2e00e Use a mapped file instead of allocating to save memory (#136)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/136
2023-01-09 22:06:58 +01:00
Viktor Lofgren
cb408dd737 Fixes 2023-01-09 22:06:15 +01:00
Viktor Lofgren
4ec338d218 Merge branch 'release' 2023-01-09 20:21:33 +01:00
Viktor Lofgren
a9ddc328a6 Fixes from master (#135)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/135
2023-01-09 18:47:04 +01:00
Viktor Lofgren
4ff68e8807 Merge branch 'release'
# Conflicts:
#	marginalia_nu/src/main/java/nu/marginalia/wmsa/edge/index/postings/forward/ForwardIndexConverter.java
2023-01-09 18:46:48 +01:00
Viktor Lofgren
11b0d61efc Fixes 2023-01-09 18:45:04 +01:00
Viktor Lofgren
998ebc80a1 Hotfixes (#134)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/134
2023-01-09 18:23:19 +01:00
Viktor Lofgren
0fa1c8c16a Merge branch 'release'
# Conflicts:
#	marginalia_nu/src/main/java/nu/marginalia/wmsa/configuration/ServiceDescriptor.java
#	marginalia_nu/src/main/java/nu/marginalia/wmsa/edge/index/postings/forward/ForwardIndexConverter.java
2023-01-09 18:22:32 +01:00
Viktor Lofgren
0b6200705e Bugfix in forward converter, should force both files before exiting. Also don't need to create an intermediate file. 2023-01-09 16:57:58 +01:00
Viktor Lofgren
58cae7d963 Bugfix for logs. 2023-01-09 15:46:11 +01:00
Viktor Lofgren
6b44786649 2022-11 release (#133)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/133
2023-01-08 11:13:39 +01:00
Viktor Lofgren
865275889e Merge branch 'release' into master 2023-01-08 11:13:25 +01:00
Viktor Lofgren
6d33c386fc Merge changes from experimental branch (#132)
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/132
2023-01-08 11:11:44 +01:00
Viktor Lofgren
06299cd554 Bugfix for rare bug where some queries may miss hits due to BTreeReader's retain function giving up too fast. (#129)
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/129
2022-11-22 16:35:09 +01:00
Viktor Lofgren
5d38d63d07 Merge branch 'release' into master 2022-11-22 16:34:43 +01:00
Viktor Lofgren
c057ce74a8 Bugfix for rare bug where some queries may miss hits due to BTreeReader's retain function giving up too fast. 2022-11-22 16:33:29 +01:00
Viktor Lofgren
f0f82f7db0 Reduce resource waste (#128) 2022-11-18 17:13:08 +01:00
Viktor Lofgren
baaf21911a Reduce resource usage waste in edge-search by recycling QueryVariants 2022-11-18 17:12:34 +01:00
Viktor Lofgren
a829be45cb Merge branch 'release' into master 2022-11-18 17:11:00 +01:00
Viktor Lofgren
e86f52d7d8 Reduce resource usage waste in edge-search by recycling QueryVariants 2022-11-18 17:09:07 +01:00
Viktor Lofgren
2f7b429217 Update 'README.md' 2022-11-12 16:01:48 +01:00
Viktor Lofgren
b1b880a48c Update 'CONTRIBUTING.md' 2022-11-12 14:12:59 +01:00
Viktor Lofgren
674af5449d Fix for intermittent API service 500's (#127)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Co-authored-by: vlofgren <vlofgren@marginalia.nu>
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/127
2022-11-06 12:13:50 +01:00
Viktor Lofgren
e2ccbcc838 Merge branch 'release' into master 2022-11-06 12:13:20 +01:00
Viktor Lofgren
655504c1f0 Hotfix for NaN-serialization bug in API service. 2022-11-06 12:12:10 +01:00