Commit Graph

526 Commits

Author SHA1 Message Date
Viktor Lofgren
1b776b114e Restructuring the git repo 2023-03-04 14:00:46 +01:00
Viktor Lofgren
4fdaaa16ba Restructuring the git repo 2023-03-04 13:19:01 +01:00
Viktor Lofgren
9c665bbc74 Make the random website button look less weird on mobile. 2023-02-19 18:59:40 +01:00
Viktor Lofgren
b5805063e0 Simple implementation of a locality-sensitive hash for text word. 2023-02-19 18:53:17 +01:00
Viktor Lofgren
ff30de7352 Fix crawler bug that caused sites to fail to index when no paths were provided. 2023-02-13 20:26:08 +01:00
Viktor Lofgren
b348dbb00e Add parameters to the ranking and search set configurations. 2023-02-13 17:21:28 +01:00
Viktor Lofgren
fbd2d29d78 Merge branch 'release'
# Conflicts:
#	marginalia_nu/src/main/java/nu/marginalia/wmsa/edge/index/svc/EdgeIndexSearchSetsService.java
2023-02-13 17:08:25 +01:00
Viktor Lofgren
d6b02f6669 Add parameters to the ranking and search set configurations. 2023-02-13 17:07:33 +01:00
Viktor Lofgren
b92d18521d Search UI form update, and footer typo fix. 2023-02-13 17:01:49 +01:00
Viktor Lofgren
0e7d3672d8 Minor ranking set tweak (#140)
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/140
2023-02-12 20:42:48 +01:00
Viktor Lofgren
1314702e4f Merge branch 'release' 2023-02-12 20:41:52 +01:00
Viktor Lofgren
6bb4528a9e Tweak the ranking parameters a bit 2023-02-12 20:41:23 +01:00
Viktor Lofgren
a9f7b8223e Bug fix in site document listing (#139)
Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/139
2023-02-12 19:10:18 +01:00
Viktor Lofgren
fae3a025b8 Merge branch 'release' 2023-02-12 19:09:30 +01:00
Viktor Lofgren
61f7ce8ca5 Bug fix site listing 2023-02-12 15:28:50 +01:00
Viktor Lofgren
fa9b4e4352 A tiny release between crawls (#138)
Bringing online new ranking changes

Co-authored-by: Viktor Lofgren <vlofgren@marginalia.nu>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/138
2023-02-12 10:57:07 +01:00
Viktor Lofgren
6ef9f13c68 merge release into master 2023-02-12 10:53:51 +01:00
Viktor Lofgren
db50ca2231 Tidy up RankingSearchSet 2023-02-12 10:47:46 +01:00
Viktor Lofgren
4d9dce5733 Tidy up RankingSearchSet 2023-02-12 10:45:35 +01:00
Viktor Lofgren
bcadfc965d Use new cosine-similarity ranking algorithm 2023-02-12 10:28:53 +01:00
Viktor Lofgren
3e1297064c Tidy up code 2023-02-11 13:06:40 +01:00
Viktor Lofgren
06df8e9a28 Sort the index on rank to, like the previous design, prioritize the discovery of high ranking items. 2023-02-11 12:17:30 +01:00
Viktor Lofgren
e963ecb4ae Modified the ranking algorithm to be able to pagerank with similarity data instead of the link graph. 2023-02-07 22:13:25 +01:00
Viktor Lofgren
04f905f3a1 Reintroduce the ability to filter search results by their ranking. 2023-02-04 12:59:24 +01:00
Viktor Lofgren
4a07eda61c Debug query strategy options 2023-02-02 10:35:55 +01:00
Viktor Lofgren
b18cd0bc36 Improvements to array library and conversion 2023-02-02 10:35:14 +01:00
Viktor Lofgren
cdaeb7724a Clean up braille punch cards 2023-02-02 10:34:17 +01:00
Viktor Lofgren
e3bea19d4d Improvements to array library 2023-02-02 10:33:16 +01:00
Viktor Lofgren
8168d512b8 Retire defunct SMHI weather forecast integration. 2023-01-30 13:25:41 +01:00
Viktor Lofgren
4c2f54593e Use on-heap dictionary for small data. 2023-01-30 13:10:56 +01:00
Viktor Lofgren
4a6a1308b0 Remove min length regex, the guard is too weak to be meaningful 2023-01-30 10:43:53 +01:00
Viktor Lofgren
2e4532ca90 Clean up KeywordMetadata 2023-01-30 10:22:43 +01:00
Viktor Lofgren
d5df3268b3 Update 3rd party readme 2023-01-30 10:22:28 +01:00
Viktor Lofgren
9320a457a5 Misc tweaks and cleanups 2023-01-30 09:44:09 +01:00
Viktor Lofgren
1dac4e7e67 Override defaults in GSON 2023-01-30 09:43:21 +01:00
Viktor Lofgren
65b0ff26fc Better SiteWords extraction 2023-01-30 09:42:46 +01:00
Viktor Lofgren
5558af148e Reduce memory churn in KeywordCounter 2023-01-30 09:42:27 +01:00
Viktor Lofgren
8349435ef4 Better subject extraction and remove unnecessary calculation from DocumentKeywordExtractor 2023-01-30 09:41:54 +01:00
Viktor Lofgren
4d0b444703 String deduplication 2023-01-30 09:40:29 +01:00
Viktor Lofgren
0fd21b9cbf Reduce memory churn through BufferedReader via CrawledDomainReader 2023-01-30 09:39:16 +01:00
Viktor Lofgren
1b53a5389d Remove poorly guarded regex in UrlBlocklist 2023-01-30 09:37:37 +01:00
Viktor Lofgren
28214ad770 Remove unnecessary toLowerCase in isStopWord 2023-01-30 09:37:15 +01:00
Viktor Lofgren
dfd652a8d5 Make WordRep behave consistently across compareTo/equals 2023-01-30 09:36:47 +01:00
Viktor Lofgren
50862a2081 Refactor sentence extractor to break it apart into more readable chunks 2023-01-30 09:36:11 +01:00
Viktor Lofgren
ed728b2680 Compressed string component 2023-01-30 09:33:04 +01:00
Viktor Lofgren
728931c135 Compressed string component 2023-01-30 09:29:14 +01:00
Viktor Lofgren
1f646e4f68 Reduce memory churn in RDRPOSTagger 2023-01-30 09:25:57 +01:00
Viktor Lofgren
618582dc74 Performance optimizations in EdgeDomain's parsing, reduce the number of unguarded regular expressions 2023-01-30 09:23:11 +01:00
Viktor Lofgren
4854f40447 Array library optimizations for sortLargeSpan 2023-01-30 09:22:10 +01:00
Viktor Lofgren
c8f7a8cb69 Fix bug in dealing with scheme-relative URLs 2023-01-19 15:46:32 +01:00