Commit Graph

944 Commits

Author SHA1 Message Date
Viktor Lofgren
2afbdc2269 Adjust the logic for the crawl job extractor to set a relatively low visit limit for websites that are new in the index or has not yielded many good documents previously. 2023-06-07 22:01:35 +02:00
Viktor Lofgren
d82a858491 Don't consider slash to be a sentence separator. 2023-05-31 16:54:30 +02:00
Viktor Lofgren
e332faa07e Fix test that broke when memex.marginalia.nu started redirecting to www.marginalia.nu. 2023-05-28 13:46:24 +02:00
Viktor Lofgren
4e9e79454f Fix broken transformation functions in the PagingArray classes. 2023-05-28 13:31:05 +02:00
Viktor Lofgren
b0bc07b4e7 Insertion sort was *super* busted I don't even know how it worked 2023-05-28 12:17:50 +02:00
Viktor Lofgren
2cda57355a More word metadata tests 2023-05-28 11:57:06 +02:00
Viktor Lofgren
fd192d2791 Fix putative overflow error with a large dictionary 2023-05-28 11:57:06 +02:00
Viktor Lofgren
6814c90625 Fix N-width sorting bug 2023-05-28 11:57:06 +02:00
Viktor
a57ab427b3
Update useful-resources.md 2023-05-27 12:01:45 +02:00
Viktor Lofgren
1e184a8372 (search) Make exploration mode more random 2023-05-25 17:40:28 +02:00
Viktor Lofgren
6fae51a8ef Stopgap fix for a bug in dealing with quote terms containing stop words. 2023-05-02 19:38:59 +02:00
Viktor Lofgren
a9f7b4c457 Add synthetic keywords for same-site files linked from a document (e.g. file:png). Also add category keywords, like file:image or file:document. 2023-04-30 19:29:13 +02:00
Viktor Lofgren
1e3b6934bb Reduce log noise during loading. Bad URLs don't need to be loaded, they can be grepped from the instructions. 2023-04-30 18:36:44 +02:00
Viktor
0a5e85be8f
Update README.md 2023-04-22 21:02:25 +02:00
Viktor
7694a15f62
Fix kale's unreasonably high weighting factor 2023-04-22 20:55:09 +02:00
Viktor
d72da01a92
Update readme.md 2023-04-22 16:05:57 +02:00
Viktor
112f43b3a1
Api service response cache (#16)
* Add response caching to the API service to help SearXNG

* Clean up the code a bit.

* Add an endpoint without a terminating slash for getLicense.

* Add tests for API service.
2023-04-22 15:42:32 +02:00
Viktor Lofgren
f12c6fd57e Add a ranking parameter for biasing toward recent or old content. 2023-04-20 16:00:59 +02:00
Viktor
96bac70b85
Tools for merging sorted lists, and merging btrees. (#14)
* Utilities for merging BTrees of entity size 1 and 2.
* Isolate and clean up sorting algorithms. 
* Functions for keeping distinct items in a LongArray
2023-04-20 15:28:09 +02:00
Viktor Lofgren
619fb8ba80 (converter) Adjust the pub-date sniffing heuristics' order. Doing HTML5 tags too early puts some sites too early. Also expanded support for JSON+LD. 2023-04-19 15:28:50 +02:00
Viktor
5a5cdaf70e
Improvements to the adjacency calculator and screenshots tool (#13)
* WIP: Improvements to website adjacencies loader tool.

* Improving screenshots capture bot.
2023-04-18 22:21:49 +02:00
Viktor Lofgren
bb587ca47f Reformulate search-header.hdb, s/Support/Donate/ the formulation was apparently confusing some people thinking they could get support on this page. 2023-04-18 17:04:24 +02:00
Viktor Lofgren
4d298cd5fa Improving screenshots capture bot. 2023-04-17 18:04:22 +02:00
Viktor Lofgren
fbbaf584ba Adjustments to screenshot capture tool. 2023-04-16 08:55:57 +02:00
Viktor Lofgren
df1850bd45 Fix bug in index service where tld: and links:-queries wouldn't work. 2023-04-15 18:39:16 +02:00
Viktor Lofgren
d42ab19166 Issue 5: Fix bug where some IPv6 addresses blew up domain loading. 2023-04-15 14:11:08 +02:00
Viktor Lofgren
2ab26f37b8 Bug fix for document metadata encoding that breaks year based queries. 2023-04-14 16:56:49 +02:00
Viktor
ec7ce7b0b3
Update readme.md 2023-04-11 16:31:11 +02:00
Viktor Lofgren
3e9b37c264 Refactor website screenshot tool and website adjacencies calculator into code/tools. 2023-04-11 16:20:27 +02:00
Viktor Lofgren
502713f7a8 Reduce memory churn 2023-04-10 16:51:17 +02:00
Viktor Lofgren
e19256a6b6 Tune settings to retrieve more results. 2023-04-10 15:39:20 +02:00
Viktor Lofgren
ccc41d1717 Clean up of the index query handling related code. 2023-04-10 14:50:57 +02:00
Viktor Lofgren
e49b1dd155 Better handling of quote terms, fix bug in handling of longer queries.
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:20:40 +02:00
Viktor Lofgren
fe419b12b4 Better handling of quote terms, fix bug in handling of longer queries.
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:11:40 +02:00
Viktor Lofgren
810515c08d Clean up artifact extractor. 2023-04-10 13:07:54 +02:00
Viktor Lofgren
535a51a621 Repair broken year query test. 2023-04-08 12:04:09 +02:00
Viktor
a278fc6296
Increase search result relevance (#8)
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
2023-04-07 20:18:08 +02:00
Viktor
f1c6525a50
Update setup.sh 2023-04-02 14:44:43 +02:00
Viktor
ace0d19973
Update README.md 2023-04-02 14:43:14 +02:00
Viktor
40b8c8c128
Update README.md 2023-04-02 14:08:17 +02:00
Viktor Lofgren
716ab35b4e Search ranking debuggability improvements. 2023-04-02 13:43:24 +02:00
Viktor Lofgren
3fb249758e Adjust result ordering. 2023-04-02 12:05:22 +02:00
Viktor Lofgren
f7a6ef2179 Smarter queries, better logging. 2023-04-02 12:05:09 +02:00
Viktor Lofgren
105d93cd85 Index query builder automatically ignores redundant predicates. 2023-04-02 12:04:26 +02:00
Viktor Lofgren
1e4157017d More helpful descriptions of index queries. 2023-04-02 12:03:58 +02:00
Viktor Lofgren
5fb75adaae Remove antique result scoring adjustment that makes no sense anymore. 2023-04-02 11:58:04 +02:00
Viktor Lofgren
affcf8cf41 Load test tool 2023-04-02 09:43:43 +02:00
Viktor Lofgren
cc4e089a5d Consider average sentence length when selecting search results. This promotes proses over code listings, tabular data, etc. 2023-03-30 15:46:15 +02:00
Viktor Lofgren
32b9c2e671 Fix SentenceExtractor jank 2023-03-30 15:45:04 +02:00
Viktor Lofgren
4d05be4095 Refactor InternalLinkGraph 2023-03-30 15:44:23 +02:00