Viktor Lofgren
df1850bd45
Fix bug in index service where tld: and links:-queries wouldn't work.
2023-04-15 18:39:16 +02:00
Viktor Lofgren
d42ab19166
Issue 5: Fix bug where some IPv6 addresses blew up domain loading.
2023-04-15 14:11:08 +02:00
Viktor Lofgren
2ab26f37b8
Bug fix for document metadata encoding that breaks year based queries.
2023-04-14 16:56:49 +02:00
Viktor
ec7ce7b0b3
Update readme.md
2023-04-11 16:31:11 +02:00
Viktor Lofgren
3e9b37c264
Refactor website screenshot tool and website adjacencies calculator into code/tools.
2023-04-11 16:20:27 +02:00
Viktor Lofgren
502713f7a8
Reduce memory churn
2023-04-10 16:51:17 +02:00
Viktor Lofgren
e19256a6b6
Tune settings to retrieve more results.
2023-04-10 15:39:20 +02:00
Viktor Lofgren
ccc41d1717
Clean up of the index query handling related code.
2023-04-10 14:50:57 +02:00
Viktor Lofgren
e49b1dd155
Better handling of quote terms, fix bug in handling of longer queries.
...
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:20:40 +02:00
Viktor Lofgren
fe419b12b4
Better handling of quote terms, fix bug in handling of longer queries.
...
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:11:40 +02:00
Viktor Lofgren
810515c08d
Clean up artifact extractor.
2023-04-10 13:07:54 +02:00
Viktor Lofgren
535a51a621
Repair broken year query test.
2023-04-08 12:04:09 +02:00
Viktor
a278fc6296
Increase search result relevance ( #8 )
...
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
2023-04-07 20:18:08 +02:00
Viktor
f1c6525a50
Update setup.sh
2023-04-02 14:44:43 +02:00
Viktor
ace0d19973
Update README.md
2023-04-02 14:43:14 +02:00
Viktor
40b8c8c128
Update README.md
2023-04-02 14:08:17 +02:00
Viktor Lofgren
716ab35b4e
Search ranking debuggability improvements.
2023-04-02 13:43:24 +02:00
Viktor Lofgren
3fb249758e
Adjust result ordering.
2023-04-02 12:05:22 +02:00
Viktor Lofgren
f7a6ef2179
Smarter queries, better logging.
2023-04-02 12:05:09 +02:00
Viktor Lofgren
105d93cd85
Index query builder automatically ignores redundant predicates.
2023-04-02 12:04:26 +02:00
Viktor Lofgren
1e4157017d
More helpful descriptions of index queries.
2023-04-02 12:03:58 +02:00
Viktor Lofgren
5fb75adaae
Remove antique result scoring adjustment that makes no sense anymore.
2023-04-02 11:58:04 +02:00
Viktor Lofgren
affcf8cf41
Load test tool
2023-04-02 09:43:43 +02:00
Viktor Lofgren
cc4e089a5d
Consider average sentence length when selecting search results. This promotes proses over code listings, tabular data, etc.
2023-03-30 15:46:15 +02:00
Viktor Lofgren
32b9c2e671
Fix SentenceExtractor jank
2023-03-30 15:45:04 +02:00
Viktor Lofgren
4d05be4095
Refactor InternalLinkGraph
2023-03-30 15:44:23 +02:00
Viktor Lofgren
137adb9c3c
Bitmask calculation improvement. Take sentence length into consideration, not all lines are equal.
2023-03-30 15:42:06 +02:00
Viktor Lofgren
16e37672fc
Bugfix crawl plan, doesn't use rewrite() everywhere
2023-03-30 15:41:07 +02:00
Viktor Lofgren
d0c72ceb7e
Improve experiment runner, convenient start script.
2023-03-30 15:40:31 +02:00
Viktor Lofgren
0fcb2b534c
Polish Names
2023-03-29 16:51:47 +02:00
Viktor Lofgren
dcf6218cdb
Fix bugs related to search result selection in the case with multiple search terms.
...
* A deduplication filter step ran too early, and removed many good results on the basis that they partially, but did not fully fit another set of search terms.
* Altered the query creation process to prefer documents where multiple terms appear in the priority index.
2023-03-29 15:18:52 +02:00
Viktor Lofgren
8f51345a1d
Add experiment runner tool and got rid of experiments module in processes.
2023-03-28 16:58:46 +02:00
Viktor Lofgren
03bd892b95
Improve document processing in conversion.
...
* Add flags for long and short documents.
* Break out common length logic from plugins.
* Cleaning up of related code.
2023-03-28 16:38:00 +02:00
Viktor Lofgren
1e65ac3940
Improve useful-resources.md
2023-03-28 16:35:58 +02:00
Viktor
e622437560
Create FUNDING.yml
2023-03-28 13:13:49 +02:00
Viktor Lofgren
30584887f9
DictionaryMap changes.
...
Add new flag to change the default size to make prod index boot faster. Remove option to select OffHeapDictionaryHashMap.
2023-03-27 17:28:39 +02:00
Viktor Lofgren
17ca4f9eea
Permit search results that are all synthetic to pass relevancy check.
2023-03-27 17:27:35 +02:00
Viktor Lofgren
7fb3db3249
Fix bug where link on front page news listing wouldn't work.
...
... also changed order of date and source to make the UI more consistent.
2023-03-27 17:26:46 +02:00
Viktor Lofgren
b60fcd0918
Documentation improvements
2023-03-27 17:25:27 +02:00
Viktor Lofgren
862e925d7c
"-Dsmall-ram=TRUE" no longer does anything. Remove references to the flag, which previously reduced the memory footprint of the loader and index service.
2023-03-26 21:37:11 +02:00
Viktor Lofgren
a0027ad32b
Fix broken diagram links after doc/ restructuring.
2023-03-25 16:32:10 +01:00
Viktor Lofgren
c5f4cb34bf
Documentation for DB
2023-03-25 16:14:16 +01:00
Viktor
2e69179f12
Update readme.md
2023-03-25 15:47:45 +01:00
Viktor
19000ab339
Create readme.md
2023-03-25 15:46:19 +01:00
Viktor
be3ba3ef37
Update readme.md
2023-03-25 15:27:11 +01:00
Viktor
ac1ac3ea57
Move database to a separate module
...
* Move database to a separate project, break apart sql file into separate entities.
* Fix front page news listing.
2023-03-25 15:26:17 +01:00
Viktor
0b505939ed
Update features-convert/readme.md
2023-03-25 12:43:58 +01:00
Viktor
d2a9e1b644
Add processes link to readme.md for code/common
2023-03-25 12:42:44 +01:00
Viktor Lofgren
3464ca514b
Fix typeahead suggestions
2023-03-25 10:20:52 +01:00
Viktor Lofgren
2f2c86a9f5
Fix bug where WmsaHome wouldn't look in /var/lib/wmsa as a fallback
2023-03-25 10:20:52 +01:00