Viktor Lofgren
df1850bd45
Fix bug in index service where tld: and links:-queries wouldn't work.
2023-04-15 18:39:16 +02:00
Viktor Lofgren
d42ab19166
Issue 5: Fix bug where some IPv6 addresses blew up domain loading.
2023-04-15 14:11:08 +02:00
Viktor Lofgren
2ab26f37b8
Bug fix for document metadata encoding that breaks year-based queries.
2023-04-14 16:56:49 +02:00
Viktor
ec7ce7b0b3
Update readme.md
2023-04-11 16:31:11 +02:00
Viktor Lofgren
3e9b37c264
Refactor website screenshot tool and website adjacencies calculator into code/tools.
2023-04-11 16:20:27 +02:00
Viktor Lofgren
502713f7a8
Reduce memory churn
2023-04-10 16:51:17 +02:00
Viktor Lofgren
e19256a6b6
Tune settings to retrieve more results.
2023-04-10 15:39:20 +02:00
Viktor Lofgren
ccc41d1717
Clean up the index query handling code.
2023-04-10 14:50:57 +02:00
Viktor Lofgren
e49b1dd155
Better handling of quote terms, fix bug in handling of longer queries.
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:20:40 +02:00
Viktor Lofgren
fe419b12b4
Better handling of quote terms, fix bug in handling of longer queries.
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:11:40 +02:00
Viktor Lofgren
810515c08d
Clean up artifact extractor.
2023-04-10 13:07:54 +02:00
Viktor Lofgren
535a51a621
Repair broken year query test.
2023-04-08 12:04:09 +02:00
Viktor
a278fc6296
Increase search result relevance (#8)
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
2023-04-07 20:18:08 +02:00
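For readers unfamiliar with the BM25 mentioned in the entry above, here is a minimal sketch of the textbook formula. It is not the index service's actual code: the class name, method names, and the k1 = 1.2 and b = 0.75 constants are assumptions chosen for illustration (they are commonly cited defaults), and the log does not say which variant or parameters the project uses.

```java
// Textbook BM25 for a single query term. All names and constants here are
// hypothetical and not taken from the project.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (assumed default)
    static final double B = 0.75;  // strength of length normalization (assumed default)

    // Inverse document frequency: rarer terms get a higher weight.
    static double idf(long docCount, long docsWithTerm) {
        return Math.log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    // Score contribution of one term in one document.
    static double score(int termFreq, int docLength, double avgDocLength,
                        long docCount, long docsWithTerm) {
        // Documents longer than average are penalized, shorter ones boosted.
        double norm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf(docCount, docsWithTerm) * (termFreq * (K1 + 1.0)) / (termFreq + norm);
    }

    public static void main(String[] args) {
        // Same term frequency, different document lengths.
        System.out.println(score(3, 120, 200.0, 1_000_000, 5_000));
        System.out.println(score(3, 800, 200.0, 1_000_000, 5_000));
    }
}
```

With the same term frequency, the shorter document scores higher, which is the length normalization BM25 adds over a plain TF-IDF weighting.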
Viktor Lofgren
716ab35b4e
Search ranking debuggability improvements.
2023-04-02 13:43:24 +02:00
Viktor Lofgren
3fb249758e
Adjust result ordering.
2023-04-02 12:05:22 +02:00
Viktor Lofgren
f7a6ef2179
Smarter queries, better logging.
2023-04-02 12:05:09 +02:00
Viktor Lofgren
105d93cd85
Index query builder automatically ignores redundant predicates.
2023-04-02 12:04:26 +02:00
Viktor Lofgren
1e4157017d
More helpful descriptions of index queries.
2023-04-02 12:03:58 +02:00
Viktor Lofgren
5fb75adaae
Remove antique result scoring adjustment that makes no sense anymore.
2023-04-02 11:58:04 +02:00
Viktor Lofgren
affcf8cf41
Load test tool
2023-04-02 09:43:43 +02:00
Viktor Lofgren
cc4e089a5d
Consider average sentence length when selecting search results. This promotes prose over code listings, tabular data, etc.
2023-03-30 15:46:15 +02:00
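As a rough illustration of the signal described in the entry above, the sketch below computes an average sentence length in words. It is an assumption-laden toy, not the project's SentenceExtractor: the class name, the splitting rule (sentence punctuation or a line break ends a sentence), and the whitespace tokenization are all hypothetical.

```java
// Hypothetical sketch: average number of words per sentence in a text.
final class SentenceLengthSketch {
    static double averageSentenceLength(String text) {
        // Assumption: '.', '!', '?' or a newline terminates a sentence.
        String[] sentences = text.split("[.!?\\n]+");
        int sentenceCount = 0;
        int wordCount = 0;
        for (String sentence : sentences) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // skip fragments that contain only whitespace
            }
            sentenceCount++;
            wordCount += trimmed.split("\\s+").length;
        }
        return sentenceCount == 0 ? 0.0 : (double) wordCount / sentenceCount;
    }

    public static void main(String[] args) {
        String prose = "The quick brown fox jumps. It lands softly on the grass.";
        String table = "a 1\nb 2\nc 3";
        System.out.println(averageSentenceLength(prose)); // prints 5.5
        System.out.println(averageSentenceLength(table)); // prints 2.0
    }
}
```

Under these assumptions, tabular or line-oriented content yields short "sentences", so a ranking step could plausibly use the average as one input for preferring prose.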
Viktor Lofgren
32b9c2e671
Fix SentenceExtractor jank
2023-03-30 15:45:04 +02:00
Viktor Lofgren
4d05be4095
Refactor InternalLinkGraph
2023-03-30 15:44:23 +02:00
Viktor Lofgren
137adb9c3c
Bitmask calculation improvement. Take sentence length into consideration, not all lines are equal.
2023-03-30 15:42:06 +02:00
Viktor Lofgren
16e37672fc
Bugfix in crawl plan, which didn't use rewrite() everywhere
2023-03-30 15:41:07 +02:00
Viktor Lofgren
d0c72ceb7e
Improve experiment runner, convenient start script.
2023-03-30 15:40:31 +02:00
Viktor Lofgren
0fcb2b534c
Polish Names
2023-03-29 16:51:47 +02:00
Viktor Lofgren
dcf6218cdb
Fix bugs related to search result selection in the case with multiple search terms.
* A deduplication filter step ran too early, and removed many good results on the basis that they partially, but did not fully fit another set of search terms.
* Altered the query creation process to prefer documents where multiple terms appear in the priority index.
2023-03-29 15:18:52 +02:00
Viktor Lofgren
8f51345a1d
Add experiment runner tool and get rid of experiments module in processes.
2023-03-28 16:58:46 +02:00
Viktor Lofgren
03bd892b95
Improve document processing in conversion.
* Add flags for long and short documents.
* Break out common length logic from plugins.
* Clean up related code.
2023-03-28 16:38:00 +02:00
Viktor Lofgren
30584887f9
DictionaryMap changes.
Add new flag to change the default size to make prod index boot faster. Remove option to select OffHeapDictionaryHashMap.
2023-03-27 17:28:39 +02:00
Viktor Lofgren
17ca4f9eea
Permit search results that are all synthetic to pass relevancy check.
2023-03-27 17:27:35 +02:00
Viktor Lofgren
7fb3db3249
Fix bug where link on front page news listing wouldn't work.
... also changed order of date and source to make the UI more consistent.
2023-03-27 17:26:46 +02:00
Viktor Lofgren
862e925d7c
"-Dsmall-ram=TRUE" no longer does anything. Remove references to the flag, which previously reduced the memory footprint of the loader and index service.
2023-03-26 21:37:11 +02:00
Viktor Lofgren
a0027ad32b
Fix broken diagram links after doc/ restructuring.
2023-03-25 16:32:10 +01:00
Viktor Lofgren
c5f4cb34bf
Documentation for DB
2023-03-25 16:14:16 +01:00
Viktor
be3ba3ef37
Update readme.md
2023-03-25 15:27:11 +01:00
Viktor
ac1ac3ea57
Move database to a separate module
* Move database to a separate project, break apart sql file into separate entities.
* Fix front page news listing.
2023-03-25 15:26:17 +01:00
Viktor
0b505939ed
Update features-convert/readme.md
2023-03-25 12:43:58 +01:00
Viktor
d2a9e1b644
Add processes link to readme.md for code/common
2023-03-25 12:42:44 +01:00
Viktor Lofgren
3464ca514b
Fix typeahead suggestions
2023-03-25 10:20:52 +01:00
Viktor Lofgren
2f2c86a9f5
Fix bug where WmsaHome wouldn't look in /var/lib/wmsa as a fallback
2023-03-25 10:20:52 +01:00
Viktor
45dd9fea25
Update readme.md
2023-03-22 17:15:36 +01:00
Viktor
c974d72e7e
Update readme.md
2023-03-22 17:09:48 +01:00
Viktor
e3675d2fa9
Update readme.md
2023-03-22 17:02:03 +01:00
Viktor
c4a6bf7672
Update readme.md
2023-03-22 17:01:34 +01:00
Viktor
cb6865924e
Update readme.md
2023-03-22 16:59:38 +01:00
Viktor Lofgren
964014860a
Get suggestions working again
2023-03-22 15:11:22 +01:00
Viktor Lofgren
7c58ddce81
readme.md
2023-03-22 15:10:30 +01:00
Viktor Lofgren
611ba2d35a
Break apart WordPatterns class
2023-03-22 15:10:17 +01:00
Viktor
ecd6ed186f
Update readme.md
2023-03-21 17:33:02 +01:00
Viktor
b07f84bc01
Update readme.md
2023-03-21 17:32:09 +01:00
Viktor
ad2e939018
Update readme.md
2023-03-21 17:30:44 +01:00
Viktor
2a90ade80f
Update readme.md
2023-03-21 17:26:59 +01:00
Viktor
38fd49b271
Update readme.md
2023-03-21 17:11:28 +01:00
Viktor
1b9ae7b42d
Update readme.md
2023-03-21 16:38:39 +01:00
Viktor Lofgren
46f81aca2f
Break apart reverse index into a separate full index and priority index. Previously both were built using the same code. This will make the priority index about half as big since it no longer needs to keep metadata.
2023-03-21 16:12:31 +01:00
Viktor Lofgren
ca22c287a5
Make use of DocumentFlags' flags
2023-03-21 16:03:15 +01:00
Viktor Lofgren
1bb1248ab0
Optimize array library, jmh benchmarks.
2023-03-21 16:02:31 +01:00
Viktor Lofgren
624e8acd41
Remove copy-pasted application plugin from subprojects that define features.
2023-03-20 17:25:58 +01:00
vlofgren
29c76fcdce
Add page&brin to domain-ranking readme.md
2023-03-20 16:41:34 +01:00
vlofgren
55d0fa61d7
Update readme.md
2023-03-20 16:39:15 +01:00
vlofgren
554a7fde80
Update readme.md
2023-03-20 16:27:37 +01:00
Viktor Lofgren
72115e490f
Put news into a database table instead of keeping them hardcoded, request counter on front page.
2023-03-19 12:54:58 +01:00
Viktor Lofgren
bdd2b4a43e
Put news into a database table instead of keeping them hardcoded.
2023-03-19 11:46:13 +01:00
Viktor Lofgren
0682550bd2
Clean up summary extractor module.
2023-03-18 10:33:58 +01:00
Viktor Lofgren
6e89377dea
Clean up summary extractor module.
2023-03-18 10:29:25 +01:00
Viktor Lofgren
950c49d80f
Clean up summary extractor module.
2023-03-18 10:28:48 +01:00
Viktor Lofgren
8def95e849
Clean up summary extractor module.
2023-03-18 10:24:12 +01:00
Viktor Lofgren
43430728aa
Clean up summary extractor module.
2023-03-18 10:21:41 +01:00
Viktor Lofgren
6a20b2b678
Trivial reformatting of code.
2023-03-17 22:11:14 +01:00
Viktor Lofgren
3675c7a090
The search-service doesn't speak REST.
2023-03-17 16:21:52 +01:00
Viktor Lofgren
2eb972dea1
Remove unrelated code, break tools into their own directory.
2023-03-17 16:03:11 +01:00
Viktor Lofgren
449471a076
Yet more restructuring. Improved search result ranking.
2023-03-16 21:35:54 +01:00
Viktor Lofgren
5ef17a2a20
Yet more restructuring.
2023-03-13 23:43:09 +01:00
Viktor Lofgren
0ecab53635
Yet more restructuring.
2023-03-13 23:40:26 +01:00
Viktor Lofgren
d82532b7f1
More restructuring, big bug fixes in keyword extraction.
2023-03-13 17:39:53 +01:00
Viktor Lofgren
281f1322a9
Clean up BTreeWriter
2023-03-12 12:49:49 +01:00
Viktor Lofgren
8b8fc49901
The refactoring will continue until morale improves.
2023-03-12 11:42:07 +01:00
Viktor Lofgren
73eaa0865d
The refactoring will continue until morale improves.
2023-03-12 10:50:31 +01:00
Viktor Lofgren
616effdb3c
The refactoring will continue until morale improves.
2023-03-12 10:04:48 +01:00
Viktor Lofgren
4cec89da91
Fix bug where results would sometimes be presented solely based on the fact that the document is important on the site in general, regardless of whether it's important to the document.
2023-03-11 14:20:32 +01:00
Viktor Lofgren
2e2916cebe
Additional code restructuring to get rid of util and misc-style packages.
2023-03-11 13:53:36 +01:00
Viktor Lofgren
6d939175b1
Additional code restructuring to get rid of util and misc-style packages.
2023-03-11 13:48:40 +01:00
Viktor Lofgren
73e412ea5b
Clean up search-service and index-api
2023-03-11 12:26:12 +01:00
Viktor Lofgren
0532e8c40e
Tidy up.
2023-03-11 11:35:08 +01:00
Viktor Lofgren
919b80b9ab
Gradle shouldn't generate dist zips; zipping jar files is slow and also just ridiculous when you realize jar files are zip files and you can't compress a file twice using the same algo.
2023-03-11 11:34:51 +01:00
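The point in the entry above, that a second pass of the same compression gains little, can be checked with a small experiment. This is a sketch using the JDK's `java.util.zip.Deflater` on made-up sample text; the class name, word list, and seed are hypothetical, and the exact byte counts depend on the JDK's zlib, so none are quoted here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical experiment: deflate some text, then deflate the result again.
final class DoubleCompressionSketch {
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end(); // release native zlib resources
        return out.toByteArray();
    }

    // Deterministic pseudo-text: words drawn from a small, made-up vocabulary.
    static byte[] sampleText() {
        String[] words = {"index", "query", "ranking", "domain",
                          "crawler", "keyword", "result", "service"};
        Random random = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            sb.append(words[random.nextInt(words.length)]).append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleText();
        byte[] once = deflate(raw);
        byte[] twice = deflate(once);
        System.out.println("raw=" + raw.length + " once=" + once.length + " twice=" + twice.length);
    }
}
```

The first pass removes most of the redundancy, so the second pass has almost nothing left to exploit while still costing CPU time. Jar files are zip archives whose entries are typically already deflated, which is why zipping them again buys so little.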
Viktor Lofgren
a62015d5f3
Fix broken test, compiler warning.
2023-03-10 17:12:12 +01:00
Viktor Lofgren
722ff3bffb
Word feature bit for words that appear in the URL, new search profile for plain text files, better plain text titles.
2023-03-10 16:46:56 +01:00
Viktor Lofgren
2bc212d65c
Refactor DocumentKeyword-related classes
2023-03-09 20:41:38 +01:00
Viktor Lofgren
efb46cc703
Remove count from WordMetadata entirely.
2023-03-09 18:14:14 +01:00
Viktor Lofgren
8fb531c614
Word Metadata's count is hella broken, stopgap fix by bitCounting positions instead as this is messing with the search result ordering very badly.
2023-03-09 17:58:56 +01:00
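The stopgap described in the entry above, deriving a count from the positions instead of a stored counter, can be sketched with `Long.bitCount`. The class and method names are hypothetical, and the sketch assumes the positions are held as a bitmask in a `long` where each set bit marks a region of the document containing the word; the log does not show the real WordMetadata layout.

```java
// Hypothetical sketch: approximate a word's occurrence count from a position bitmask.
final class PositionCountSketch {
    // Counts the set bits, i.e. the number of marked position buckets.
    static int approximateCount(long positions) {
        return Long.bitCount(positions);
    }

    public static void main(String[] args) {
        long positions = 0b1010_0001L; // word seen in three (made-up) position buckets
        System.out.println(approximateCount(positions)); // prints 3
    }
}
```

This can only undercount, since several occurrences within one bucket collapse into a single bit, but it cannot disagree with the positions the way a separately maintained counter can.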
Viktor Lofgren
9ece07d559
Chasing a result ranking bug
2023-03-09 17:52:35 +01:00
Viktor Lofgren
0ae4731cf1
Add invariant to WordMetadata
2023-03-09 17:27:07 +01:00
Viktor Lofgren
2a25b5e8a9
Placeholder screenshots when the domain is missing from the database entirely.
2023-03-08 18:36:41 +01:00
Viktor Lofgren
d4010c76cf
Better title extraction for plain text plugin.
2023-03-07 21:53:44 +01:00
Viktor Lofgren
6fb0f77eea
Improving search result scoring in index.
2023-03-07 21:53:30 +01:00
Viktor Lofgren
1252f95da5
Fix for valuation bug in index code that wouldn't sort bad-ish items properly.
2023-03-07 21:26:04 +01:00
Viktor Lofgren
f3babde415
Readme for code/
2023-03-07 17:32:16 +01:00
Viktor Lofgren
ad1be7c835
Move all code to a code directory.
2023-03-07 17:14:32 +01:00