Viktor Lofgren
1e6800565a
(system) Remove EdgeId<T> and similar objects
...
They seemed like a good idea at the time, but in practice they're wasting resources and not really providing the clarity I had hoped.
2023-08-24 17:46:02 +02:00
Viktor Lofgren
9894f37412
(index) Implement new URL ID coding scheme.
...
Also refactor along the way. Really needs an additional pass; these tests are very hairy.
2023-08-24 16:44:27 +02:00
Viktor Lofgren
6a04cdfddf
(loader) Implement new linkdb in loader
...
Deprecate the LoadUrl instruction entirely. We no longer need to be told upfront about which URLs to expect, as IDs are generated from the domain id and document ordinal.
For now, we no longer store new URLs in different domains. We need to re-implement this somehow, probably in a different job or as a different output.
2023-08-24 13:07:54 +02:00
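The ID scheme described in the commit above, where a URL ID is derived from the domain id and the document's ordinal within that domain, can be sketched roughly as follows. The class name, the method names and the 26-bit ordinal width are assumptions made for illustration; the commit does not state the actual bit layout.

```java
// Hypothetical sketch: pack a domain id and a per-domain document ordinal
// into a single long. The 26-bit ordinal width is an assumption, not the
// project's actual layout.
final class UrlIdSketch {
    private static final int ORDINAL_BITS = 26;
    private static final long ORDINAL_MASK = (1L << ORDINAL_BITS) - 1;

    /** Combine the domain id (high bits) with the document ordinal (low bits). */
    public static long encode(int domainId, int ordinal) {
        if (domainId < 0) {
            throw new IllegalArgumentException("negative domain id");
        }
        if (ordinal < 0 || ordinal > ORDINAL_MASK) {
            throw new IllegalArgumentException("ordinal out of range");
        }
        return ((long) domainId << ORDINAL_BITS) | ordinal;
    }

    /** Recover the domain id from a combined ID. */
    public static int domainId(long urlId) {
        return (int) (urlId >>> ORDINAL_BITS);
    }

    /** Recover the document ordinal from a combined ID. */
    public static int ordinal(long urlId) {
        return (int) (urlId & ORDINAL_MASK);
    }
}
```

Because the ID is a pure function of the two inputs, a loader using a scheme like this would not need to be told in advance which URLs to expect.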
Viktor Lofgren
ebc84c22fb
Upgrade antique lombok plugin
...
This permits tests to run on JDK20 environments.
2023-08-23 14:34:32 +00:00
Viktor Lofgren
aa0d256d6a
Upgrade code to Java 20.
...
* Change language version
* Upgrade Lombok to a JDK20 compatible version
2023-08-23 13:37:49 +00:00
Viktor Lofgren
1a05cba60a
(keyword lexicon) Use three hash tables to increase the possible number of keywords to 2^31 from 0.75 x 2^30.
2023-08-23 11:25:20 +02:00
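One plausible reading of the commit above: a power-of-two sized table backed by a single Java array tops out at 2^30 slots, and a 0.75 load factor leaves 0.75 x 2^30 usable entries. Spreading keys over three such tables gives 3 x 0.75 x 2^30 = 2.25 x 2^30, which exceeds 2^31, so the int ID range becomes the limit instead. The sketch below only illustrates the sharding idea, with java.util.HashMap standing in for the real tables; ShardedLexiconSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: route each key hash to one of three tables and
// hand out sequential int IDs. HashMap is a stand-in, not the project's table.
final class ShardedLexiconSketch {
    private static final int SHARDS = 3;
    private final List<Map<Long, Integer>> shards = new ArrayList<>();
    private int nextId = 0;

    public ShardedLexiconSketch() {
        for (int i = 0; i < SHARDS; i++) {
            shards.add(new HashMap<>());
        }
    }

    /** Return the existing ID for this key hash, or assign the next free one. */
    public int getOrInsert(long keyHash) {
        // floorMod keeps the shard index non-negative for negative hashes
        int shardIndex = (int) Math.floorMod(keyHash, (long) SHARDS);
        Map<Long, Integer> shard = shards.get(shardIndex);
        Integer existing = shard.get(keyHash);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        shard.put(keyHash, id);
        return id;
    }

    /** Number of distinct keys inserted so far. */
    public int size() {
        return nextId;
    }
}
```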
Viktor Lofgren
704de50a9b
(forward-index, valuator) HTML features in valuator
...
Put it in the forward index for easy access during index-side valuation.
2023-08-18 11:54:56 +02:00
Viktor Lofgren
251fc63b42
(*) Fix merge gore
2023-08-09 13:33:28 +02:00
Viktor Lofgren
624b78ec3a
(heartbeat) Task heartbeats
2023-08-04 14:40:06 +02:00
Viktor Lofgren
b08e302dd5
(lexicon) Optimize lexicon by using Murmur3_128's hash function
2023-08-01 15:02:13 +02:00
Viktor Lofgren
ea66195b97
(loader) Optimize loader by using zstd's direct streaming writer and the Murmur3_128 string hash
2023-08-01 15:02:13 +02:00
Viktor Lofgren
9288d311d4
Add buffering to index journal writer
2023-07-28 18:11:19 +02:00
Viktor Lofgren
d7ab21fe34
(*) Refactor Control Service and processes
2023-07-17 21:20:31 +02:00
Viktor Lofgren
88b9ec70c6
(control, WIP) Run reconvert-load from converter :D
2023-07-11 18:05:37 +02:00
Viktor Lofgren
55c65f0935
Use document generator to complement the document selection.
...
Will let through e.g. a modern SSG in the small web filter.
2023-06-22 17:21:33 +02:00
Viktor Lofgren
ccc41d1717
Clean up the index query handling code.
2023-04-10 14:50:57 +02:00
Viktor Lofgren
e49b1dd155
Better handling of quote terms, fix bug in handling of longer queries.
...
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:20:40 +02:00
Viktor
a278fc6296
Increase search result relevance (#8)
...
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
2023-04-07 20:18:08 +02:00
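For context on the BM25 item in the commit above, here is a minimal sketch of the textbook per-term BM25 score. It uses the commonly cited defaults k1 = 1.2 and b = 0.75 and a smoothed IDF; these values, the class name and the method signatures are assumptions, and this is not the index service's actual ranking code.

```java
// Hypothetical sketch of the standard BM25 per-term score.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term frequency saturation (assumed default)
    static final double B = 0.75;  // document length normalization (assumed default)

    /** Smoothed inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    public static double idf(long docCount, long docsWithTerm) {
        return Math.log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    /** Score contribution of one query term for one document. */
    public static double termScore(int termFreq, int docLength, double avgDocLength,
                                   long docCount, long docsWithTerm) {
        // Longer-than-average documents get a larger normalization term
        double norm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf(docCount, docsWithTerm) * termFreq * (K1 + 1.0) / (termFreq + norm);
    }
}
```

A document's score for a multi-term query is the sum of termScore over the query terms. Rare terms and repeated occurrences raise the score, while long documents are penalized relative to the average length.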
Viktor Lofgren
105d93cd85
Index query builder automatically ignores redundant predicates.
2023-04-02 12:04:26 +02:00
Viktor Lofgren
1e4157017d
More helpful descriptions of index queries.
2023-04-02 12:03:58 +02:00
Viktor Lofgren
dcf6218cdb
Fix bugs related to search result selection in the case with multiple search terms.
...
* A deduplication filter step ran too early and removed many good results on the basis that they partially, but not fully, matched another set of search terms.
* Altered the query creation process to prefer documents where multiple terms appear in the priority index.
2023-03-29 15:18:52 +02:00
Viktor Lofgren
30584887f9
DictionaryMap changes.
...
Add new flag to change the default size to make prod index boot faster. Remove option to select OffHeapDictionaryHashMap.
2023-03-27 17:28:39 +02:00
Viktor
ac1ac3ea57
Move database to a separate module
...
* Move database to a separate project, break apart sql file into separate entities.
* Fix front page news listing.
2023-03-25 15:26:17 +01:00
Viktor
45dd9fea25
Update readme.md
2023-03-22 17:15:36 +01:00
Viktor
c974d72e7e
Update readme.md
2023-03-22 17:09:48 +01:00
Viktor
ecd6ed186f
Update readme.md
2023-03-21 17:33:02 +01:00
Viktor
b07f84bc01
Update readme.md
2023-03-21 17:32:09 +01:00
Viktor Lofgren
46f81aca2f
Break apart the reverse index into a separate full index and priority index. Previously both were built using the same code. This will make the priority index about half as big since it no longer needs to keep metadata.
2023-03-21 16:12:31 +01:00
vlofgren
29c76fcdce
Add Page & Brin to domain-ranking readme.md
2023-03-20 16:41:34 +01:00
vlofgren
554a7fde80
Update readme.md
2023-03-20 16:27:37 +01:00
Viktor Lofgren
2eb972dea1
Remove unrelated code, break tools into their own directory.
2023-03-17 16:03:11 +01:00
Viktor Lofgren
449471a076
Yet more restructuring. Improved search result ranking.
2023-03-16 21:35:54 +01:00
Viktor Lofgren
0ecab53635
Yet more restructuring.
2023-03-13 23:40:26 +01:00
Viktor Lofgren
d82532b7f1
More restructuring, big bug fixes in keyword extraction.
2023-03-13 17:39:53 +01:00
Viktor Lofgren
8b8fc49901
The refactoring will continue until morale improves.
2023-03-12 11:42:07 +01:00
Viktor Lofgren
73eaa0865d
The refactoring will continue until morale improves.
2023-03-12 10:50:31 +01:00