Viktor
cbbf60a599
Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
2023-07-10 18:58:43 +02:00
Viktor Lofgren
96eecc6ea5
Minor: Readability.
2023-07-10 18:58:43 +02:00
Viktor Lofgren
d9e6c4f266
Trial integration of MQ-FSM into index service.
2023-07-06 18:04:16 +02:00
Viktor Lofgren
62cc9df206
Embryo of new control process
* New events and heartbeat tables in mariadb
* Refactored to a cleaner Service interface
2023-07-03 10:40:32 +02:00
Viktor Lofgren
0f34beb1aa
Update search front page
2023-06-29 17:14:27 +02:00
Viktor Lofgren
a6a66c6d8a
Improve site info for unknown domains:
* Placeholder screenshot should work
* Add a link to git-repo for submitting the site for crawling
2023-06-27 15:32:11 +02:00
Viktor Lofgren
d86e8522e2
Add search profiles for wiki, forum and docs.
2023-06-24 12:17:35 +02:00
Viktor Lofgren
bd2c3855ed
Add bits and keywords for generator classes (docs, forum, wiki).
2023-06-23 21:35:28 +02:00
Viktor Lofgren
55c65f0935
Use document generator to complement the document selection.
This lets e.g. a modern static site generator (SSG) through the small web filter.
2023-06-22 17:21:33 +02:00
Viktor Lofgren
fd192d2791
Fix putative overflow error with a large dictionary
2023-05-28 11:57:06 +02:00
Viktor Lofgren
1e184a8372
(search) Make exploration mode more random
2023-05-25 17:40:28 +02:00
Viktor Lofgren
6fae51a8ef
Stopgap fix for a bug in dealing with quote terms containing stop words.
2023-05-02 19:38:59 +02:00
Viktor Lofgren
bb587ca47f
Reformulate search-header.hdb: s/Support/Donate/. The old wording apparently confused some people into thinking they could get support on this page.
2023-04-18 17:04:24 +02:00
Viktor Lofgren
df1850bd45
Fix bug in index service where tld: and links:-queries wouldn't work.
2023-04-15 18:39:16 +02:00
Viktor Lofgren
502713f7a8
Reduce memory churn
2023-04-10 16:51:17 +02:00
Viktor Lofgren
e19256a6b6
Tune settings to retrieve more results.
2023-04-10 15:39:20 +02:00
Viktor Lofgren
ccc41d1717
Clean up of the index query handling related code.
2023-04-10 14:50:57 +02:00
Viktor Lofgren
e49b1dd155
Better handling of quote terms, fix bug in handling of longer queries.
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:20:40 +02:00
Viktor Lofgren
fe419b12b4
Better handling of quote terms, fix bug in handling of longer queries.
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
2023-04-10 13:11:40 +02:00
Viktor Lofgren
535a51a621
Repair broken year query test.
2023-04-08 12:04:09 +02:00
Viktor
a278fc6296
Increase search result relevance (#8)
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
2023-04-07 20:18:08 +02:00
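The commit above adds BM25 to the index service's ranking but does not show the formula. The following is a minimal sketch of standard Okapi BM25 and is not the project's implementation. The class name `Bm25Sketch` is hypothetical. The values K1 = 1.2 and B = 0.75 are common textbook defaults, assumed here. The IDF uses the log(1 + ...) form so that it stays non-negative.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Okapi BM25; not the index service's actual code.
final class Bm25Sketch {
    // Assumed defaults: K1 controls term-frequency saturation,
    // B controls how strongly document length is normalized.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Inverse document frequency, log(1 + ...) variant (never negative).
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Contribution of a single query term to a document's score.
    static double termScore(int termFreq, int docLength, double avgDocLength,
                            long docCount, long docFreq) {
        double lengthNorm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf(docCount, docFreq) * (termFreq * (K1 + 1.0)) / (termFreq + lengthNorm);
    }

    // Sum of term contributions; query terms absent from the document add nothing.
    static double score(Map<String, Integer> termFreqs, List<String> queryTerms,
                        int docLength, double avgDocLength,
                        long docCount, Map<String, Long> docFreqs) {
        double sum = 0.0;
        for (String term : queryTerms) {
            int tf = termFreqs.getOrDefault(term, 0);
            if (tf == 0) continue;
            sum += termScore(tf, docLength, avgDocLength, docCount,
                             docFreqs.getOrDefault(term, 0L));
        }
        return sum;
    }
}
```

Repeated occurrences of a term raise the score with diminishing returns. Rarer terms weigh more. Longer-than-average documents are penalized in proportion to B.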
Viktor Lofgren
716ab35b4e
Search ranking debuggability improvements.
2023-04-02 13:43:24 +02:00
Viktor Lofgren
3fb249758e
Adjust result ordering.
2023-04-02 12:05:22 +02:00
Viktor Lofgren
f7a6ef2179
Smarter queries, better logging.
2023-04-02 12:05:09 +02:00
Viktor Lofgren
105d93cd85
Index query builder automatically ignores redundant predicates.
2023-04-02 12:04:26 +02:00
Viktor Lofgren
1e4157017d
More helpful descriptions of index queries.
2023-04-02 12:03:58 +02:00
Viktor Lofgren
5fb75adaae
Remove antique result scoring adjustment that makes no sense anymore.
2023-04-02 11:58:04 +02:00
Viktor Lofgren
cc4e089a5d
Consider average sentence length when selecting search results. This promotes prose over code listings, tabular data, etc.
2023-03-30 15:46:15 +02:00
Viktor Lofgren
dcf6218cdb
Fix bugs related to search result selection in the case with multiple search terms.
* A deduplication filter step ran too early, and removed many good results on the grounds that they only partially, not fully, matched another set of search terms.
* Altered the query creation process to prefer documents where multiple terms appear in the priority index.
2023-03-29 15:18:52 +02:00
Viktor Lofgren
17ca4f9eea
Permit search results that are all synthetic to pass relevancy check.
2023-03-27 17:27:35 +02:00
Viktor Lofgren
7fb3db3249
Fix bug where link on front page news listing wouldn't work.
... also changed order of date and source to make the UI more consistent.
2023-03-27 17:26:46 +02:00
Viktor Lofgren
862e925d7c
"-Dsmall-ram=TRUE" no longer does anything. Remove references to the flag, which previously reduced the memory footprint of the loader and index service.
2023-03-26 21:37:11 +02:00
Viktor Lofgren
a0027ad32b
Fix broken diagram links after doc/ restructuring.
2023-03-25 16:32:10 +01:00
Viktor
ac1ac3ea57
Move database to a separate module
* Move database to a separate project, break apart sql file into separate entities.
* Fix front page news listing.
2023-03-25 15:26:17 +01:00
Viktor Lofgren
3464ca514b
Fix typeahead suggestions
2023-03-25 10:20:52 +01:00
Viktor
e3675d2fa9
Update readme.md
2023-03-22 17:02:03 +01:00
Viktor
c4a6bf7672
Update readme.md
2023-03-22 17:01:34 +01:00
Viktor
cb6865924e
Update readme.md
2023-03-22 16:59:38 +01:00
Viktor Lofgren
964014860a
Get suggestions working again
2023-03-22 15:11:22 +01:00
Viktor Lofgren
46f81aca2f
Break apart the reverse index into a separate full index and priority index. Previously both were built by the same code. This will make the priority index about half as big, since it no longer needs to keep metadata.
2023-03-21 16:12:31 +01:00
Viktor Lofgren
72115e490f
Put news into a database table instead of keeping them hardcoded; add a request counter to the front page.
2023-03-19 12:54:58 +01:00
Viktor Lofgren
bdd2b4a43e
Put news into a database table instead of keeping them hardcoded.
2023-03-19 11:46:13 +01:00
Viktor Lofgren
6a20b2b678
Trivial reformatting of code.
2023-03-17 22:11:14 +01:00
Viktor Lofgren
3675c7a090
The search-service doesn't speak REST.
2023-03-17 16:21:52 +01:00
Viktor Lofgren
2eb972dea1
Remove unrelated code, break tools into their own directory.
2023-03-17 16:03:11 +01:00
Viktor Lofgren
449471a076
Yet more restructuring. Improved search result ranking.
2023-03-16 21:35:54 +01:00
Viktor Lofgren
0ecab53635
Yet more restructuring.
2023-03-13 23:40:26 +01:00
Viktor Lofgren
d82532b7f1
More restructuring, big bug fixes in keyword extraction.
2023-03-13 17:39:53 +01:00
Viktor Lofgren
73eaa0865d
The refactoring will continue until morale improves.
2023-03-12 10:50:31 +01:00
Viktor Lofgren
616effdb3c
The refactoring will continue until morale improves.
2023-03-12 10:04:48 +01:00