Commit graph

270 commits

Author SHA1 Message Date
Viktor Lofgren
92ed513e4f Less spammy default log settings 2023-07-17 21:41:56 +02:00
Viktor Lofgren
d7ab21fe34 (*) Refactor Control Service and processes 2023-07-17 21:20:31 +02:00
Viktor Lofgren
bca4bbb6c8 (*) Refactor MQ and MQSM 2023-07-17 13:57:32 +02:00
Viktor Lofgren
e618aa34e9 (control) Name change process->fsm, new fsm:s
* FSM for spawning processes when messages appear for them
* FSM for removing data flagged for purging
2023-07-17 12:27:27 +02:00
Viktor Lofgren
c4dd9a0547 (control) Use MQFSMs to monitor and spawn processes when messages are sent to them 2023-07-16 11:58:47 +02:00
Viktor Lofgren
5ec10634d8 (mqfsm) Abortable state machine 2023-07-15 14:12:16 +02:00
Viktor Lofgren
8b74e3aa0d (*) File Storage WIP 2023-07-14 17:08:10 +02:00
Viktor Lofgren
23169ad818 (db) Model for file storage areas 2023-07-14 11:40:05 +02:00
Viktor Lofgren
d36e36c8fd (mq) Bugfix lastNMessages; use Lists.reverse properly 2023-07-14 11:39:15 +02:00
Viktor Lofgren
948d4d5f08 (control) Clean up the number of GUI views, abortable FSM tasks 2023-07-13 17:24:21 +02:00
Viktor Lofgren
1ec6f9cde2 (mq) More robust resume and recovery logic, protection against spurious state changes, minor bugfixes 2023-07-13 14:55:45 +02:00
Viktor Lofgren
a5118fe8f1 (minor) clean-up 2023-07-12 22:46:14 +02:00
Viktor Lofgren
6c88f00a9d (mqsm) guard against spurious transitions from unexpected messages 2023-07-12 22:44:05 +02:00
Viktor Lofgren
8a53e107fa (mq) Synchronous and Asynchronous inboxes. 2023-07-12 20:12:52 +02:00
Viktor Lofgren
0ed938545b (mq) Add single-shot inbox 2023-07-12 18:41:27 +02:00
Viktor Lofgren
480abfe966 (minor) Add limit to pol count in MqPersistence, fix test 2023-07-12 18:16:23 +02:00
Viktor Lofgren
89e4343fdb (minor) Fix test 2023-07-12 18:15:50 +02:00
Viktor Lofgren
8c16a2aede (work-log, minor) Clean up code 2023-07-12 18:10:05 +02:00
Viktor Lofgren
5deec63667 (work-log) Better tests 2023-07-12 18:04:06 +02:00
Viktor Lofgren
74caf9e38a (processes) Remove forEach-constructs in favor of iterators. 2023-07-12 17:47:36 +02:00
Viktor Lofgren
77261a38cd (control, WIP) MQFSM and ProcessService are sitting in a tree
We're spawning processes from the MSFSM in control service now!
2023-07-11 17:08:43 +02:00
Viktor Lofgren
4c016b0318 Process monitoring
* Also refactored the SQL tables a bit
2023-07-11 14:46:21 +02:00
Viktor Lofgren
ec7826659a (minor) Javadoc comments for MqPersistance and MqMessageState 2023-07-10 21:52:25 +02:00
Viktor Lofgren
98b5f22104 (control) WIP control service
* Set messages to OK when received so they're cleaned up properly.
2023-07-10 21:33:57 +02:00
Viktor
cbbf60a599 Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
2023-07-10 18:58:43 +02:00
Viktor
0f9b90eb1c
Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
2023-07-10 17:36:12 +02:00
Viktor Lofgren
d9e6c4f266 Trial integration of MQ-FSM into index service. 2023-07-06 18:04:16 +02:00
Viktor Lofgren
f0a8ca440f MQFSM Usability WIP 2023-07-06 13:33:11 +02:00
Viktor Lofgren
d89db10645 MQFSM Usability WIP 2023-07-06 13:02:16 +02:00
Viktor Lofgren
2ae0b8c159 Message queue based state machine 2023-07-04 17:42:06 +02:00
Viktor Lofgren
31ae71c7d6 Message queue WIP 2023-07-04 14:28:14 +02:00
Viktor Lofgren
62cc9df206 Embryo of new control process
* New events and heartbeat tables in mariadb
* Refactored to a cleaner Service interface
2023-07-03 10:40:32 +02:00
Viktor Lofgren
8274e8a953 JVM flags for disabling black and block-lists. 2023-06-30 17:07:47 +02:00
Viktor Lofgren
a6a66c6d8a Improve site info for unknown domains:
* Placeholder screenshot should work
* Add a link to git-repo for submitting the site for crawling
2023-06-27 15:32:11 +02:00
Viktor Lofgren
f92d8a0975 EdgeUrl conversion to/from java.net.URL 2023-06-27 10:57:54 +02:00
Viktor Lofgren
5abaf13192 Fix serialization bug with CompressedBigString 2023-06-27 10:57:54 +02:00
Viktor Lofgren
bd2c3855ed Add bits and keywords for generator classes (docs, forum, wiki). 2023-06-23 21:35:28 +02:00
Viktor Lofgren
54c2be893b TRIVIAL: Remove unused import. 2023-06-22 17:21:47 +02:00
Viktor Lofgren
b5ef67ed28 Categorize generators by type
This is a great quality signal!
Add the type as document bitflags by category.
2023-06-22 16:04:37 +02:00
Viktor Lofgren
9455100907 Throw a custom exception when WMSA_HOME isn't found 2023-06-20 11:37:52 +02:00
Viktor Lofgren
d1a004bea6 (minor) Clean up StringPool 2023-06-19 17:58:19 +02:00
Viktor Lofgren
2cda57355a More word metadata tests 2023-05-28 11:57:06 +02:00
Viktor Lofgren
d42ab19166 Issue 5: Fix bug where some IPv6 addresses blew up domain loading. 2023-04-15 14:11:08 +02:00
Viktor Lofgren
2ab26f37b8 Bug fix for document metadata encoding that breaks year based queries. 2023-04-14 16:56:49 +02:00
Viktor
a278fc6296
Increase search result relevance (#8)
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
2023-04-07 20:18:08 +02:00
Viktor Lofgren
716ab35b4e Search ranking debuggability improvements. 2023-04-02 13:43:24 +02:00
Viktor Lofgren
cc4e089a5d Consider average sentence length when selecting search results. This promotes proses over code listings, tabular data, etc. 2023-03-30 15:46:15 +02:00
Viktor Lofgren
03bd892b95 Improve document processing in conversion.
* Add flags for long and short documents.
* Break out common length logic from plugins.
* Cleaning up of related code.
2023-03-28 16:38:00 +02:00
Viktor Lofgren
c5f4cb34bf Documentation for DB 2023-03-25 16:14:16 +01:00
Viktor
be3ba3ef37
Update readme.md 2023-03-25 15:27:11 +01:00
Viktor
ac1ac3ea57
Move database to a separate module
* Move database to a separate project, break apart sql file into separate entities.
* Fix front page news listing.
2023-03-25 15:26:17 +01:00
Viktor
d2a9e1b644
Add processes link to readme.md for code/common 2023-03-25 12:42:44 +01:00
Viktor Lofgren
2f2c86a9f5 Fix bug where WmsaHome wouldn't look in /var/lib/wmsa as a fallback 2023-03-25 10:20:52 +01:00
Viktor Lofgren
964014860a Get suggestions working again 2023-03-22 15:11:22 +01:00
Viktor Lofgren
46f81aca2f Break apart reverse index into a separate full index and priority index. It did this before using the same code. This will make the priority index about half as big since it no longer needs to keep metadata. 2023-03-21 16:12:31 +01:00
Viktor Lofgren
ca22c287a5 Make use of DocumentFlags' flags 2023-03-21 16:03:15 +01:00
Viktor Lofgren
72115e490f Put news into a database table instead of keeping them hardcoded, request counter on front page. 2023-03-19 12:54:58 +01:00
Viktor Lofgren
bdd2b4a43e Put news into a database table instead of keeping them hardcoded. 2023-03-19 11:46:13 +01:00
Viktor Lofgren
2eb972dea1 Remove unrelated code, break tools into their own directory. 2023-03-17 16:03:11 +01:00
Viktor Lofgren
449471a076 Yet more restructuring. Improved search result ranking. 2023-03-16 21:35:54 +01:00
Viktor Lofgren
616effdb3c The refactoring will continue until morale improves. 2023-03-12 10:04:48 +01:00
Viktor Lofgren
4cec89da91 Fix bug where results would sometimes be presented solely based on the fact that the document is important on the site in general, regardless of whether it's important to the document. 2023-03-11 14:20:32 +01:00
Viktor Lofgren
2e2916cebe Additional code restructuring to get rid of util and misc-style packages. 2023-03-11 13:53:36 +01:00
Viktor Lofgren
6d939175b1 Additional code restructuring to get rid of util and misc-style packages. 2023-03-11 13:48:40 +01:00
Viktor Lofgren
722ff3bffb Word feature bit for words that appear in the URL, new search profile for plain text files, better plain text titles. 2023-03-10 16:46:56 +01:00
Viktor Lofgren
efb46cc703 Remove count from WordMetadata entirely. 2023-03-09 18:14:14 +01:00
Viktor Lofgren
8fb531c614 Word Metadata's count is hella broken, stopgap fix by bitCounting positions instead as this is messing with the search result ordering very badly. 2023-03-09 17:58:56 +01:00
Viktor Lofgren
9ece07d559 Chasing a result ranking bug 2023-03-09 17:52:35 +01:00
Viktor Lofgren
0ae4731cf1 Add invariant to WordMetadata 2023-03-09 17:27:07 +01:00
Viktor Lofgren
ad1be7c835 Move all code to a code directory. 2023-03-07 17:14:32 +01:00