874 Commits

Author SHA1 Message Date
Viktor Lofgren
58f2f86ea8 (crawler) Don't read all the data into RAM when doing a refresh-crawl 2023-07-21 19:47:52 +02:00
Viktor Lofgren
7bc1cff286 (minor) code cleanup 2023-07-21 14:28:37 +02:00
Viktor Lofgren
8f455f3b6d (control) Aborting a process spawner actor cancels the message to the actor. 2023-07-21 14:12:32 +02:00
Viktor Lofgren
f91d92cccb (crawler) WIP 2023-07-20 21:05:16 +02:00
Viktor Lofgren
08ca6399ec (converter) WIP 2023-07-19 17:14:45 +02:00
Viktor Lofgren
c0b5ea0e7d Revert "Less spammy default log settings"
This reverts commit f6e2216b87.
2023-07-18 19:28:42 +02:00
Viktor Lofgren
f21a3983aa Abortable processes 2023-07-18 18:40:12 +02:00
Viktor Lofgren
f6e2216b87 Less spammy default log settings 2023-07-17 21:42:13 +02:00
Viktor Lofgren
92ed513e4f Less spammy default log settings 2023-07-17 21:41:56 +02:00
Viktor Lofgren
d7ab21fe34 (*) Refactor Control Service and processes 2023-07-17 21:20:31 +02:00
Viktor Lofgren
bca4bbb6c8 (*) Refactor MQ and MQSM 2023-07-17 13:57:32 +02:00
Viktor Lofgren
e618aa34e9 (control) Name change process->fsm, new fsm:s
* FSM for spawning processes when messages appear for them
* FSM for removing data flagged for purging
2023-07-17 12:27:27 +02:00
Viktor Lofgren
6e41e78f36 (control) Highlight missing processes 2023-07-16 12:03:32 +02:00
Viktor Lofgren
c4dd9a0547 (control) Use MQFSMs to monitor and spawn processes when messages are sent to them 2023-07-16 11:58:47 +02:00
Viktor Lofgren
5ec10634d8 (mqfsm) Abortable state machine 2023-07-15 14:12:16 +02:00
Viktor Lofgren
cdae74d395 (control) Working redirects 2023-07-15 14:11:59 +02:00
Viktor Lofgren
8b74e3aa0d (*) File Storage WIP 2023-07-14 17:08:10 +02:00
Viktor Lofgren
23169ad818 (db) Model for file storage areas 2023-07-14 11:40:05 +02:00
Viktor Lofgren
d36e36c8fd (mq) Bugfix lastNMessages; use Lists.reverse properly 2023-07-14 11:39:15 +02:00
Viktor Lofgren
948d4d5f08 (control) Clean up the number of GUI views, abortable FSM tasks 2023-07-13 17:24:21 +02:00
Viktor Lofgren
0960e18f8e (control) Auto-refreshing tables 2023-07-13 15:44:36 +02:00
Viktor Lofgren
825fd10efa (control) Clean up the MQ ui a bit 2023-07-13 15:14:04 +02:00
Viktor Lofgren
1ec6f9cde2 (mq) More robust resume and recovery logic, protection against spurious state changes, minor bugfixes 2023-07-13 14:55:45 +02:00
Viktor Lofgren
a5118fe8f1 (minor) clean-up 2023-07-12 22:46:14 +02:00
Viktor Lofgren
6c88f00a9d (mqsm) guard against spurious transitions from unexpected messages 2023-07-12 22:44:05 +02:00
Viktor Lofgren
bf783dad7a (converter) NPE fix 2023-07-12 20:13:01 +02:00
Viktor Lofgren
8a53e107fa (mq) Synchronous and Asynchronous inboxes. 2023-07-12 20:12:52 +02:00
Viktor Lofgren
0ed938545b (mq) Add single-shot inbox 2023-07-12 18:41:27 +02:00
Viktor Lofgren
480abfe966 (minor) Add limit to poll count in MqPersistence, fix test 2023-07-12 18:16:23 +02:00
Viktor Lofgren
89e4343fdb (minor) Fix test 2023-07-12 18:15:50 +02:00
Viktor Lofgren
8c16a2aede (work-log, minor) Clean up code 2023-07-12 18:10:05 +02:00
Viktor Lofgren
5deec63667 (work-log) Better tests 2023-07-12 18:04:06 +02:00
Viktor Lofgren
363368b150 (converter) Remove auto-refresh. 2023-07-12 17:48:37 +02:00
Viktor Lofgren
74caf9e38a (processes) Remove forEach-constructs in favor of iterators. 2023-07-12 17:47:36 +02:00
Viktor Lofgren
7087ab5f07 (run) Reduce nginx access log noise for local setup 2023-07-11 23:11:34 +02:00
Viktor Lofgren
0b0cf48849 (control) Better looking UUIDs 2023-07-11 23:11:02 +02:00
Viktor Lofgren
00d9773b44 (control) Better looking progress bar 2023-07-11 21:37:32 +02:00
Viktor Lofgren
88b9ec70c6 (control, WIP) Run reconvert-load from converter :D 2023-07-11 18:05:37 +02:00
Viktor Lofgren
77261a38cd (control, WIP) MQFSM and ProcessService are sitting in a tree
We're spawning processes from the MQFSM in control service now!
2023-07-11 17:08:43 +02:00
Viktor Lofgren
3c7c77fe21 (minor) Bugfix in Path handling 2023-07-11 17:06:52 +02:00
Viktor Lofgren
4ee3f6ba3f (minor) Refactor ControlService 2023-07-11 14:51:51 +02:00
Viktor Lofgren
4c016b0318 Process monitoring
* Also refactored the SQL tables a bit
2023-07-11 14:46:21 +02:00
Viktor Lofgren
f59cab300e (minor) Javadoc comments for MqPersistence and MqMessageState 2023-07-10 21:59:51 +02:00
Viktor Lofgren
ec7826659a (minor) Javadoc comments for MqPersistence and MqMessageState 2023-07-10 21:52:25 +02:00
Viktor Lofgren
98b5f22104 (control) WIP control service
* Set messages to OK when received so they're cleaned up properly.
2023-07-10 21:33:57 +02:00
Viktor Lofgren
2283ceb77d (control) WIP control service 2023-07-10 18:58:43 +02:00
Viktor Lofgren
fba466d6e2 (crawler) Update URL blocklist
* Don't crawl MDN mirrors
* More mailing list variants
2023-07-10 18:58:43 +02:00
Viktor
cbbf60a599 Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
2023-07-10 18:58:43 +02:00
Viktor Lofgren
c125d8ab48 (search) Fix a bug where space-like characters weren't normalized in query processing. 2023-07-10 18:58:43 +02:00
Viktor Lofgren
f03146de4b (crawler) Fix poor handling of duplicate ids
* Also clean up the code a bit
2023-07-10 18:58:43 +02:00