Viktor Lofgren
f11103d31d
(WIP) Make it possible to sideload encyclopedia data.
...
This is mostly a pilot track for sideloading other large websites.
Also change the converter to produce a more compact output (Java serialization instead of JSON).
2023-07-28 18:14:43 +02:00
Viktor Lofgren
9288d311d4
Add buffering to index journal writer
2023-07-28 18:11:19 +02:00
Viktor Lofgren
77d5e39fe0
Make processed data Serializable
2023-07-28 18:11:19 +02:00
Viktor Lofgren
27e781761d
(mq single shot inbox) Flag messages as OK if there is no recipient
2023-07-28 12:04:23 +02:00
Viktor Lofgren
92cac52813
(mq) Add indexes to MESSAGE_QUEUE
2023-07-28 12:03:51 +02:00
Viktor Lofgren
66bb12e55a
(converter) File listing and download for file storage
2023-07-26 21:59:35 +02:00
Viktor Lofgren
a5d980ee56
(converter) Hook crawl job extractor and adjacencies calculator into control service.
2023-07-26 15:46:22 +02:00
Viktor Lofgren
19c2ceec9b
(converter) Use Marginalia Yellow for control service
2023-07-26 11:50:23 +02:00
Viktor Lofgren
507f26ad47
(converter) Refactor converter to not keep instructions list in RAM.
2023-07-25 22:06:46 +02:00
Viktor Lofgren
fd44e09ebd
(loader) Don't delete the entire link database when the loader runs
2023-07-24 18:37:35 +02:00
Viktor Lofgren
09fd0a1d0e
(converter) Automatically clean stale file storage records if they disappear on disk
2023-07-24 17:04:42 +02:00
Viktor Lofgren
667b0ca0b0
(converter, WIP) Refactor CrawledDomainReader to not return iterators.
...
Instead return a closable class SerializableCrawlDataStream.
2023-07-24 16:28:30 +02:00
Viktor Lofgren
a56953c798
(converter, WIP) Refactor converter to not have to load everything into RAM.
2023-07-24 15:25:09 +02:00
Viktor Lofgren
7470c170b1
(minor) EdgeUrl.parse() should deal with null
2023-07-24 15:06:57 +02:00
Viktor Lofgren
bc330acfc9
(control) Better refresh script that doesn't cause weird artifacts
2023-07-23 19:26:16 +02:00
Viktor Lofgren
789e8eea85
(crawler) Clean up and refactor the code a bit
2023-07-23 19:08:38 +02:00
Viktor Lofgren
35b29e4f9e
(crawler) Clean up and refactor the code a bit
2023-07-23 19:06:37 +02:00
Viktor Lofgren
69f333c0bf
(crawler) Clean up and refactor the code a bit
2023-07-23 18:59:14 +02:00
Viktor Lofgren
c069c8c182
(crawler) Clean up crawl data reference and recrawl logic
2023-07-22 18:42:21 +02:00
Viktor Lofgren
9e4aa7da7c
(crawler) Support for X-Robots-Tag
2023-07-22 18:42:21 +02:00
Viktor Lofgren
e22e65eee4
(index) Fix bug related to debug print statements
2023-07-22 14:33:58 +02:00
Viktor Lofgren
d6b07e4d01
(controller) Improve the storage interface
2023-07-21 19:56:16 +02:00
Viktor Lofgren
995657c6ce
(big-string) Make it possible to disable big-string
2023-07-21 19:50:35 +02:00
Viktor Lofgren
58f2f86ea8
(crawler) Don't read all the data into RAM when doing a refresh-crawl
2023-07-21 19:47:52 +02:00
Viktor Lofgren
7bc1cff286
(minor) code cleanup
2023-07-21 14:28:37 +02:00
Viktor Lofgren
8f455f3b6d
(control) Aborting a process spawner actor cancels the message to the actor.
2023-07-21 14:12:32 +02:00
Viktor Lofgren
f91d92cccb
(crawler) WIP
2023-07-20 21:05:16 +02:00
Viktor Lofgren
08ca6399ec
(converter) WIP
2023-07-19 17:14:45 +02:00
Viktor Lofgren
c0b5ea0e7d
Revert "Less spammy default log settings"
...
This reverts commit f6e2216b87.
2023-07-18 19:28:42 +02:00
Viktor Lofgren
f21a3983aa
Abortable processes
2023-07-18 18:40:12 +02:00
Viktor Lofgren
f6e2216b87
Less spammy default log settings
2023-07-17 21:42:13 +02:00
Viktor Lofgren
92ed513e4f
Less spammy default log settings
2023-07-17 21:41:56 +02:00
Viktor Lofgren
d7ab21fe34
(*) Refactor Control Service and processes
2023-07-17 21:20:31 +02:00
Viktor Lofgren
bca4bbb6c8
(*) Refactor MQ and MQSM
2023-07-17 13:57:32 +02:00
Viktor Lofgren
e618aa34e9
(control) Name change process->fsm, new FSMs
...
* FSM for spawning processes when messages appear for them
* FSM for removing data flagged for purging
2023-07-17 12:27:27 +02:00
Viktor Lofgren
6e41e78f36
(control) Highlight missing processes
2023-07-16 12:03:32 +02:00
Viktor Lofgren
c4dd9a0547
(control) Use MQFSMs to monitor and spawn processes when messages are sent to them
2023-07-16 11:58:47 +02:00
Viktor Lofgren
5ec10634d8
(mqfsm) Abortable state machine
2023-07-15 14:12:16 +02:00
Viktor Lofgren
cdae74d395
(control) Working redirects
2023-07-15 14:11:59 +02:00
Viktor Lofgren
8b74e3aa0d
(*) File Storage WIP
2023-07-14 17:08:10 +02:00
Viktor Lofgren
23169ad818
(db) Model for file storage areas
2023-07-14 11:40:05 +02:00
Viktor Lofgren
d36e36c8fd
(mq) Bugfix lastNMessages; use Lists.reverse properly
2023-07-14 11:39:15 +02:00
Viktor Lofgren
948d4d5f08
(control) Clean up the number of GUI views, abortable FSM tasks
2023-07-13 17:24:21 +02:00
Viktor Lofgren
0960e18f8e
(control) Auto-refreshing tables
2023-07-13 15:44:36 +02:00
Viktor Lofgren
825fd10efa
(control) Clean up the MQ ui a bit
2023-07-13 15:14:04 +02:00
Viktor Lofgren
1ec6f9cde2
(mq) More robust resume and recovery logic, protection against spurious state changes, minor bugfixes
2023-07-13 14:55:45 +02:00
Viktor Lofgren
a5118fe8f1
(minor) clean-up
2023-07-12 22:46:14 +02:00
Viktor Lofgren
6c88f00a9d
(mqsm) guard against spurious transitions from unexpected messages
2023-07-12 22:44:05 +02:00
Viktor Lofgren
bf783dad7a
(converter) NPE fix
2023-07-12 20:13:01 +02:00
Viktor Lofgren
8a53e107fa
(mq) Synchronous and Asynchronous inboxes.
2023-07-12 20:12:52 +02:00