Viktor Lofgren
|
912129311d
|
(control) Message Queue GUI
|
2023-08-04 17:54:18 +02:00 |
|
Viktor Lofgren
|
624b78ec3a
|
(heartbeat) Task heartbeats
|
2023-08-04 14:40:06 +02:00 |
|
Viktor Lofgren
|
1d0cea1d55
|
(converter) GUI for dealing with user complaints
|
2023-08-03 17:59:57 +02:00 |
|
Viktor Lofgren
|
f01f608474
|
(blacklist) Support blacklists with subdomain
|
2023-08-03 17:58:52 +02:00 |
|
Viktor Lofgren
|
c22feaf42e
|
(crawl) Make crawler limiter request a GC when throttling
|
2023-08-03 17:58:18 +02:00 |
|
Viktor Lofgren
|
63e857f7cd
|
(control) Add basic api key management
|
2023-08-02 20:14:03 +02:00 |
|
Viktor Lofgren
|
9979c9defe
|
(search/index) Add blogosphere filter
|
2023-08-02 20:13:30 +02:00 |
|
Viktor Lofgren
|
7763df0715
|
(docs) Add control-service to the main readme.md
|
2023-08-01 22:52:41 +02:00 |
|
Viktor Lofgren
|
e088eb9ec8
|
(scripts|docs) Update scripts and documentation for the new operator's gui and file storage workflows.
|
2023-08-01 22:50:33 +02:00 |
|
Viktor Lofgren
|
19402772fc
|
(scripts|docs) Update scripts and documentation for the new operator's gui and file storage workflows.
|
2023-08-01 22:50:05 +02:00 |
|
Viktor Lofgren
|
ba724bc1b2
|
(scripts|docs) Update scripts and documentation for the new operator's gui and file storage workflows.
|
2023-08-01 22:47:37 +02:00 |
|
Viktor Lofgren
|
8de3e6ab80
|
(control) Fix bug where CrawlActor and RecrawlActor would steal each other's mail
|
2023-08-01 22:33:30 +02:00 |
|
Viktor Lofgren
|
659d2134ba
|
(file-storage) Deprecate mustClean flag
|
2023-08-01 22:32:30 +02:00 |
|
Viktor Lofgren
|
867410c66b
|
(file-storage) Automatic file storage discovery via manifest file
|
2023-08-01 18:05:43 +02:00 |
|
Viktor Lofgren
|
483c2dbb44
|
(conf) Change default user-agent to not associate it with the project; remove unused disks.properties file.
|
2023-08-01 17:34:25 +02:00 |
|
Viktor Lofgren
|
e5c9791b14
|
(crawler) Fix rare ConcurrentModificationError due to HashSet
|
2023-08-01 17:28:29 +02:00 |
|
Viktor Lofgren
|
58556af6c7
|
(db) Use Flyway for database migrations.
|
2023-08-01 17:08:42 +02:00 |
|
Viktor Lofgren
|
2e29038ecd
|
(db) Fix broken insert statement, move file storage defaults to a separate file.
|
2023-08-01 15:50:08 +02:00 |
|
Viktor Lofgren
|
36a23707c1
|
(control) Control service should be a core service.
|
2023-08-01 15:49:50 +02:00 |
|
Viktor Lofgren
|
c1ea60b399
|
(db) Default values for storage base
|
2023-08-01 15:05:04 +02:00 |
|
Viktor Lofgren
|
b08e302dd5
|
(lexicon) Optimize lexicon by using Murmur3_128's hash function
|
2023-08-01 15:02:13 +02:00 |
|
Viktor Lofgren
|
ea66195b97
|
(loader) Optimize loader by using zstd's direct streaming writer and the Murmur3_128 string hash
|
2023-08-01 15:02:13 +02:00 |
|
Viktor Lofgren
|
86a5cc5c5f
|
(hash) Modified version of common codec's Murmur3 hash
|
2023-08-01 14:57:40 +02:00 |
|
Viktor Lofgren
|
8f0cbf267b
|
(loader) Perform instruction reads in a separate thread for extra vroom vroom
|
2023-07-31 14:24:08 +02:00 |
|
Viktor Lofgren
|
2f8488610a
|
(loader) Fix bug where trailing deferred domain meta inserts weren't executed
|
2023-07-31 14:23:23 +02:00 |
|
Viktor Lofgren
|
d95f01b701
|
(control) Reduce log spam in control svc
|
2023-07-31 14:21:06 +02:00 |
|
Viktor Lofgren
|
c9d7635370
|
(control) Aborting an actor that waits on a process request terminates the running job.
|
2023-07-31 14:21:06 +02:00 |
|
Viktor Lofgren
|
6b5fb0f841
|
(control) Disable the start button for actors that aren't directly initializable.
|
2023-07-31 14:21:00 +02:00 |
|
Viktor Lofgren
|
12bd74d4f3
|
Clean up ProcessService
|
2023-07-31 10:56:16 +02:00 |
|
Viktor Lofgren
|
37c4cc68ed
|
TODO
|
2023-07-31 10:34:42 +02:00 |
|
Viktor Lofgren
|
1c948eb3d8
|
(minor) Alter DumbThreadPool in Converter to not claim the threads are crawlers.
|
2023-07-31 10:33:15 +02:00 |
|
Viktor Lofgren
|
cd90ca820f
|
YAGNI filter over ConverterDomainTypes
|
2023-07-31 10:32:47 +02:00 |
|
Viktor Lofgren
|
9786f82220
|
Fix environment variables to processes so jmc works
|
2023-07-31 10:32:23 +02:00 |
|
Viktor Lofgren
|
6f4e767a04
|
(minor) Re-enable monkey-patch-json for converter
|
2023-07-31 10:31:46 +02:00 |
|
Viktor Lofgren
|
5411950b87
|
(minor) Tidy up EdgeDomain class a bit, no functional difference
|
2023-07-31 10:31:29 +02:00 |
|
Viktor Lofgren
|
6ff7e9648f
|
(crawler) Use and pass the proper environment variables to the processes.
|
2023-07-30 16:54:02 +02:00 |
|
Viktor Lofgren
|
5c071ce4d3
|
(crawler) Clean up the code and remove unnecessary logging
|
2023-07-30 16:53:39 +02:00 |
|
Viktor Lofgren
|
caf3d231a8
|
(crawler) Fix rare issue with NPEs if the crawl queue is empty
|
2023-07-30 16:53:13 +02:00 |
|
Viktor Lofgren
|
730e8f74e4
|
(crawler) Even more memory optimizations.
* Fix minor resource leak in zstd streams
* Use pools for zstd streams
* Reduce the SSL session cache size
|
2023-07-30 14:19:55 +02:00 |
|
Viktor Lofgren
|
aba134284f
|
(crawler) Reduce log spam
|
2023-07-29 19:22:58 +02:00 |
|
Viktor Lofgren
|
2a6183f9e0
|
(crawler) Dynamic throttling of the number of active crawl jobs permitted to spawn; reduce queue size.
|
2023-07-29 19:20:09 +02:00 |
|
Viktor Lofgren
|
ee143bbc48
|
(crawler, converter) Fix so that DumbThreadPool actually waits for termination as intended.
|
2023-07-29 19:19:09 +02:00 |
|
Viktor Lofgren
|
d3f01bd171
|
(crawler, converter) Remove monkey patched gson from dependencies
|
2023-07-29 19:18:12 +02:00 |
|
Viktor Lofgren
|
05ba3bab96
|
(crawler) Make SitemapRetriever abort on too large sitemaps.
|
2023-07-29 19:18:12 +02:00 |
|
Viktor Lofgren
|
d2b6b2044c
|
(crawler) Reduce log spam in HttpFetcherImpl
|
2023-07-29 19:18:12 +02:00 |
|
Viktor Lofgren
|
7611b7900d
|
(crawler) Reduce long term memory allocation in DomainCrawlFrontier
|
2023-07-29 19:18:12 +02:00 |
|
Viktor Lofgren
|
9ad32ee9c7
|
(control) Be more clear about when a process exits and why.
|
2023-07-29 19:16:00 +02:00 |
|
Viktor Lofgren
|
866db6c63f
|
(control) Dialog for updating message state; clean up file view.
|
2023-07-28 22:02:05 +02:00 |
|
Viktor Lofgren
|
01476577b8
|
(loader) Speed up loading back to original speeds with a cascading DELETE FROM EC_URL rather than EC_PAGE_DATA.
* Also clean up code and have proper rollbacks for transactions.
|
2023-07-28 22:00:07 +02:00 |
|
Viktor Lofgren
|
e237df4a10
|
(converter) Use a dumb thread pool instead of Java's executor service.
|
2023-07-28 18:15:16 +02:00 |
|