Viktor Lofgren
|
8c16a2aede
|
(work-log, minor) Clean up code
|
2023-07-12 18:10:05 +02:00 |
|
Viktor Lofgren
|
5deec63667
|
(work-log) Better tests
|
2023-07-12 18:04:06 +02:00 |
|
Viktor Lofgren
|
363368b150
|
(converter) Remove auto-refresh.
|
2023-07-12 17:48:37 +02:00 |
|
Viktor Lofgren
|
74caf9e38a
|
(processes) Remove forEach-constructs in favor of iterators.
|
2023-07-12 17:47:36 +02:00 |
|
Viktor Lofgren
|
7087ab5f07
|
(run) Reduce nginx access log noise for local setup
|
2023-07-11 23:11:34 +02:00 |
|
Viktor Lofgren
|
0b0cf48849
|
(control) Better looking UUIDs
|
2023-07-11 23:11:02 +02:00 |
|
Viktor Lofgren
|
00d9773b44
|
(control) Better looking progress bar
|
2023-07-11 21:37:32 +02:00 |
|
Viktor Lofgren
|
ac2d7034db
|
(minor) Bugfix in Path handling
|
2023-07-11 21:24:29 +02:00 |
|
Viktor Lofgren
|
88b9ec70c6
|
(control, WIP) Run reconvert-load from converter :D
|
2023-07-11 18:05:37 +02:00 |
|
Viktor Lofgren
|
77261a38cd
|
(control, WIP) MQFSM and ProcessService are sitting in a tree
We're spawning processes from the MQFSM in control service now!
|
2023-07-11 17:08:43 +02:00 |
|
Viktor Lofgren
|
3c7c77fe21
|
(minor) Bugfix in Path handling
|
2023-07-11 17:06:52 +02:00 |
|
Viktor Lofgren
|
4ee3f6ba3f
|
(minor) Refactor ControlService
|
2023-07-11 14:51:51 +02:00 |
|
Viktor Lofgren
|
4c016b0318
|
Process monitoring
* Also refactored the SQL tables a bit
|
2023-07-11 14:46:21 +02:00 |
|
Viktor Lofgren
|
f59cab300e
|
(minor) Javadoc comments for MqPersistance and MqMessageState
|
2023-07-10 21:59:51 +02:00 |
|
Viktor Lofgren
|
ec7826659a
|
(minor) Javadoc comments for MqPersistance and MqMessageState
|
2023-07-10 21:52:25 +02:00 |
|
Viktor Lofgren
|
98b5f22104
|
(control) WIP control service
* Set messages to OK when received so they're cleaned up properly.
|
2023-07-10 21:33:57 +02:00 |
|
Viktor Lofgren
|
2283ceb77d
|
(control) WIP control service
|
2023-07-10 18:58:43 +02:00 |
|
Viktor Lofgren
|
fba466d6e2
|
(crawler) Update URL blocklist
* Don't crawl MDN mirrors
* More mailing list variants
|
2023-07-10 18:58:43 +02:00 |
|
Viktor
|
cbbf60a599
|
Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
|
2023-07-10 18:58:43 +02:00 |
|
Viktor Lofgren
|
c125d8ab48
|
(search) Fix a bug where space-like characters weren't normalized in query processing.
|
2023-07-10 18:58:43 +02:00 |
|
Viktor Lofgren
|
f03146de4b
|
(crawler) Fix poor handling of duplicate ids
* Also clean up the code a bit
|
2023-07-10 18:58:43 +02:00 |
|
Viktor Lofgren
|
dbb758d1a8
|
Minor: Better error handling in crawled domain reader
|
2023-07-10 18:58:43 +02:00 |
|
Viktor Lofgren
|
da8bcc6e24
|
Minor: Don't blow up the reader on a corrupted file
|
2023-07-10 18:58:43 +02:00 |
|
Viktor Lofgren
|
96eecc6ea5
|
Minor: Readability.
|
2023-07-10 18:58:43 +02:00 |
|
Viktor Lofgren
|
74644d59f3
|
(crawler) Update URL blocklist
* Don't crawl MDN mirrors
* More mailing list variants
|
2023-07-10 18:04:43 +02:00 |
|
Viktor
|
0f9b90eb1c
|
Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
|
2023-07-10 17:36:12 +02:00 |
|
Viktor Lofgren
|
ae9537b68e
|
(search) Fix a bug where space-like characters weren't normalized in query processing.
|
2023-07-07 20:02:05 +02:00 |
|
Viktor Lofgren
|
2619d196bb
|
(crawler) Fix poor handling of duplicate ids
* Also clean up the code a bit
|
2023-07-07 19:56:14 +02:00 |
|
Viktor Lofgren
|
17db23c2c1
|
Minor: Better error handling in crawled domain reader
|
2023-07-07 19:48:32 +02:00 |
|
Viktor Lofgren
|
040bea1f75
|
Minor: Don't blow up the reader on a corrupted file
|
2023-07-07 19:48:11 +02:00 |
|
Viktor Lofgren
|
dc8277223a
|
Minor: Readability.
|
2023-07-06 19:50:13 +02:00 |
|
Viktor Lofgren
|
98d1898610
|
Bugfix: Don't run the xenforo specialization on phpBB.
|
2023-07-06 18:12:26 +02:00 |
|
Viktor Lofgren
|
1400fb4a9b
|
Bugfix: Don't run the xenforo specialization on phpBB.
|
2023-07-06 18:11:19 +02:00 |
|
Viktor Lofgren
|
647bbfa617
|
Fix so that crawler tests don't sometimes fetch real sitemaps when they're run.
|
2023-07-06 18:05:23 +02:00 |
|
Viktor Lofgren
|
b73fcc19fe
|
Fix so that crawler tests don't sometimes fetch real sitemaps when they're run.
|
2023-07-06 18:05:03 +02:00 |
|
Viktor Lofgren
|
d9e6c4f266
|
Trial integration of MQ-FSM into index service.
|
2023-07-06 18:04:16 +02:00 |
|
Viktor Lofgren
|
34653f03a2
|
Temporary bugfix, need to find source
|
2023-07-06 14:13:03 +02:00 |
|
Viktor Lofgren
|
f0a8ca440f
|
MQFSM Usability WIP
|
2023-07-06 13:33:11 +02:00 |
|
Viktor Lofgren
|
d89db10645
|
MQFSM Usability WIP
|
2023-07-06 13:02:16 +02:00 |
|
Viktor
|
413dc6ced4
|
Update FUNDING.yml
|
2023-07-05 18:03:36 +02:00 |
|
Adrthegamedev
|
78f21dd19a
|
(an attempt to) Add wikidot to wiki generators list
|
2023-07-05 18:03:36 +02:00 |
|
Viktor Lofgren
|
2cb209ae9c
|
Better wordpress fingerprinting
|
2023-07-05 18:03:36 +02:00 |
|
Viktor Lofgren
|
979a620ead
|
Bugfix where DocumentGeneratorExtractor went out of bounds for generators starting with 'microsoft' or 'adobe' but having no followup string.
|
2023-07-05 18:03:36 +02:00 |
|
Viktor Lofgren
|
7a17933c65
|
Control service owns message queue garbage collection.
|
2023-07-04 19:52:30 +02:00 |
|
Viktor
|
019fa763cd
|
Update FUNDING.yml
|
2023-07-04 18:46:58 +02:00 |
|
Viktor Lofgren
|
097a163cf5
|
Getting a skeleton in place for the control service.
|
2023-07-04 18:25:42 +02:00 |
|
Viktor Lofgren
|
2ae0b8c159
|
Message queue based state machine
|
2023-07-04 17:42:06 +02:00 |
|
Viktor Lofgren
|
31ae71c7d6
|
Message queue WIP
|
2023-07-04 14:28:14 +02:00 |
|
Adrthegamedev
|
5ce894564c
|
(an attempt to) Add wikidot to wiki generators list
|
2023-07-03 13:31:42 +02:00 |
|
Viktor Lofgren
|
813fa08bdd
|
Better wordpress fingerprinting
|
2023-07-03 11:29:27 +02:00 |
|