Viktor Lofgren | 58f2f86ea8 | (crawler) Don't read all the data into RAM when doing a refresh-crawl | 2023-07-21 19:47:52 +02:00
Viktor Lofgren | f91d92cccb | (crawler) WIP | 2023-07-20 21:05:16 +02:00
Viktor Lofgren | d7ab21fe34 | (*) Refactor Control Service and processes | 2023-07-17 21:20:31 +02:00
Viktor Lofgren | bca4bbb6c8 | (*) Refactor MQ and MQSM | 2023-07-17 13:57:32 +02:00
Viktor Lofgren | 8b74e3aa0d | (*) File Storage WIP | 2023-07-14 17:08:10 +02:00
Viktor Lofgren | 74caf9e38a | (processes) Remove forEach-constructs in favor of iterators. | 2023-07-12 17:47:36 +02:00
Viktor Lofgren | 4c016b0318 | Process monitoring | 2023-07-11 14:46:21 +02:00
    * Also refactored the SQL tables a bit
Viktor Lofgren | dbb758d1a8 | Minor: Better error handling in crawled domain reader | 2023-07-10 18:58:43 +02:00
Viktor Lofgren | da8bcc6e24 | Minor: Don't blow up the reader on a corrupted file | 2023-07-10 18:58:43 +02:00
Viktor Lofgren | baff83912e | Small optimizations that shave an hour of processing time :D | 2023-06-28 15:41:10 +02:00
Viktor Lofgren | fbdedf53de | Fix bug in CrawlerRetreiver | 2023-06-27 15:50:38 +02:00
    ... where the root URL wasn't always added properly to the front of the crawl queue.
Viktor Lofgren | 7d741ff499 | Fix so crawl plan replay doesn't crash if a file is missing. | 2023-06-27 10:57:54 +02:00
Viktor Lofgren | 16e37672fc | Bugfix crawl plan, doesn't use rewrite() everywhere | 2023-03-30 15:41:07 +02:00
Viktor Lofgren | 7c58ddce81 | readme.md | 2023-03-22 15:10:30 +01:00
Viktor Lofgren | 2eb972dea1 | Remove unrelated code, break tools into their own directory. | 2023-03-17 16:03:11 +01:00
Viktor Lofgren | 449471a076 | Yet more restructuring. Improved search result ranking. | 2023-03-16 21:35:54 +01:00
Viktor Lofgren | d82532b7f1 | More restructuring, big bug fixes in keyword extraction. | 2023-03-13 17:39:53 +01:00