10 Commits

Author SHA1 Message Date
Viktor Lofgren
17db23c2c1 Minor: Better error handling in crawled domain reader 2023-07-07 19:48:32 +02:00
Viktor Lofgren
040bea1f75 Minor: Don't blow up the reader on a corrupted file 2023-07-07 19:48:11 +02:00
Viktor Lofgren
baff83912e Small optimizations that shave an hour of processing time :D 2023-06-28 15:41:10 +02:00
Viktor Lofgren
fbdedf53de Fix bug in CrawlerRetreiver ... where the root URL wasn't always added properly to the front of the crawl queue. 2023-06-27 15:50:38 +02:00
Viktor Lofgren
7d741ff499 Fix so crawl plan replay doesn't crash if a file is missing. 2023-06-27 10:57:54 +02:00
Viktor Lofgren
16e37672fc Bugfix crawl plan, doesn't use rewrite() everywhere 2023-03-30 15:41:07 +02:00
Viktor Lofgren
7c58ddce81 readme.md 2023-03-22 15:10:30 +01:00
Viktor Lofgren
2eb972dea1 Remove unrelated code, break tools into their own directory. 2023-03-17 16:03:11 +01:00
Viktor Lofgren
449471a076 Yet more restructuring. Improved search result ranking. 2023-03-16 21:35:54 +01:00
Viktor Lofgren
d82532b7f1 More restructuring, big bug fixes in keyword extraction. 2023-03-13 17:39:53 +01:00