Commit Graph

840 Commits

Author SHA1 Message Date
Viktor Lofgren
5e7c706802 Merge branch 'release' into master 2022-08-24 00:39:00 +02:00
vlofgren
ee0580273e Serve assets from search service instead of resource-store, dynamically render index for future goodies, css tweaks. 2022-08-24 00:35:22 +02:00
vlofgren
db4cf70784 Reduce resource consumption during crawling, reduce TIME_WAIT sockets with a custom socket factory. 2022-08-23 13:26:37 +02:00
vlofgren
6fc72b3eb8 Clean up feature extraction, fix misidentification of 'application/ld+json' as javascript. 2022-08-23 00:48:48 +02:00
vlofgren
6e2fdb7a77 Reduce crawling memory consumption, increase crawling threads, dynamically adjust crawling rate. 2022-08-23 00:35:45 +02:00
Viktor Lofgren
c97cdfcc25 Merge pull request 'Revert the previous change as my IP got kicked back to ol' reliable '81.170.128.52'' (#98) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/98
2022-08-22 17:34:22 +02:00
Viktor Lofgren
f48e92630e Merge branch 'release' into master 2022-08-22 17:34:13 +02:00
vlofgren
fc9d9d1bad And revert the previous change as my IP got kicked back to ol' reliable '81.170.128.52' 2022-08-22 17:32:56 +02:00
Viktor Lofgren
f498b55301 Merge pull request 'Update crawler IP file to reflect the fact that the IP changed.' (#97) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/97
2022-08-22 13:04:50 +02:00
Viktor Lofgren
e9dc01dfc5 Merge branch 'release' into master 2022-08-22 13:04:40 +02:00
vlofgren
087ad0124d Update crawler IP file to reflect the fact that the IP changed. 2022-08-22 13:04:07 +02:00
Viktor Lofgren
bbcedb7e9d Merge pull request 'Tweak CSS a tiny bit to add more padding to the right of info cells.' (#95) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/95
2022-08-19 16:07:52 +02:00
Viktor Lofgren
475b7fe5e0 Merge branch 'release' into master 2022-08-19 16:07:44 +02:00
vlofgren
095ed7c6c4 Tweak CSS a tiny bit to add more padding to the right of info cells. 2022-08-19 16:07:26 +02:00
Viktor Lofgren
7cf2be356c Merge pull request 'Add recipe selector to list of updates.' (#94) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/94
2022-08-19 15:55:49 +02:00
Viktor Lofgren
09f3e0c97e Merge branch 'release' into master 2022-08-19 15:55:38 +02:00
vlofgren
2adbe5f74c Update publicity roll. 2022-08-19 15:55:01 +02:00
Viktor Lofgren
a671341c5c Merge pull request 'Update publicity roll with Deutschlandfunk article' (#93) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/93
2022-08-19 15:51:11 +02:00
Viktor Lofgren
8de6b52e1e Merge branch 'release' into master 2022-08-19 15:51:01 +02:00
vlofgren
56987f6664 Update publicity roll. 2022-08-19 15:50:15 +02:00
vlofgren
7567890708 Update publicity roll. 2022-08-19 15:49:52 +02:00
Viktor Lofgren
f708fa643b Merge pull request 'master' (#92) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/92
2022-08-18 20:45:28 +02:00
Viktor Lofgren
2a72f1da78 Merge branch 'release' into master 2022-08-18 20:45:19 +02:00
vlofgren
ede62f2515 Retain cookies for domain. 2022-08-18 20:44:44 +02:00
vlofgren
a1eb8375a2 Exclude wp-content/uploads from crawling 2022-08-18 19:05:07 +02:00
Viktor Lofgren
014068ace5 Merge pull request 'Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge.' (#91) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/91
2022-08-18 18:41:22 +02:00
Viktor Lofgren
4e3a977049 Merge branch 'release' into master 2022-08-18 18:41:13 +02:00
vlofgren
340d80f6c7 Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge. 2022-08-18 18:40:34 +02:00
Viktor Lofgren
a915b2d37a Merge pull request 'Don't try to fetch ftp://, webcal://, etc.' (#90) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/90
2022-08-18 18:27:15 +02:00
vlofgren
6b6cd56e3a Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge. 2022-08-18 18:25:12 +02:00
Viktor Lofgren
e5d63d8a61 Merge branch 'release' into master 2022-08-18 17:26:18 +02:00
vlofgren
4afccdc536 Don't try to fetch ftp://, webcal://, etc. 2022-08-18 17:25:22 +02:00
Viktor Lofgren
4435334ebe Merge pull request 'Fix bug where url fragments were considered path elements' (#89) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/89
2022-08-18 16:48:48 +02:00
Viktor Lofgren
579037db05 Merge branch 'release' into master 2022-08-18 16:48:32 +02:00
vlofgren
5cd552458a Fix fragment bug. 2022-08-18 16:47:59 +02:00
vlofgren
2bc81e8e9a Fix fragment bug. 2022-08-18 16:45:51 +02:00
vlofgren
a034e3245e Fix fragment bug. 2022-08-18 16:43:34 +02:00
Viktor Lofgren
a8745d627b Merge pull request 'Fix bug in redirect handling that caused the crawler to not index some documents.' (#88) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/88
2022-08-17 00:52:34 +02:00
vlofgren
0bac422091 Fix bug in redirect handling that caused the crawler to not index some documents. 2022-08-17 00:51:10 +02:00
Viktor Lofgren
8f2485870d Merge branch 'release' into master 2022-08-17 00:49:55 +02:00
vlofgren
ce9abc00dc Fix bug in redirect handling that caused the crawler to not index some documents. 2022-08-17 00:49:32 +02:00
Viktor Lofgren
5f2258d459 Merge pull request 'Prepare for new crawl round' (#87) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/87
2022-08-16 22:53:20 +02:00
Viktor Lofgren
ef97414edb Merge branch 'release' into master 2022-08-16 22:49:26 +02:00
vlofgren
5cfef610b0 Preparations for new crawl round 2022-08-16 22:48:16 +02:00
vlofgren
123675d73b More caching 2022-08-15 15:39:10 +02:00
vlofgren
ceacfa5917 Tune down log spam 2022-08-15 15:37:26 +02:00
Viktor Lofgren
4fc0c59d29 Merge pull request 'Optimize search service by removing weird query spam' (#86) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/86
2022-08-15 15:28:05 +02:00
Viktor Lofgren
fdbb02bcaa Merge branch 'release' into master 2022-08-15 15:27:55 +02:00
vlofgren
f6b3e75cee Optimize search service by removing weird query spam 2022-08-15 15:27:22 +02:00
Viktor Lofgren
0c51cf5116 Merge pull request 'Crawling and processing improvements, index optimization' (#85) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/85
2022-08-15 13:59:49 +02:00