vlofgren | 6e2fdb7a77 | Reduce crawling memory consumption, Increase crawling threads, Dynamically adjust crawling rate. | 2022-08-23 00:35:45 +02:00
Viktor Lofgren | c97cdfcc25 | Merge pull request 'Revert the previous change as my IP got kicked back to ol' reliable '81.170.128.52'' (#98) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/98 | 2022-08-22 17:34:22 +02:00
Viktor Lofgren | f48e92630e | Merge branch 'release' into master | 2022-08-22 17:34:13 +02:00
vlofgren | fc9d9d1bad | And revert the previous change as my IP got kicked back to ol' reliable '81.170.128.52' | 2022-08-22 17:32:56 +02:00
Viktor Lofgren | f498b55301 | Merge pull request 'Update crawler IP file to reflect the fact that the IP changed.' (#97) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/97 | 2022-08-22 13:04:50 +02:00
Viktor Lofgren | e9dc01dfc5 | Merge branch 'release' into master | 2022-08-22 13:04:40 +02:00
vlofgren | 087ad0124d | Update crawler IP file to reflect the fact that the IP changed. | 2022-08-22 13:04:07 +02:00
Viktor Lofgren | bbcedb7e9d | Merge pull request 'Tweak CSS a tiny bit to add more padding to the right of info cells.' (#95) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/95 | 2022-08-19 16:07:52 +02:00
Viktor Lofgren | 475b7fe5e0 | Merge branch 'release' into master | 2022-08-19 16:07:44 +02:00
vlofgren | 095ed7c6c4 | Tweak CSS a tiny bit to add more padding to the right of info cells. | 2022-08-19 16:07:26 +02:00
Viktor Lofgren | 7cf2be356c | Merge pull request 'Add recipe selector to list of updates.' (#94) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/94 | 2022-08-19 15:55:49 +02:00
Viktor Lofgren | 09f3e0c97e | Merge branch 'release' into master | 2022-08-19 15:55:38 +02:00
vlofgren | 2adbe5f74c | Update publicity roll. | 2022-08-19 15:55:01 +02:00
Viktor Lofgren | a671341c5c | Merge pull request 'Update publicity roll with Deutschlandfunk article' (#93) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/93 | 2022-08-19 15:51:11 +02:00
Viktor Lofgren | 8de6b52e1e | Merge branch 'release' into master | 2022-08-19 15:51:01 +02:00
vlofgren | 56987f6664 | Update publicity roll. | 2022-08-19 15:50:15 +02:00
vlofgren | 7567890708 | Update publicity roll. | 2022-08-19 15:49:52 +02:00
Viktor Lofgren | f708fa643b | Merge pull request 'master' (#92) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/92 | 2022-08-18 20:45:28 +02:00
Viktor Lofgren | 2a72f1da78 | Merge branch 'release' into master | 2022-08-18 20:45:19 +02:00
vlofgren | ede62f2515 | Retain cookies for domain. | 2022-08-18 20:44:44 +02:00
vlofgren | a1eb8375a2 | Exclude wp-content/uploads from crawling | 2022-08-18 19:05:07 +02:00
Viktor Lofgren | 014068ace5 | Merge pull request 'Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge.' (#91) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/91 | 2022-08-18 18:41:22 +02:00
Viktor Lofgren | 4e3a977049 | Merge branch 'release' into master | 2022-08-18 18:41:13 +02:00
vlofgren | 340d80f6c7 | Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge. | 2022-08-18 18:40:34 +02:00
Viktor Lofgren | a915b2d37a | Merge pull request 'Don't try to fetch ftp://, webcal://, etc.' (#90) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/90 | 2022-08-18 18:27:15 +02:00
vlofgren | 6b6cd56e3a | Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge. | 2022-08-18 18:25:12 +02:00
Viktor Lofgren | e5d63d8a61 | Merge branch 'release' into master | 2022-08-18 17:26:18 +02:00
vlofgren | 4afccdc536 | Don't try to fetch ftp://, webcal://, etc. | 2022-08-18 17:25:22 +02:00
Viktor Lofgren | 4435334ebe | Merge pull request 'Fix bug where url fragments were considered path elements' (#89) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/89 | 2022-08-18 16:48:48 +02:00
Viktor Lofgren | 579037db05 | Merge branch 'release' into master | 2022-08-18 16:48:32 +02:00
vlofgren | 5cd552458a | Fix fragment bug. | 2022-08-18 16:47:59 +02:00
vlofgren | 2bc81e8e9a | Fix fragment bug. | 2022-08-18 16:45:51 +02:00
vlofgren | a034e3245e | Fix fragment bug. | 2022-08-18 16:43:34 +02:00
Viktor Lofgren | a8745d627b | Merge pull request 'Fix bug in redirect handling that caused the crawler to not index some documents.' (#88) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/88 | 2022-08-17 00:52:34 +02:00
vlofgren | 0bac422091 | Fix bug in redirect handling that caused the crawler to not index some documents. | 2022-08-17 00:51:10 +02:00
Viktor Lofgren | 8f2485870d | Merge branch 'release' into master | 2022-08-17 00:49:55 +02:00
vlofgren | ce9abc00dc | Fix bug in redirect handling that caused the crawler to not index some documents. | 2022-08-17 00:49:32 +02:00
Viktor Lofgren | 5f2258d459 | Merge pull request 'Prepare for new crawl round' (#87) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/87 | 2022-08-16 22:53:20 +02:00
Viktor Lofgren | ef97414edb | Merge branch 'release' into master | 2022-08-16 22:49:26 +02:00
vlofgren | 5cfef610b0 | Preparations for new crawl round | 2022-08-16 22:48:16 +02:00
vlofgren | 123675d73b | More caching | 2022-08-15 15:39:10 +02:00
vlofgren | ceacfa5917 | Tune down log spam | 2022-08-15 15:37:26 +02:00
Viktor Lofgren | 4fc0c59d29 | Merge pull request 'Optimize search service by removing weird query spam' (#86) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/86 | 2022-08-15 15:28:05 +02:00
Viktor Lofgren | fdbb02bcaa | Merge branch 'release' into master | 2022-08-15 15:27:55 +02:00
vlofgren | f6b3e75cee | Optimize search service by removing weird query spam | 2022-08-15 15:27:22 +02:00
Viktor Lofgren | 0c51cf5116 | Merge pull request 'Crawling and processing improvements, index optimization' (#85) from master into release Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/85 | 2022-08-15 13:59:49 +02:00
Viktor Lofgren | c800af3a59 | Merge branch 'release' into master | 2022-08-15 13:59:38 +02:00
vlofgren | beafdfda9c | Index optimizations that should reduce small object churn and IOPS a bit. | 2022-08-15 13:58:18 +02:00
vlofgren | 460dd098b0 | Add advertisement Feature to search, Add adblock simulation to processor, Add filename and email address extraction to processor. | 2022-08-12 17:12:16 +02:00
Viktor Lofgren | 02abe498ff | master (#84) Co-authored-by: vlofgren <vlofgren@gmail.com> Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/84 | 2022-08-12 13:50:57 +02:00