Commit Graph

177 Commits

Author SHA1 Message Date
vlofgren
7567890708 Update publicity roll. 2022-08-19 15:49:52 +02:00
vlofgren
ede62f2515 Retain cookies for domain. 2022-08-18 20:44:44 +02:00
vlofgren
a1eb8375a2 Exclude wp-content/uploads from crawling 2022-08-18 19:05:07 +02:00
vlofgren
340d80f6c7 Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge. 2022-08-18 18:40:34 +02:00
vlofgren
6b6cd56e3a Don't try to fetch text/css and text/javascript-files. Refactor fetcher to separate content type sniffing logic. Clean up crawler a smidge. 2022-08-18 18:25:12 +02:00
vlofgren
4afccdc536 Don't try to fetch ftp://, webcal://, etc. 2022-08-18 17:25:22 +02:00
vlofgren
5cd552458a Fix fragment bug. 2022-08-18 16:47:59 +02:00
vlofgren
2bc81e8e9a Fix fragment bug. 2022-08-18 16:45:51 +02:00
vlofgren
a034e3245e Fix fragment bug. 2022-08-18 16:43:34 +02:00
vlofgren
0bac422091 Fix bug in redirect handling that caused the crawler to not index some documents. 2022-08-17 00:51:10 +02:00
vlofgren
ce9abc00dc Fix bug in redirect handling that caused the crawler to not index some documents. 2022-08-17 00:49:32 +02:00
vlofgren
5cfef610b0 Preparations for new crawl round 2022-08-16 22:48:16 +02:00
vlofgren
123675d73b More caching 2022-08-15 15:39:10 +02:00
vlofgren
ceacfa5917 Tune down log spam 2022-08-15 15:37:26 +02:00
vlofgren
f6b3e75cee Optimize search service by removing weird query spam 2022-08-15 15:27:22 +02:00
vlofgren
beafdfda9c Index optimizations that should reduce small object churn and IOPS a bit. 2022-08-15 13:58:18 +02:00
vlofgren
460dd098b0 Add advertisement Feature to search, Add adblock simulation to processor, Add filename and email address extraction to processor. 2022-08-12 17:12:16 +02:00
vlofgren
30d2a707ff Add advertisement Feature to search, Add adblock simulation to processor, Add filename and email address extraction to processor. 2022-08-12 13:50:18 +02:00
vlofgren
0e28ff5a72 Add features to suggestions 2022-08-10 21:32:19 +02:00
vlofgren
ba9e0d9829 Add features to suggestions 2022-08-10 19:50:14 +02:00
vlofgren
ffde8c8305 Faster crawling 2022-08-10 18:46:13 +02:00
vlofgren
ce09fce639 Faster crawling 2022-08-10 17:03:58 +02:00
vlofgren
9c6e3b1772 Topical detection (experimental), Adblock simulation (experimental) 2022-08-10 15:04:29 +02:00
vlofgren
d7167f956e Adjust search result sort order to penalize scriptiness a bit 2022-08-08 18:59:57 +02:00
vlofgren
0f59675f7c Clean up preconverter code 2022-08-08 18:08:18 +02:00
vlofgren
2af2c50f34 Clean up preconverter code 2022-08-08 15:29:47 +02:00
vlofgren
2bfde9d030 Recipe detection 2022-08-08 15:18:18 +02:00
vlofgren
0dfcf2f7af Recipe detection 2022-08-08 15:18:07 +02:00
vlofgren
5c952d48f4 Speed up conversion 2022-08-08 15:18:07 +02:00
vlofgren
e39320d51d Add support for additional random sets 2022-08-07 17:51:35 +02:00
vlofgren
b9bbda0e2e Add support for additional random sets 2022-08-07 17:49:32 +02:00
vlofgren
743ba23f55 Add support for additional random sets 2022-08-07 17:46:30 +02:00
vlofgren
5fbafa63c1 Add better fallbacks to summary extractor 2022-08-06 15:17:00 +02:00
vlofgren
e22fde69ed Screenshot bot 2022-08-04 21:14:17 +02:00
vlofgren
a6a6bdb013 Test rewarding linked terms. 2022-08-02 17:52:24 +02:00
vlofgren
6e68f930a6 Test rewarding linked terms. 2022-08-02 17:50:25 +02:00
vlofgren
0b61910b84 Test rewarding linked terms. 2022-08-02 17:43:21 +02:00
vlofgren
487d74592d Test rewarding linked terms. 2022-08-02 17:38:18 +02:00
vlofgren
ae2419e2a5 Reduced max domain results for search command, made it easier to configure. 2022-08-02 12:23:24 +02:00
vlofgren
c9eef92291 Updated opensearch def with hint to use api for automation. 2022-08-02 12:23:24 +02:00
vlofgren
3ccb1c6218 Simplified query builders, preparation for a-tag inclusion. 2022-08-01 20:29:15 +02:00
vlofgren
9a4183a481 A-tags loader 2022-08-01 20:05:55 +02:00
vlofgren
9a6c8339d0 Clean up DAO 2022-08-01 20:05:21 +02:00
vlofgren
7f985c0a57 Experimental domain-searching feature 2022-07-28 21:33:36 +02:00
vlofgren
e17d3015dc Experimental domain-searching feature 2022-07-28 21:29:34 +02:00
vlofgren
8428198e61 Experimental domain-searching feature 2022-07-28 21:09:48 +02:00
vlofgren
c75c1db475 Experimental domain-searching feature 2022-07-28 20:50:40 +02:00
vlofgren
f027a72df9 Experimental domain-searching feature 2022-07-28 20:43:45 +02:00
vlofgren
449bb76c83 Experimental domain-searching feature 2022-07-28 20:26:07 +02:00
vlofgren
913599426f Experimental domain-searching feature 2022-07-28 20:25:57 +02:00