vlofgren
|
ffde8c8305
|
Faster crawling
|
2022-08-10 18:46:13 +02:00 |
|
vlofgren
|
ce09fce639
|
Faster crawling
|
2022-08-10 17:03:58 +02:00 |
|
vlofgren
|
9c6e3b1772
|
Topical detection (experimental),
Adblock simulation (experimental)
|
2022-08-10 15:04:29 +02:00 |
|
vlofgren
|
d7167f956e
|
Adjust search result sort order to penalize scriptiness a bit
|
2022-08-08 18:59:57 +02:00 |
|
vlofgren
|
0f59675f7c
|
Clean up preconverter code
|
2022-08-08 18:08:18 +02:00 |
|
vlofgren
|
2af2c50f34
|
Clean up preconverter code
|
2022-08-08 15:29:47 +02:00 |
|
vlofgren
|
2bfde9d030
|
Recipe detection
|
2022-08-08 15:18:18 +02:00 |
|
vlofgren
|
0dfcf2f7af
|
Recipe detection
|
2022-08-08 15:18:07 +02:00 |
|
vlofgren
|
5c952d48f4
|
Speed up conversion
|
2022-08-08 15:18:07 +02:00 |
|
vlofgren
|
e39320d51d
|
Add support for additional random sets
|
2022-08-07 17:51:35 +02:00 |
|
vlofgren
|
b9bbda0e2e
|
Add support for additional random sets
|
2022-08-07 17:49:32 +02:00 |
|
vlofgren
|
743ba23f55
|
Add support for additional random sets
|
2022-08-07 17:46:30 +02:00 |
|
vlofgren
|
5fbafa63c1
|
Add better fallbacks to summary extractor
|
2022-08-06 15:17:00 +02:00 |
|
vlofgren
|
e22fde69ed
|
Screenshot bot
|
2022-08-04 21:14:17 +02:00 |
|
vlofgren
|
a6a6bdb013
|
Test rewarding linked terms.
|
2022-08-02 17:52:24 +02:00 |
|
vlofgren
|
6e68f930a6
|
Test rewarding linked terms.
|
2022-08-02 17:50:25 +02:00 |
|
vlofgren
|
0b61910b84
|
Test rewarding linked terms.
|
2022-08-02 17:43:21 +02:00 |
|
vlofgren
|
487d74592d
|
Test rewarding linked terms.
|
2022-08-02 17:38:18 +02:00 |
|
vlofgren
|
ae2419e2a5
|
Reduced max domain results for search command,
made it easier to configure.
|
2022-08-02 12:23:24 +02:00 |
|
vlofgren
|
c9eef92291
|
Updated opensearch def with hint to use api for automation.
|
2022-08-02 12:23:24 +02:00 |
|
vlofgren
|
3ccb1c6218
|
Simplified query builders, preparation for a-tag inclusion.
|
2022-08-01 20:29:15 +02:00 |
|
vlofgren
|
9a4183a481
|
A-tags loader
|
2022-08-01 20:05:55 +02:00 |
|
vlofgren
|
9a6c8339d0
|
Clean up DAO
|
2022-08-01 20:05:21 +02:00 |
|
vlofgren
|
7f985c0a57
|
Experimental domain-searching feature
|
2022-07-28 21:33:36 +02:00 |
|
vlofgren
|
e17d3015dc
|
Experimental domain-searching feature
|
2022-07-28 21:29:34 +02:00 |
|
vlofgren
|
8428198e61
|
Experimental domain-searching feature
|
2022-07-28 21:09:48 +02:00 |
|
vlofgren
|
c75c1db475
|
Experimental domain-searching feature
|
2022-07-28 20:50:40 +02:00 |
|
vlofgren
|
f027a72df9
|
Experimental domain-searching feature
|
2022-07-28 20:43:45 +02:00 |
|
vlofgren
|
449bb76c83
|
Experimental domain-searching feature
|
2022-07-28 20:26:07 +02:00 |
|
vlofgren
|
913599426f
|
Experimental domain-searching feature
|
2022-07-28 20:25:57 +02:00 |
|
vlofgren
|
145b02a736
|
Experimental domain-searching feature
|
2022-07-28 20:22:38 +02:00 |
|
vlofgren
|
ea5dbb301e
|
Experimental domain-searching feature
|
2022-07-28 20:06:51 +02:00 |
|
vlofgren
|
3916c05a02
|
Experimental domain-searching feature
|
2022-07-28 19:50:02 +02:00 |
|
vlofgren
|
6a2b199604
|
Experimental domain-searching feature
|
2022-07-28 19:45:44 +02:00 |
|
vlofgren
|
09aa217451
|
Experimental domain-searching feature
|
2022-07-28 19:45:03 +02:00 |
|
vlofgren
|
f1f4674e1c
|
Experimental domain-searching feature
|
2022-07-28 19:29:03 +02:00 |
|
vlofgren
|
ea312c7b61
|
Experimental domain-searching feature
|
2022-07-28 19:26:19 +02:00 |
|
vlofgren
|
806c81a3a3
|
Experimental domain-searching feature
|
2022-07-28 19:18:46 +02:00 |
|
vlofgren
|
27222fa192
|
Experimental domain-searching feature
|
2022-07-28 19:14:53 +02:00 |
|
vlofgren
|
29a2bc1d9a
|
Experimental domain-searching feature
|
2022-07-28 19:05:53 +02:00 |
|
vlofgren
|
14a6b60945
|
Experimental domain-searching feature
|
2022-07-28 19:02:27 +02:00 |
|
vlofgren
|
e3b2b36f03
|
Experimental domain-searching feature
|
2022-07-28 19:01:54 +02:00 |
|
vlofgren
|
e9db8b6c1d
|
Experimental domain-searching feature
|
2022-07-28 18:58:54 +02:00 |
|
vlofgren
|
e68cee5b58
|
Experimental domain-searching feature
|
2022-07-28 18:48:49 +02:00 |
|
vlofgren
|
b49ebda5dd
|
Experimental domain-searching feature
|
2022-07-28 18:46:06 +02:00 |
|
vlofgren
|
81c72b186b
|
Experimental domain-searching feature
|
2022-07-28 18:37:10 +02:00 |
|
vlofgren
|
ada11eb849
|
Experimental domain-searching feature
|
2022-07-28 18:34:01 +02:00 |
|
vlofgren
|
55b549903f
|
Experimental domain-searching feature
|
2022-07-28 18:34:01 +02:00 |
|
vlofgren
|
930719583f
|
Experimental domain-searching feature
|
2022-07-28 18:18:35 +02:00 |
|
vlofgren
|
e0e9f7481e
|
Experimental domain-searching feature
|
2022-07-28 18:13:31 +02:00 |
|
vlofgren
|
43c7a6790a
|
Experimental domain-searching feature
|
2022-07-28 18:06:08 +02:00 |
|
vlofgren
|
3b3cca211d
|
Experimental domain-searching feature
|
2022-07-28 18:03:18 +02:00 |
|
vlofgren
|
bf328a0597
|
Experimental domain-searching feature
|
2022-07-28 17:58:45 +02:00 |
|
vlofgren
|
23b7a5fc22
|
NPE fix for index buckets that aren't loaded, experimental new query mode for domains.
|
2022-07-28 17:16:23 +02:00 |
|
vlofgren
|
793e917fe4
|
Fix exclude term duplication from js flag.
|
2022-07-28 14:57:09 +02:00 |
|
vlofgren
|
fd1f3f796e
|
Fix exclude term duplication from js flag.
|
2022-07-28 14:51:55 +02:00 |
|
vlofgren
|
667a80a3a0
|
Deduplicate domains in explore mode
|
2022-07-27 13:56:08 +02:00 |
|
vlofgren
|
c5c73610df
|
Tweak screenshot service
|
2022-07-26 17:10:14 +02:00 |
|
vlofgren
|
e4457de606
|
Update peruse algorithm, make resource store disk configurable.
|
2022-07-26 16:34:18 +02:00 |
|
vlofgren
|
f4bd754e37
|
Fix buggy madvise code, clean up preconverter
|
2022-07-26 13:51:55 +02:00 |
|
vlofgren
|
191b426797
|
Fix madvise code
|
2022-07-25 15:20:50 +02:00 |
|
vlofgren
|
da40172c68
|
Fix madvise code
|
2022-07-25 15:05:48 +02:00 |
|
vlofgren
|
daec6d9fc0
|
Fix overflow error
|
2022-07-25 12:43:03 +02:00 |
|
vlofgren
|
48812d8a4f
|
Store screenshots in database instead of in the filesystem.
|
2022-07-20 12:02:26 +02:00 |
|
vlofgren
|
6d1e2442b6
|
Store wiki articles in database instead of in the filesystem.
|
2022-07-20 11:16:21 +02:00 |
|
vlofgren
|
51d273e39d
|
Store wiki articles in database instead of in the filesystem.
|
2022-07-20 11:06:06 +02:00 |
|
vlofgren
|
fb91ce84f5
|
Reduce log spam during conversion
|
2022-07-19 05:08:06 +02:00 |
|
vlofgren
|
ba375ef769
|
Tweaks to keyword extraction
|
2022-07-19 05:02:44 +02:00 |
|
vlofgren
|
825dea839d
|
Tweaks to keyword extraction
|
2022-07-19 04:50:19 +02:00 |
|
vlofgren
|
64844e1db2
|
While some might ask, why would the server host IP be available as a search keyword? I only ask you hold my beer as I make it a reality.
|
2022-07-19 03:01:23 +02:00 |
|
vlofgren
|
e83a7435c6
|
Raise min document length a tad; we've been getting a bit too many almost-empty documents in the index.
|
2022-07-19 01:42:17 +02:00 |
|
vlofgren
|
9ae76a9264
|
Retire old and broken gemini support, needs to be re-implemented by having Memex talk to the API service rather than going directly to Search.
|
2022-07-18 18:36:39 +02:00 |
|
vlofgren
|
15bd54ef9f
|
Tidy up LoaderMain a bit
|
2022-07-18 17:22:22 +02:00 |
|
vlofgren
|
3d1031f8e4
|
Add lexicon dumping utility
|
2022-07-18 17:13:47 +02:00 |
|
vlofgren
|
9f7a28cbdb
|
Made search service more robust toward the case where Encyclopedia or Assistant is down
|
2022-07-17 22:21:41 +02:00 |
|
vlofgren
|
e22748e990
|
Better error logging for IO errors during conversion from configuration issues.
|
2022-07-17 22:08:06 +02:00 |
|
vlofgren
|
e30a20bb74
|
Fix bug in keyword loading when keywords have non-ASCII symbols, cleaner solution
|
2022-07-17 19:31:49 +02:00 |
|
vlofgren
|
f4966cf1f9
|
Fix bug in keyword loading when keywords have non-ASCII symbols
|
2022-07-17 15:18:16 +02:00 |
|
vlofgren
|
c5dbe269f7
|
Better logging for URL errors
|
2022-07-17 15:17:39 +02:00 |
|
vlofgren
|
89cca4dbff
|
Better logging for rare parsing exception
|
2022-07-16 21:27:04 +02:00 |
|
vlofgren
|
80b3ac3dd8
|
Tweaking the URL block list to exclude git noise better
|
2022-07-16 21:19:13 +02:00 |
|
vlofgren
|
c71cc3d43a
|
Fix overflow bugs in DictionaryHashMap that only surfaced without small RAM
|
2022-07-16 18:58:19 +02:00 |
|
vlofgren
|
661577b456
|
Add Fossil SCM commits to URL blocklist
|
2022-07-14 14:45:31 +02:00 |
|
vlofgren
|
20970a6161
|
Make processor more lenient toward quality, accept content-types which specify charset
|
2022-07-14 12:37:06 +02:00 |
|
vlofgren
|
e9a270c015
|
Merge branch 'master' into experimental
|
2022-07-14 10:28:01 +02:00 |
|
vlofgren
|
63d9c70667
|
Fix Memex Update Form Jank
|
2022-07-14 10:22:38 +02:00 |
|
vlofgren
|
fed2fa9397
|
Fix tiny NPE in converting
|
2022-07-11 23:25:03 +02:00 |
|
vlofgren
|
b0c40136ca
|
Cleaned up HTML features code a bit.
|
2022-07-08 19:52:12 +02:00 |
|
vlofgren
|
7dea94d36d
|
Cleaned up HTML features code a bit.
|
2022-07-08 17:25:16 +02:00 |
|
vlofgren
|
2b83e0d754
|
Block websites with "acceptable ads", as this seems a strong indicator the domain is either parked or spam.
|
2022-07-08 16:50:00 +02:00 |
|
vlofgren
|
7a4f5c27a6
|
Merge branch 'master' into experimental
# Conflicts:
# marginalia_nu/src/e2e/resources/init.sh
|
2022-07-08 16:37:37 +02:00 |
|
vlofgren
|
f3be865293
|
Allow query params for *some* path,param combinations, targeted at allowing the crawl of forums.
|
2022-07-08 16:36:09 +02:00 |
|
vlofgren
|
93c274f1d4
|
E2E-test for memex
|
2022-07-08 12:34:31 +02:00 |
|
vlofgren
|
853108028e
|
WIP: Selective URL param strings
|
2022-07-04 14:47:16 +02:00 |
|
vlofgren
|
ee07c4d94a
|
Refactored s/DictionaryWriter/KeywordLexicon/g to use significantly less memory and (potentially) support UTF-8.
|
2022-06-26 16:44:08 +02:00 |
|
vlofgren
|
e1b3477115
|
Experiments in keyword extraction
|
2022-06-23 17:02:28 +02:00 |
|
vlofgren
|
4516b23f90
|
Also grab alt text for images in a-tags in anchor text extractor
|
2022-06-22 13:12:44 +02:00 |
|
vlofgren
|
48e4aa3ee8
|
Clean up old junk from the WordPatterns class
|
2022-06-22 13:01:46 +02:00 |
|
vlofgren
|
35878c5102
|
Anchor text capture work-in-progress
|
2022-06-22 12:57:58 +02:00 |
|
vlofgren
|
1068694db6
|
Refactoring BTreeReader and binary search code
|
2022-06-20 12:35:58 +02:00 |
|