vlofgren
|
64844e1db2
|
Some might ask why the server host IP should be available as a search keyword. I only ask that you hold my beer as I make it a reality.
|
2022-07-19 03:01:23 +02:00 |
|
vlofgren
|
e83a7435c6
|
Raise min document length a tad; we've been getting a few too many almost-empty documents in the index.
|
2022-07-19 01:42:17 +02:00 |
|
vlofgren
|
9ae76a9264
|
Retire old and broken gemini support, needs to be re-implemented by having Memex talk to the API service rather than going directly to Search.
|
2022-07-18 18:36:39 +02:00 |
|
vlofgren
|
15bd54ef9f
|
Tidy up LoaderMain a bit
|
2022-07-18 17:22:22 +02:00 |
|
vlofgren
|
3d1031f8e4
|
Add lexicon dumping utility
|
2022-07-18 17:13:47 +02:00 |
|
vlofgren
|
9f7a28cbdb
|
Made search service more robust toward the case where Encyclopedia or Assistant is down
|
2022-07-17 22:21:41 +02:00 |
|
vlofgren
|
e22748e990
|
Better error logging for IO errors during conversion caused by configuration issues.
|
2022-07-17 22:08:06 +02:00 |
|
vlofgren
|
e30a20bb74
|
Fix bug in keyword loading when keywords have non-ASCII symbols, cleaner solution
|
2022-07-17 19:31:49 +02:00 |
|
vlofgren
|
f4966cf1f9
|
Fix bug in keyword loading when keywords have non-ASCII symbols
|
2022-07-17 15:18:16 +02:00 |
|
vlofgren
|
c5dbe269f7
|
Better logging for URL errors
|
2022-07-17 15:17:39 +02:00 |
|
vlofgren
|
89cca4dbff
|
Better logging for rare parsing exception
|
2022-07-16 21:27:04 +02:00 |
|
vlofgren
|
80b3ac3dd8
|
Tweaking the URL block list to exclude git noise better
|
2022-07-16 21:19:13 +02:00 |
|
vlofgren
|
c71cc3d43a
|
Fix overflow bugs in DictionaryHashMap that only surfaced without small RAM
|
2022-07-16 18:58:19 +02:00 |
|
vlofgren
|
661577b456
|
Add Fossil SCM commits to URL blocklist
|
2022-07-14 14:45:31 +02:00 |
|
vlofgren
|
20970a6161
|
Make processor more lenient toward quality, accept content-types which specify charset
|
2022-07-14 12:37:06 +02:00 |
|
vlofgren
|
e9a270c015
|
Merge branch 'master' into experimental
|
2022-07-14 10:28:01 +02:00 |
|
Viktor Lofgren
|
3197023834
|
Merge pull request 'Fix Memex Update Form Jank' (#33) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/33
|
2022-07-14 10:23:38 +02:00 |
|
Viktor Lofgren
|
ac9a3b6a2a
|
Merge branch 'release' into master
|
2022-07-14 10:23:17 +02:00 |
|
vlofgren
|
63d9c70667
|
Fix Memex Update Form Jank
|
2022-07-14 10:22:38 +02:00 |
|
vlofgren
|
fed2fa9397
|
Fix tiny NPE in converting
|
2022-07-11 23:25:03 +02:00 |
|
vlofgren
|
b0c40136ca
|
Cleaned up HTML features code a bit.
|
2022-07-08 19:52:12 +02:00 |
|
vlofgren
|
7dea94d36d
|
Cleaned up HTML features code a bit.
|
2022-07-08 17:25:16 +02:00 |
|
vlofgren
|
2b83e0d754
|
Block websites with "acceptable ads", as this seems a strong indicator the domain is either parked or spam.
|
2022-07-08 16:50:00 +02:00 |
|
vlofgren
|
7a4f5c27a6
|
Merge branch 'master' into experimental
# Conflicts:
# marginalia_nu/src/e2e/resources/init.sh
|
2022-07-08 16:37:37 +02:00 |
|
vlofgren
|
f3be865293
|
Allow query params for *some* path,param combinations, targeted at allowing the crawl of forums.
|
2022-07-08 16:36:09 +02:00 |
|
Viktor Lofgren
|
e219bd83f3
|
Merge pull request 'Memex refactored' (#32) from master into release
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/32
|
2022-07-08 12:38:30 +02:00 |
|
Viktor Lofgren
|
978311327e
|
Merge branch 'release' into master
|
2022-07-08 12:36:18 +02:00 |
|
vlofgren
|
93c274f1d4
|
E2E-test for memex
|
2022-07-08 12:34:31 +02:00 |
|
vlofgren
|
853108028e
|
WIP: Selective URL param strings
|
2022-07-04 14:47:16 +02:00 |
|
vlofgren
|
ee07c4d94a
|
Refactored s/DictionaryWriter/KeywordLexicon/g to use significantly less memory and (potentially) support UTF-8.
|
2022-06-26 16:44:08 +02:00 |
|
vlofgren
|
e1b3477115
|
Experiments in keyword extraction
|
2022-06-23 17:02:28 +02:00 |
|
vlofgren
|
4516b23f90
|
Also grab alt text for images in a-tags in anchor text extractor
|
2022-06-22 13:12:44 +02:00 |
|
vlofgren
|
48e4aa3ee8
|
Clean up old junk from the WordPatterns class
|
2022-06-22 13:01:46 +02:00 |
|
vlofgren
|
35878c5102
|
Anchor text capture work-in-progress
|
2022-06-22 12:57:58 +02:00 |
|
vlofgren
|
1068694db6
|
Refactoring BTreeReader and binary search code
|
2022-06-20 12:35:58 +02:00 |
|
vlofgren
|
8139ab0d1d
|
Refactoring BTreeReader and binary search code
|
2022-06-20 12:28:15 +02:00 |
|
vlofgren
|
b1eff0107c
|
Refactoring BTreeReader and binary search code
|
2022-06-20 12:25:34 +02:00 |
|
vlofgren
|
c324c80efc
|
Refactoring BTreeReader and binary search code
|
2022-06-20 12:04:06 +02:00 |
|
vlofgren
|
420b9bb7e0
|
Refactoring BTreeReader and binary search code
|
2022-06-20 12:02:01 +02:00 |
|
vlofgren
|
f76af4ca79
|
Refactoring conversion
|
2022-06-18 15:54:58 +02:00 |
|
Viktor Lofgren
|
8df48d1c6d
|
Fix front page typo (#29)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/29
|
2022-06-16 14:15:54 +02:00 |
|
Viktor Lofgren
|
b86ca895b0
|
Merge branch 'release' into master
|
2022-06-16 14:14:18 +02:00 |
|
vlofgren
|
63bdc28f79
|
Merge branch 'experimental' into experimental-new
|
2022-06-16 14:10:08 +02:00 |
|
vlofgren
|
2e55599850
|
Revert "Revert "Merge branch 'experimental' into master""
This reverts commit 81c77e7fcb.
|
2022-06-16 14:09:57 +02:00 |
|
vlofgren
|
082c9cc308
|
Fixing typo on front page.
(cherry picked from commit 5ef953ae3d)
|
2022-06-16 14:06:48 +02:00 |
|
vlofgren
|
5ef953ae3d
|
Fixing typo on front page.
|
2022-06-16 14:01:49 +02:00 |
|
Viktor Lofgren
|
a3a6b40cc3
|
Changes to crawler (#28)
Co-authored-by: vlofgren <vlofgren@gmail.com>
Reviewed-on: https://git.marginalia.nu/marginalia/marginalia.nu/pulls/28
|
2022-06-15 16:54:27 +02:00 |
|
vlofgren
|
8100bd4879
|
conflict
|
2022-06-15 16:53:19 +02:00 |
|
vlofgren
|
81c77e7fcb
|
Revert "Merge branch 'experimental' into master"
This reverts commit c3a432fdd4, reversing
changes made to 1de63f225d.
|
2022-06-15 16:49:18 +02:00 |
|
Viktor Lofgren
|
c3a432fdd4
|
Merge branch 'experimental' into master
|
2022-06-15 16:44:23 +02:00 |
|