Commit Graph

1128 Commits

Author SHA1 Message Date
Viktor Lofgren
e507844616 (language) Rollback language filter change a bit.
It appears to lead to too much junk in the lexicon.
2023-08-23 10:03:25 +02:00
Viktor Lofgren
ca12dd59f7 (loader) Fix Cleaner resource leak
Apparently Cleaners have an associated native thread, so the way to use them is to have a single static cleaner.
2023-08-22 18:05:00 +02:00
Viktor Lofgren
6f222b9800 (search) Add refresh link to explore mode.
This is a QOL improvement for mobile users, who otherwise would have to scroll all the way up to refresh.

Also removed the confusing "this is a random set of domains"-message when viewing adjacent websites, as it's not random.
2023-08-22 12:43:44 +02:00
Viktor Lofgren
fca62f261e (mq) Down-tune polling intervals in MQ
Polling 10 times a second across dozens of queues is a bit too aggressive and wasteful.
2023-08-22 11:49:30 +02:00
Viktor Lofgren
c7f0276005 (control) Don't spin on process output printing
This is the "correct" way of copying stdout and stderr to the curren't process' output.
2023-08-22 11:48:54 +02:00
Viktor Lofgren
46409c4c2d (loader) Use the correct interface for InstructionCounter 2023-08-22 11:11:36 +02:00
Viktor Lofgren
46df58d28b (control-service) Use default value for WMSA_HOME if it is not set 2023-08-22 11:11:01 +02:00
Viktor Lofgren
15912f31d0 (control-service) Basic GUI for deleting bad links from exploration mode 2023-08-21 18:35:26 +02:00
Viktor
dd380a5fb3
(doc) Add control-service to conceptual overview
Not adding every interaction as it would turn into a rat king.
2023-08-20 13:28:32 +02:00
Viktor Lofgren
93f49f1fb3 (search-service) RSS feed for the news feed 2023-08-20 12:58:34 +02:00
Viktor Lofgren
b83bb5a48a (docker) Upgrade to jdk20 image to fix weird mojibake problems.
Super weird encoding bug that only arises on versions below jdk18 causing crawl data to be read incorrectly.

Seems possibly related to the new standard charset of UTF-8. Maybe some library (unknown which) is attempting to be backwards compatible in a way that totally breaks?
2023-08-19 10:58:47 +02:00
Viktor Lofgren
704de50a9b (forward-index, valuator) HTML features in valuator
Put it in the forward index for easy access during index-side valuation.
2023-08-18 11:54:56 +02:00
Viktor Lofgren
fcfe07fb7d (valuator) Clean up code 2023-08-18 11:26:56 +02:00
Viktor Lofgren
ccf4990add (minor) Clean up code 2023-08-18 11:26:39 +02:00
Viktor Lofgren
f2638dd845 (feature-extractor) More adtech nonsense 2023-08-18 11:26:19 +02:00
Viktor Lofgren
239980ecae (minor) Improve comment 2023-08-18 11:26:05 +02:00
Viktor Lofgren
6cb784df75 (minor) Improve comment 2023-08-18 11:25:36 +02:00
Viktor Lofgren
efee904531 (search) Use the adtech bit instead of ads for ads flag 2023-08-18 11:24:59 +02:00
Viktor Lofgren
bee815b1c4 (converter) Add monsterinsights as an adtech tracker 2023-08-17 17:44:11 +02:00
Viktor Lofgren
e296b02649 (converter) Optimize LSH based within-domain deduplication 2023-08-17 17:43:46 +02:00
Viktor Lofgren
2656fcfe2c (conf) Remove unnecessary JVM flags for processes 2023-08-17 17:42:47 +02:00
Viktor Lofgren
c019a029ec (flags) Documentation and preventative bugfix 2023-08-17 17:42:31 +02:00
Viktor Lofgren
db0216936e (summary) Reduce the chance of expensive operations 2023-08-16 15:48:34 +02:00
Viktor Lofgren
46d761f34f (language) fasttext based language filter 2023-08-16 15:48:12 +02:00
Viktor Lofgren
4598c7f40f (valuation) Penalize wordpress style kebab case urls 2023-08-16 13:11:24 +02:00
Viktor Lofgren
1d486bddee (crawler) Reduce log spam 2023-08-16 11:12:09 +02:00
Viktor Lofgren
606db54dc8 (docs) Fix dead links to message-queue after moving it to libraries 2023-08-15 19:26:40 +02:00
Viktor Lofgren
d8073f0dde (feature-extractor) Add mail.ru counter to non-adtech trackers 2023-08-15 19:10:43 +02:00
Viktor Lofgren
df85468c01 (control) Action for refreshing the blogs definition. 2023-08-15 11:38:52 +02:00
Viktor Lofgren
4404ad98ae (mq) Fix missing @Inject that broke everything in control-service 2023-08-15 11:22:12 +02:00
Viktor Lofgren
e7192a9cad (mq) Refactor mq and actor library and move it to libraries out of common 2023-08-15 10:53:23 +02:00
Viktor Lofgren
019b61b330 (control) Remove message queue listing from actors view. 2023-08-13 13:50:04 +02:00
Viktor Lofgren
f997707049 (control) Move event log out of plumbing 2023-08-13 13:40:50 +02:00
Viktor Lofgren
c56ee10185 (control) Separate [Process] and [Process and Load] actions for crawl data; all SLOW data is deletable. 2023-08-13 13:39:59 +02:00
Viktor Lofgren
8210e49b4e (control) Helpful tooltips for the Actor table. 2023-08-13 12:55:56 +02:00
Viktor
e51bf8619d
Merge pull request #40 from MarginaliaSearch/vlofgren-patch-2
Update readme.md
2023-08-12 18:58:32 +02:00
Viktor
69b28fd07d
Update readme.md 2023-08-12 18:58:21 +02:00
Viktor
99884c2c7e
Update readme.md 2023-08-12 15:39:28 +02:00
Viktor Lofgren
a8f2e9ee2c (control) Tidy up empty tables, remove actors from index view 2023-08-12 15:18:14 +02:00
Viktor Lofgren
a91b909103 (control) Event log on stop actor 2023-08-12 15:02:53 +02:00
Viktor Lofgren
d6b8b38955 (db) Add indices on SERVICE_EVENTLOG 2023-08-12 15:00:15 +02:00
Viktor Lofgren
99e031c529 (control) Remove broken pagination from events and message queue; new "light" events table for some views 2023-08-12 14:57:55 +02:00
Viktor Lofgren
998f239ed9 (control) Filterable event log view 2023-08-12 14:43:11 +02:00
Viktor Lofgren
0961f627b1 (control) Pretty up the nav bar 2023-08-12 14:42:42 +02:00
Viktor Lofgren
6483308bb0 (sql) Update default value for DOMAIN_SELECTION_TYPE 2023-08-11 14:01:15 +02:00
Viktor Lofgren
a42f707b2d (docs) Update readme with up to date instructions 2023-08-11 13:43:00 +02:00
Viktor Lofgren
eef37927ba (docs) Update readme with up to date instructions 2023-08-11 13:42:14 +02:00
Viktor Lofgren
7440da240d (blacklist) Fix broken SQL migration 2023-08-11 13:33:35 +02:00
Viktor
d0239368e2
Merge pull request #39 from MarginaliaSearch/master-control-program
Message Queue, State Machine, and Control Service
2023-08-10 15:42:58 +02:00
Viktor Lofgren
4f8048be31 (blacklist) Blacklist management 2023-08-10 15:40:07 +02:00