Viktor Lofgren
93f49f1fb3
(search-service) RSS feed for the news feed
2023-08-20 12:58:34 +02:00
Viktor Lofgren
b83bb5a48a
(docker) Upgrade to jdk20 image to fix weird mojibake problems.
Super weird encoding bug that only arises on versions below jdk18, causing crawl data to be read incorrectly.
Seems possibly related to UTF-8 becoming the standard charset. Maybe some library (unknown which) is attempting to be backwards compatible in a way that totally breaks?
2023-08-19 10:58:47 +02:00
Viktor Lofgren
704de50a9b
(forward-index, valuator) HTML features in valuator
Put it in the forward index for easy access during index-side valuation.
2023-08-18 11:54:56 +02:00
Viktor Lofgren
fcfe07fb7d
(valuator) Clean up code
2023-08-18 11:26:56 +02:00
Viktor Lofgren
ccf4990add
(minor) Clean up code
2023-08-18 11:26:39 +02:00
Viktor Lofgren
f2638dd845
(feature-extractor) More adtech nonsense
2023-08-18 11:26:19 +02:00
Viktor Lofgren
239980ecae
(minor) Improve comment
2023-08-18 11:26:05 +02:00
Viktor Lofgren
6cb784df75
(minor) Improve comment
2023-08-18 11:25:36 +02:00
Viktor Lofgren
efee904531
(search) Use the adtech bit instead of ads for ads flag
2023-08-18 11:24:59 +02:00
Viktor Lofgren
bee815b1c4
(converter) Add monsterinsights as an adtech tracker
2023-08-17 17:44:11 +02:00
Viktor Lofgren
e296b02649
(converter) Optimize LSH based within-domain deduplication
2023-08-17 17:43:46 +02:00
Viktor Lofgren
2656fcfe2c
(conf) Remove unnecessary JVM flags for processes
2023-08-17 17:42:47 +02:00
Viktor Lofgren
c019a029ec
(flags) Documentation and preventative bugfix
2023-08-17 17:42:31 +02:00
Viktor Lofgren
db0216936e
(summary) Reduce the chance of expensive operations
2023-08-16 15:48:34 +02:00
Viktor Lofgren
46d761f34f
(language) fasttext based language filter
2023-08-16 15:48:12 +02:00
Viktor Lofgren
4598c7f40f
(valuation) Penalize wordpress style kebab case urls
2023-08-16 13:11:24 +02:00
Viktor Lofgren
1d486bddee
(crawler) Reduce log spam
2023-08-16 11:12:09 +02:00
Viktor Lofgren
606db54dc8
(docs) Fix dead links to message-queue after moving it to libraries
2023-08-15 19:26:40 +02:00
Viktor Lofgren
d8073f0dde
(feature-extractor) Add mail.ru counter to non-adtech trackers
2023-08-15 19:10:43 +02:00
Viktor Lofgren
df85468c01
(control) Action for refreshing the blogs definition.
2023-08-15 11:38:52 +02:00
Viktor Lofgren
4404ad98ae
(mq) Fix missing @Inject that broke everything in control-service
2023-08-15 11:22:12 +02:00
Viktor Lofgren
e7192a9cad
(mq) Refactor mq and actor library and move it to libraries out of common
2023-08-15 10:53:23 +02:00
Viktor Lofgren
019b61b330
(control) Remove message queue listing from actors view.
2023-08-13 13:50:04 +02:00
Viktor Lofgren
f997707049
(control) Move event log out of plumbing
2023-08-13 13:40:50 +02:00
Viktor Lofgren
c56ee10185
(control) Separate [Process] and [Process and Load] actions for crawl data; all SLOW data is deletable.
2023-08-13 13:39:59 +02:00
Viktor Lofgren
8210e49b4e
(control) Helpful tooltips for the Actor table.
2023-08-13 12:55:56 +02:00
Viktor
e51bf8619d
Merge pull request #40 from MarginaliaSearch/vlofgren-patch-2
Update readme.md
2023-08-12 18:58:32 +02:00
Viktor
69b28fd07d
Update readme.md
2023-08-12 18:58:21 +02:00
Viktor
99884c2c7e
Update readme.md
2023-08-12 15:39:28 +02:00
Viktor Lofgren
a8f2e9ee2c
(control) Tidy up empty tables, remove actors from index view
2023-08-12 15:18:14 +02:00
Viktor Lofgren
a91b909103
(control) Event log on stop actor
2023-08-12 15:02:53 +02:00
Viktor Lofgren
d6b8b38955
(db) Add indices on SERVICE_EVENTLOG
2023-08-12 15:00:15 +02:00
Viktor Lofgren
99e031c529
(control) Remove broken pagination from events and message queue; new "light" events table for some views
2023-08-12 14:57:55 +02:00
Viktor Lofgren
998f239ed9
(control) Filterable event log view
2023-08-12 14:43:11 +02:00
Viktor Lofgren
0961f627b1
(control) Pretty up the nav bar
2023-08-12 14:42:42 +02:00
Viktor Lofgren
6483308bb0
(sql) Update default value for DOMAIN_SELECTION_TYPE
2023-08-11 14:01:15 +02:00
Viktor Lofgren
a42f707b2d
(docs) Update readme with up to date instructions
2023-08-11 13:43:00 +02:00
Viktor Lofgren
eef37927ba
(docs) Update readme with up to date instructions
2023-08-11 13:42:14 +02:00
Viktor Lofgren
7440da240d
(blacklist) Fix broken SQL migration
2023-08-11 13:33:35 +02:00
Viktor
d0239368e2
Merge pull request #39 from MarginaliaSearch/master-control-program
Message Queue, State Machine, and Control Service
2023-08-10 15:42:58 +02:00
Viktor Lofgren
4f8048be31
(blacklist) Blacklist management
2023-08-10 15:40:07 +02:00
Viktor Lofgren
807fb2d052
(service) Task heartbeat creates event log entries
2023-08-09 15:15:16 +02:00
Viktor Lofgren
ce293029c7
(converter) Treat adtech tracking as advertisement.
2023-08-09 14:23:53 +02:00
Viktor Lofgren
b5ed21be21
(mq) MqPersistence no longer relies on autoCommit being enabled
2023-08-09 14:23:22 +02:00
Viktor Lofgren
251fc63b42
(*) Fix merge gore
2023-08-09 13:33:28 +02:00
Viktor Lofgren
47f3855a4b
(control) More informative readme.md
2023-08-09 12:42:23 +02:00
Viktor Lofgren
71dfe9f33e
(control) Clean up the ControlService, move mq-related endpoints to MessageQueueService.
2023-08-09 12:42:01 +02:00
Viktor Lofgren
afad4f5ebb
(*) last touches
2023-08-07 12:59:33 +02:00
Viktor Lofgren
4ab1cd9502
(*) last touches
2023-08-07 12:57:44 +02:00
Viktor
52e2ab45bf
Merge branch 'master' into master-control-program
2023-08-07 12:53:43 +02:00