Commit Graph

1621 Commits

Author SHA1 Message Date
Viktor Lofgren
ec8fe9f031 (doc) Add screenshot to conversion step in crawling doc 2024-01-15 16:31:33 +01:00
Viktor Lofgren
a1df9e886a (control) Also clean up stale 'NEW' messages 2024-01-15 16:14:02 +01:00
Viktor Lofgren
ce5ae1931d (doc) Update Crawling Docs
Still a WIP, but now more accurately reflects the new GUI, with screenshots to boot!
2024-01-15 16:08:01 +01:00
Viktor Lofgren
b9445d4f62 (doc) Update Crawling Docs
Still a WIP, but now more accurately reflects the new GUI, with screenshots to boot!
2024-01-15 16:06:59 +01:00
Viktor Lofgren
fd1eec99b5 (cleanup) Fix broken tests 2024-01-15 15:44:33 +01:00
Viktor Lofgren
e162406d40 (control) New control-side actors for cleaning up stale service heartbeats and message queue entries 2024-01-15 15:44:23 +01:00
Viktor Lofgren
c41e68aaab (control) New export actions for RSS/Atom feeds and term frequency data
This commit also refactors the executor a bit, and introduces a new converter-feature called data-extractors for this class of jobs.
2024-01-15 14:54:26 +01:00
Viktor Lofgren
4665af6c42 (control) Move export data endpoint to actions controller 2024-01-15 11:06:22 +01:00
Viktor Lofgren
c0b15427fe (control) New crawl view should use radio buttons as multiple specs aren't supported 2024-01-15 11:03:47 +01:00
Viktor Lofgren
f29a9d972d (control) Move 'new crawl spec' to /node/:id/actions, out of /node/:id/storage 2024-01-15 11:02:00 +01:00
Viktor Lofgren
b192373ae7 (control) Highlight unavailable items (creating, deleting) in node actions views 2024-01-15 10:47:54 +01:00
Viktor Lofgren
c042650382 (docs) Improve query service documentation 2024-01-13 21:16:45 +01:00
Viktor Lofgren
07a916a720 (search) Give the swipe hint on mobile a nicer finish 2024-01-13 18:51:54 +01:00
Viktor Lofgren
5134044530 (assistant) Make assistant client more robust to the service going down
This is especially important for the non-essential functions, like website similarities...
2024-01-13 18:29:30 +01:00
Viktor Lofgren
4c62065e74 (install) Add two separate templates for the install script
One template is for the full Marginalia Search style install, and the other is for a barebones install with no Marginalia-related fluff.
2024-01-13 18:27:42 +01:00
Viktor Lofgren
d28fc99119 (MainClass) ensure logging isn't loaded before service name is known
This causes logs all to have names like ${sys:service-name}, instead of the service name...
2024-01-13 18:19:50 +01:00
Viktor Lofgren
c9fb45c85f (search) Fix control.hideMarginaliaApp handling 2024-01-13 17:24:15 +01:00
Viktor Lofgren
7c6e18f7a7 (*) Overhaul settings and properties
Use a system.properties file to configure the system.  This is loaded statically by MainClass or ProcessMainClass.  Update the property names to be more consistent, and update the documentations to reflect the changes.
2024-01-13 17:12:18 +01:00
Viktor Lofgren
176b9c9666 (convert) Add sizeHints to legacy serializable cawl data stream
This reduces the maximum memory usage when processing legacy crawl data
2024-01-13 15:50:36 +01:00
Viktor Lofgren
ecd9c35233 (control) Clean up the event log
* Generate fewer uninteresting event messages.
* Display fewer irrelevant fields in the overview table.
2024-01-13 13:28:02 +01:00
Viktor Lofgren
71e32c57d9 (control) Add better timestamps for the events and message queue views
Adjust display precision based on distance into the past, full ms-accurate timestamps available via hover-action.
2024-01-13 13:04:56 +01:00
Viktor Lofgren
2fefd0e4e3 (control) Add better timestamps for the events and message queue views
Adjust display precision based on distance into the past, full ms-accurate timestamps available via hover-action.
2024-01-13 13:03:52 +01:00
Viktor Lofgren
81eaf79a25 (control) UX polish 2024-01-13 12:31:13 +01:00
Viktor Lofgren
8dea7217a6 (control) UX fixes, node GUI doesn't break when an executor service goes offline. 2024-01-13 12:17:30 +01:00
Viktor Lofgren
c0fb9e17e8 (control) Add filter dropdown to message queue table
This makes inspecting the queues for processes much easier, as it's otherwise
often these important messages are drowned out by FSM chatter.
2024-01-12 18:46:17 +01:00
Viktor Lofgren
83776a8dce (control) Wean the ExportDataActor off EC_DOMAIN_LINK
The EC_DOMAIN_LINK table is deprecated and slated for removal, use QueryClient.getAllDomainLinks() instead.

The ExportDataActor now uses the QueryClient appropriately.  The CSV format was also changed to quote the values, to prevent e.g. Excel from interpreting the comma as a decimal separator when previewing the file.

Finally the form for triggering an export was overhauled.
2024-01-12 17:09:11 +01:00
Viktor Lofgren
98c0972619 (control) Add a summary table for Actors in the Node overview 2024-01-12 16:32:15 +01:00
Viktor Lofgren
56d832d661 (control) Adjust the margins of the headings to be consistent 2024-01-12 16:16:57 +01:00
Viktor Lofgren
de3a350afe (control) Disable broken actions and mark the actions view as WIP 2024-01-12 16:16:39 +01:00
Viktor Lofgren
708a741960 (test) Clean up test usage of migrations
Several tests were manually running migrations in a large copy-paste blob of code.  This makes the test less useful as it's possible to break the code while keeping the tests green by introducing a new migration that never gets run in the tests, and it's also difficult to reason about what the tests are doing.

A new test helper library is introduced with a TestMigrationLoader that can both run Flyway migrations, or load specific migrations in the cases a specific set of migrations need to be loaded.   Existing tests are migrated to use the new code.
2024-01-12 15:55:50 +01:00
Viktor Lofgren
0caef1b307 (warc) Toggle for saving WARC data
Add a toggle for saving the WARC data generated by the search engine's crawler.  Normally this is discarded, but for debugging or archival purposes, retaining it may be of interest.

The warc files are concatenated into larger archives, up to about 1 GB each.
An index is also created containing filenames, domain names, offsets and sizes
to help navigate these larger archives.

The warc data is saved in a directory warc/ under the crawl data storage.
2024-01-12 13:45:14 +01:00
Viktor Lofgren
264e2db539 (control) UX-improvements for control service
This commit overhauls a lot of the UX for the control service, adding a new actions menu to the nodes views.  It has many small tweaks to make the work flow better.

It also adds a new /uploads directory in each index node, from which sideloaded data can be selected.  This is a bit of a breaking change, as this directory needs to exist in each index node.
2024-01-12 12:33:05 +01:00
Viktor Lofgren
734996002c (*) install script for deploying Marginalia outside the codebase
The changeset also makes the control service responsible for flyway migrations.  This helps reduce the number of places the database configuration needs to be spread out.  These automatic migrations can be disabled with -DdisableFlyway=true.

The commit also adds curl to the docker container, to enable docker health checks and interdependencies.
2024-01-11 12:40:03 +01:00
Viktor Lofgren
205e5016e8 (docs) Document barebones config 2024-01-11 09:43:08 +01:00
Viktor Lofgren
a0f28a7f9b (*) Add a barebones configuration
This adds a docker-compose file 'docker-compose-barebones.yml' which will only start the minimal number of services needed to run a whitelabel Marginalia Search-style search engine, with none of the surrounding frills.

The change also adds a minimal search GUI to the query service, which is also available with JSON results if the appropriate Accept header is provided.
2024-01-10 20:23:51 +01:00
Viktor Lofgren
14b7680328 (loader) Update the size of the keyword files created by the loader
Previously these ended up being about 200 Mb each, which is wastefully small.  Increasing the size of these files makes the index construction faster.
2024-01-10 17:09:19 +01:00
Viktor Lofgren
f44222ce53 (control) Add a 'cancel' button to the process list
This is a very nice QoL improvement, since it means you don't have to dig in the Actors view to terminate processes.
2024-01-10 15:02:42 +01:00
Viktor Lofgren
f310ad8d98 (control) Actor terminations work better
Improves jank in the abort actor action, which would sometimes cause actors to hang or restart.
2024-01-10 14:18:49 +01:00
Viktor Lofgren
d56b394bcc (control) GUI for loading external WARC files 2024-01-10 12:13:30 +01:00
Viktor Lofgren
55c9501e57 (search) Serve proper content type for static resources 2024-01-10 10:46:51 +01:00
Viktor
fad9575154
Merge pull request #69 from MarginaliaSearch/converter-optimizations
Refactor the DomainProcessor to take advantage of the new crawl data format
2024-01-10 09:46:54 +01:00
Viktor Lofgren
97e11e1ac9 (search) Fix acknowledgement page for domain complaints rendering as plain text
This was caused by incorrect usage of the renderInto() function, which was always buggy and should never be used.  This method is removed with this change.
2024-01-10 09:37:40 +01:00
Viktor Lofgren
e6a1e164b2 (search) Swap swipe direction for more consistent experience 2024-01-10 09:37:40 +01:00
Viktor Lofgren
e4f8f81e89 (search) Mobile UX improvements.
Swipe right to show filter menu.

Fix CSS bug that caused parts of the menu to not have a background.
2024-01-10 09:37:39 +01:00
Viktor Lofgren
176b3bb526 (search) Toggle for showing recent results
Actually persist the value of the toggle between searches too...
2024-01-10 09:37:39 +01:00
Viktor Lofgren
b07752fa9b (search) Toggle for showing recent results
Will by default show results from the last 2 years.  May need to tune this later.
2024-01-10 09:37:39 +01:00
Viktor Lofgren
68fd0efbde (search) Clean up search results template
Rendering is very slow. Let's see if this has a measurable effect on latency.
2024-01-10 09:37:39 +01:00
Viktor Lofgren
c80d3eb812 (search) Remove dead code 2024-01-10 09:37:35 +01:00
Viktor Lofgren
f9320995d6 (search) When clicking asn-links, show results from the unfiltered view... 2024-01-10 09:37:13 +01:00
Viktor Lofgren
f592c9f04d (search) Fix acknowledgement page for domain complaints rendering as plain text
This was caused by incorrect usage of the renderInto() function, which was always buggy and should never be used.  This method is removed with this change.
2024-01-10 09:26:34 +01:00