Viktor Lofgren
b89633ae4b
(search) Don't render a filter button on mobile when there are no filters to be presented.
2023-12-02 17:23:45 +01:00
Viktor Lofgren
96357e9bfd
(search) Fix typeahead suggestions, as well as improve mobile and desktop UX in small ways.
2023-12-02 17:06:40 +01:00
Viktor Lofgren
d530c3096f
(search) GUI tweaks to make the new interface not fall apart on mobile/chrome
2023-12-02 17:06:40 +01:00
Viktor Lofgren
ae0c1c3f2d
(control) Adjust search result margins for better visual density
2023-12-02 17:06:40 +01:00
Viktor Lofgren
0cc2564380
(search) CSS tweaks
2023-12-02 17:06:40 +01:00
Viktor Lofgren
38d20022ad
(search) Fix script loading for mobile support
2023-12-02 17:06:40 +01:00
Viktor Lofgren
280132dad0
(search) Fix script loading for mobile support
2023-12-02 17:06:40 +01:00
Viktor Lofgren
61de4e2789
(search) Retain filter options when performing a new search from the input field
2023-12-02 17:06:40 +01:00
Viktor Lofgren
f9d3455320
(search) Reduce visual weight of search results
2023-12-02 17:06:40 +01:00
Viktor Lofgren
2ff64c3c12
(search) New toggle for reducing tracking
2023-12-02 17:06:40 +01:00
Viktor Lofgren
902f235b5b
(search) Integrate 'similar' tab in site info.
2023-12-02 17:06:40 +01:00
Viktor Lofgren
97d43a6fa2
(search) Revamp browse results with new look.
2023-12-02 17:06:40 +01:00
Viktor Lofgren
9bc65ff0ca
(search) Desaturate search result titles according to rank
2023-12-02 17:06:40 +01:00
Viktor Lofgren
6cd6a615fd
(search) Add data-filter to body as a data attribute
For future shenanigans ;D
2023-12-02 17:06:40 +01:00
Viktor Lofgren
5639f0653d
(search) Rename SearchProfile.name into filterId
Avoid a foot-gun caused by a name clash with the Enum method name(), which returns the Java name of the enum constant.
2023-12-02 17:06:40 +01:00
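A minimal Java sketch of the foot-gun, using two hypothetical enums (not the real SearchProfile source). Every Java enum inherits the final method `name()` from `java.lang.Enum`, so a field called `name` sits one pair of parentheses away from a method that returns something quite different.

```java
// Hypothetical enums for illustration; not the real SearchProfile.
enum ProfileWithName {
    DEFAULT("default"),
    ACADEMIA("academia");

    // Compiles fine, but is easily confused with the inherited Enum.name().
    final String name;

    ProfileWithName(String name) {
        this.name = name;
    }
}

enum ProfileWithFilterId {
    DEFAULT("default"),
    ACADEMIA("academia");

    // After the rename there is nothing to confuse with Enum.name().
    final String filterId;

    ProfileWithFilterId(String filterId) {
        this.filterId = filterId;
    }
}

class EnumNameDemo {
    static String constantName() {
        return ProfileWithName.ACADEMIA.name();        // "ACADEMIA", from java.lang.Enum
    }

    static String fieldValue() {
        return ProfileWithName.ACADEMIA.name;          // "academia", the custom field
    }

    static String filterId() {
        return ProfileWithFilterId.ACADEMIA.filterId;  // "academia"
    }
}
```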
Viktor Lofgren
251174c9a2
(search) Update front page with new look
2023-12-02 17:06:40 +01:00
Viktor Lofgren
42ea87d637
(search) Update conversion results, error page, and dictionary results with new CSS.
2023-12-02 17:06:40 +01:00
Viktor Lofgren
7c8a60b8cf
(search) Site info view is mostly done
Also optimize the rendering a bit to avoid having to allocate huge string buffers, writing directly to Spark's response instead.
2023-12-02 17:06:40 +01:00
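A rough sketch of the buffered-versus-streaming difference, assuming a hypothetical list renderer in place of the real site info template. It writes to a plain `java.io.Writer`; in a web handler that writer would be the response's output, but no Spark-specific API is shown or implied here.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical list renderer standing in for the site info template.
class SiteInfoRenderer {
    // Buffered: the whole page is materialized in memory before anything is sent.
    static String renderToString(List<String> items) {
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            sb.append("<li>").append(item).append("</li>\n");
        }
        return sb.toString();
    }

    // Streaming: fragments go straight to the output sink, so no page-sized
    // String is ever allocated.
    static void renderTo(List<String> items, Writer out) throws IOException {
        for (String item : items) {
            out.write("<li>");
            out.write(item);
            out.write("</li>\n");
        }
        out.flush();
    }
}
```

Both methods produce the same markup; only the memory profile differs.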
Viktor Lofgren
2f4500be5a
(search) New frontend look
2023-12-02 17:06:40 +01:00
Viktor Lofgren
fa7534a362
(search) Remove dead code
2023-12-02 17:06:40 +01:00
Viktor Lofgren
a258f0af7a
(search) Refactor search parameters to include query
2023-12-02 17:06:40 +01:00
Viktor Lofgren
01621c6344
(renderer) Make helpers configurable on a per-service basis.
2023-12-02 17:06:40 +01:00
Viktor Lofgren
c7934342a6
(control) Automatic recrawl
2023-12-02 17:06:24 +01:00
Viktor Lofgren
f5c324c06b
(minor) Fix broken test
2023-12-01 17:44:39 +01:00
Viktor Lofgren
f615cf2391
(convert) Loosen rule enforcement for documents that have external links.
2023-12-01 17:44:29 +01:00
Viktor Lofgren
e5d274fe1c
(docs) Improve architectural documentation
2023-11-30 21:38:57 +01:00
Viktor Lofgren
166a391eae
(docs) Improve architectural documentation for the crawler.
2023-11-30 21:30:57 +01:00
Viktor Lofgren
5fb24bb27f
(docs) Improve architectural documentation for the converter.
2023-11-30 20:43:22 +01:00
Viktor Lofgren
5a5430b383
(convert) Wiki specialization that should do a better job of removing junk keywords and providing a useful summary.
2023-11-30 20:04:46 +01:00
Viktor Lofgren
67a1e1c874
(control) GUI for triggering control-side actors
2023-11-29 15:31:14 +01:00
Viktor Lofgren
4155fbe94c
(control) Reprocess-all actor
2023-11-28 17:58:48 +01:00
Viktor Lofgren
347fe6b7be
(control) Reindex-all actor
2023-11-28 16:41:09 +01:00
Viktor Lofgren
ff3ceb981e
(control) Button for removing a stale 'NEW' status
If a process is violently terminated, the associated file storage may get stuck in the ephemeral 'NEW' state, preventing future operations on the associated data.
To remedy this without having to dig through the database, a button was added to reset the state. It's a band-aid, but the situation is rare enough that I think it's fine.
2023-11-28 15:18:24 +01:00
Viktor Lofgren
1dafa0c74d
(mqapi/control) Repair repartition endpoint, deprecate notify endpoints.
The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId. In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.
2023-11-27 16:01:12 +01:00
Viktor Lofgren
09917837d0
(process) Ensure construction exceptions are logged
Wrapping these exceptions in a try-catch and logging them with slf4j will ensure they end up in the process logs.
With the default exception handler, they'd be printed to the console (which nothing captures!), leading to a very annoying debugging experience.
2023-11-22 18:32:06 +01:00
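A minimal sketch of the pattern, assuming a hypothetical `constructOrLog` helper. It substitutes `java.util.logging` for slf4j only so the example has no dependencies; the point is that the failure passes through the logging framework (and whatever file appender it is configured with) before propagating.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

class ProcessMainSketch {
    // java.util.logging stands in for slf4j here so the sketch has no dependencies.
    private static final Logger logger = Logger.getLogger(ProcessMainSketch.class.getName());

    // Hypothetical helper: run the construction step, and if it throws, route the
    // exception through the logger (and thus the process log) before rethrowing.
    static <T> T constructOrLog(Supplier<T> constructor) {
        try {
            return constructor.get();
        } catch (RuntimeException ex) {
            logger.log(Level.SEVERE, "Failed to construct process", ex);
            throw ex;
        }
    }
}
```

Rethrowing keeps the original failure semantics; the only change is that the stack trace is guaranteed to pass through the logger first.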
Viktor Lofgren
dd507a3808
(db) Fix migrations, bump flyway to 10.0.1
Tricky problem: creating a procedure apparently needs delimiter shenanigans in Flyway; otherwise it will truncate the END statement and MariaDB will be sad.
2023-11-21 20:04:35 +01:00
Viktor Lofgren
dd9406d0ac
(control) Make storage type tabs consistent
This had fallen out of sync in the Create New Specification view, which lacked the Exports tab.
2023-11-17 11:26:45 +01:00
Viktor Lofgren
f58a9f46be
(loader) Don't truncate the entire links table on load
This behavior is an old vestige from the days of only having a single loader process. We'd truncate the links table because doing inserts/updates was too slow. This was also important because we had 32-bit IDs, and there are a lot of links between domains to go around...
Instead, we delete the rows associated with the current node using the stored procedure PURGE_LINKS_TABLE.
We also change the PRIMARY KEY to a BIGINT. We'd need to load the data in excess of a billion times to hit an ID rollover, so it'll be fine.
2023-11-16 10:30:12 +01:00
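A back-of-the-envelope check of the rollover argument. A signed 32-bit INT tops out at 2,147,483,647 and a signed BIGINT at 9,223,372,036,854,775,807. The rows-per-load figure below is a hypothetical round number, since the commit doesn't give one; with it, an INT key is exhausted after 2 loads, while a BIGINT key lasts 9,223,372,036 loads.

```java
class IdRolloverEstimate {
    // Hypothetical load size for illustration; the commit gives no figure.
    static final long ASSUMED_ROWS_PER_LOAD = 1_000_000_000L;

    // Number of full loads before an ever-increasing ID passes maxId,
    // assuming each load consumes rowsPerLoad new IDs and none are reused.
    static long loadsUntilRollover(long maxId, long rowsPerLoad) {
        return maxId / rowsPerLoad;
    }

    static long intLoads() {
        return loadsUntilRollover(Integer.MAX_VALUE, ASSUMED_ROWS_PER_LOAD); // signed 32-bit INT
    }

    static long bigintLoads() {
        return loadsUntilRollover(Long.MAX_VALUE, ASSUMED_ROWS_PER_LOAD);    // signed 64-bit BIGINT
    }
}
```

Deleting per-node rows instead of truncating means IDs keep climbing across loads, which is why the wider key matters.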
Viktor Lofgren
1cbf23e7e7
(test) Don't fail test if atags.parquet is not in ~vlofgren
2023-11-15 09:11:38 +01:00
Viktor Lofgren
63554ba171
(explore2) Add robots.txt
2023-11-14 09:15:32 +01:00
Viktor Lofgren
5de37cb820
(converter) Set feature flags appropriately on stackexchange posts
2023-11-12 15:48:08 +01:00
Viktor Lofgren
e5cee1f46d
(sideload) Fix sideloading so that it doesn't get disproportionately good rankings
Also add type flags so that e.g. wikipedia shows up in the wikis filter.
2023-11-12 14:57:57 +01:00
Viktor Lofgren
e9a01caa5c
(index) Fix broken metrics
2023-11-11 12:53:47 +01:00
Viktor Lofgren
858357a246
(metrics) Get Prometheus out of disrepair
* Fix bad labels
* Add nodeId where appropriate
* Hopefully fix histogram buckets for index query times
2023-11-08 14:01:28 +01:00
Viktor Lofgren
7aa2f80117
(domain) id.au should be treated as a TLD
2023-11-06 19:07:47 +01:00
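A minimal sketch of what treating id.au as a TLD means when deriving a site's top domain. The suffix set and `topDomain` helper are hypothetical; a production version would use a far more complete list, such as the Public Suffix List. Without the "id.au" entry, www.foo.id.au would collapse to the meaningless top domain "id.au".

```java
import java.util.Arrays;
import java.util.Set;

class TopDomainSketch {
    // Tiny hypothetical list of multi-label suffixes that count as a single TLD.
    static final Set<String> MULTI_LABEL_TLDS = Set.of("co.uk", "com.au", "id.au");

    // Returns the TLD plus one more label, e.g. "www.foo.id.au" -> "foo.id.au".
    static String topDomain(String host) {
        String[] parts = host.toLowerCase().split("\\.");
        int n = parts.length;
        if (n < 2) {
            return host.toLowerCase();
        }
        String lastTwo = parts[n - 2] + "." + parts[n - 1];
        int tldLabels = MULTI_LABEL_TLDS.contains(lastTwo) ? 2 : 1;
        if (n <= tldLabels) {
            return host.toLowerCase();
        }
        return String.join(".", Arrays.copyOfRange(parts, n - tldLabels - 1, n));
    }
}
```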
Viktor Lofgren
7617b4cbc2
(crawler) Fix NPE in crawler caused by not having fetched the domains list yet
2023-11-06 18:16:38 +01:00
Viktor Lofgren
e0c769fd19
(converter) Integrate atags.parquet with the encyclopedia sideloader
Also clean up stackexchange and dirtree a bit.
2023-11-06 18:03:01 +01:00
Viktor Lofgren
ebd10a5f28
(crawler) Integrate atags.parquet with the crawler so that "important" URLs are prioritized
2023-11-06 16:14:58 +01:00
Viktor Lofgren
2b77184281
(converter) Integrate atags with the topology field
2023-11-06 13:46:44 +01:00
Viktor Lofgren
e23976f6c4
(search) Fix card title overflow
2023-11-06 13:25:39 +01:00
Viktor Lofgren
0b8dc02eba
(result-ranking) Nudge up results with ngram matches a tiny bit
2023-11-06 13:14:22 +01:00
Viktor Lofgren
fde1d0677e
(search) Remove unnecessary dependencies
2023-11-06 12:56:32 +01:00
Viktor Lofgren
48986574ae
(result-ranking) Use a weighted calculation of priority term importance
2023-11-06 12:56:21 +01:00
Viktor Lofgren
c7a6a71d07
(result-ranking) Use a weighted calculation of priority term importance
2023-11-06 12:48:23 +01:00
Viktor Lofgren
1847845151
Revert "(loader) Optimize INSERT statements"
This reverts commit 7cb92195d1.
2023-11-04 19:32:02 +01:00
Viktor Lofgren
7cb92195d1
(loader) Optimize INSERT statements
INSERT IGNORE is too slow.
2023-11-04 17:43:55 +01:00
Viktor Lofgren
72afa0341f
duckdb connection may need to be synchronized?
2023-11-04 14:30:25 +01:00
Viktor Lofgren
0152004c42
Initial Commit Anchor Tags
* Added new (optional) model file in $WMSA_HOME/data/atags.parquet
* Converter gets a component for creating a projection of its domains onto the full atags parquet file
* New WordFlag ExternalLink
* These terms are also for now flagged as title words
* Fixed a bug where Title words aliased with UrlDomain words
* Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking
2023-11-04 14:24:17 +01:00
Viktor Lofgren
8e9698c9a0
(control/search) Add ability to suggest removing a site from random exploration
This is what most complaints have been about.
2023-11-02 15:29:49 +01:00
Viktor Lofgren
3047e2dd7c
(screenshot-capture-tool) Make screenshot-capture-tool cooperate with docker
2023-11-01 16:38:55 +01:00
Viktor Lofgren
a8b9d21f2d
(executor) Refine atag export logic
* Remove obviously uninteresting tags
* Omit URL scheme for more sensible sorting
* Change the column order to put the source domain last
2023-11-01 13:23:14 +01:00
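Why the scheme gets in the way of sorting: in a plain lexicographic sort, "http://b.com/" comes before "https://a.com/" because ':' precedes 's', so http and https links to the same site end up far apart. A minimal sketch of stripping it before sorting, with hypothetical helper names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AtagExportSketch {
    // Drops a leading "scheme://" so http and https URLs sort by host instead.
    static String stripScheme(String url) {
        int idx = url.indexOf("://");
        return idx >= 0 ? url.substring(idx + 3) : url;
    }

    // Returns the URLs with schemes removed, in natural (lexicographic) order.
    static List<String> sortedWithoutScheme(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(stripScheme(url));
        }
        out.sort(Comparator.naturalOrder());
        return out;
    }
}
```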
Viktor Lofgren
c77a5b7cb6
(control) GUI for atags export
2023-10-31 17:55:47 +01:00
Viktor Lofgren
23f2068e33
(executor) Actor for exporting anchor tag data from crawl data
2023-10-31 17:32:34 +01:00
Viktor Lofgren
ffadfb4149
(control) Use a partial template for the storage types tabs.
2023-10-31 17:12:14 +01:00
Viktor Lofgren
b7e38cfbae
(control) Add exports view
2023-10-31 17:08:48 +01:00
Viktor Lofgren
659743b39c
(executor) Export Data actor allocates its own storage
2023-10-31 17:04:07 +01:00
Viktor Lofgren
69758c5859
(control) Nicer redirects acknowledging actions
2023-10-31 16:26:29 +01:00
Viktor Lofgren
81bfd7e5fb
(experiment) Utility for exporting atags
2023-10-31 16:10:21 +01:00
Viktor Lofgren
8f74dbdbb4
(crawler) Set more lenient parameters for recrawl
2023-10-30 11:35:30 +01:00
Viktor Lofgren
fd5a7eac87
(crawler) Exit crawler retriever on thread interrupted
2023-10-30 11:34:16 +01:00
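A minimal sketch of honoring thread interruption in a fetch loop, assuming a hypothetical `fetchAll` over a URL queue. The loop checks the interrupt flag before each fetch and stops early instead of working through the rest of the queue.

```java
import java.util.Iterator;

class RetrieverSketch {
    // Hypothetical crawl loop: stop as soon as the worker thread is interrupted.
    static int fetchAll(Iterator<String> queue) {
        int fetched = 0;
        while (queue.hasNext()) {
            if (Thread.currentThread().isInterrupted()) {
                break; // flag is left set so callers can still observe it
            }
            queue.next(); // stand-in for actually fetching the URL
            fetched++;
        }
        return fetched;
    }
}
```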
Viktor Lofgren
6bac3c75cb
(api) API documentation
2023-10-29 16:13:21 +01:00
Viktor Lofgren
5d6e0e3790
(log) Clean up logging
Don't log the PROCESS stream to executor's logs, as it will also be logged in the spawned process' log files.
Also tell the spawned process which "service" it is so that it gets a log file with a name that makes sense.
2023-10-29 15:52:17 +01:00
Viktor Lofgren
2871a326e6
(ctrl/exe) Clean up UX and code
2023-10-29 14:09:39 +01:00
Viktor Lofgren
abb42f0f36
(crawler) Fix bug in SQL statement
Arguments were in the wrong order when inserting sites submitted to be crawled.
2023-10-29 13:19:17 +01:00
Viktor Lofgren
f6fcb04817
(experiment) Repair the experiment runner
2023-10-27 16:16:50 +02:00
Viktor Lofgren
88f49834fd
(docs) Update documentation
2023-10-27 12:45:39 +02:00
Viktor Lofgren
4415f52e18
(keyword-extraction) Fix broken test
2023-10-27 12:19:33 +02:00
Viktor Lofgren
98d742d634
(actor) Code cleanup
2023-10-27 12:19:20 +02:00
Viktor Lofgren
6c1ca10be7
(minor) code cleanup
2023-10-27 11:38:37 +02:00
Viktor Lofgren
aeaf2d546a
(search) Fix broken redirect for flagging problems with websites
2023-10-27 11:20:49 +02:00
Viktor Lofgren
c7cb6664b4
(control) Indicate missing services with danger-color instead of having a distracting and constantly updating last-seen number
2023-10-26 18:05:22 +02:00
Viktor Lofgren
79adba9284
(index) Fix bug in dealing with quoted search terms
2023-10-26 16:28:23 +02:00
Viktor Lofgren
37b7f52f2c
(minor) Reduce log severity for getTermMeta miss
2023-10-26 15:41:52 +02:00
Viktor Lofgren
c89e0ab255
(minor) Disable ~vlofgren specific debug test
2023-10-26 15:27:59 +02:00
Viktor Lofgren
f613f4f2df
(array) Fix spurious search results
This was caused by a bug in the binary search algorithm causing it to sometimes return positive values when encoding a search miss.
It was also necessary to get rid of the vestiges of the old LongArray and IntArray classes to make this fix doable.
2023-10-26 15:27:02 +02:00
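For reference, a sketch of a binary search whose miss encoding is always negative, following the same convention as `java.util.Arrays.binarySearch`: a hit returns the index, a miss returns `-(insertionPoint) - 1`. This shows the convention only, not the project's actual array code. A naive encoding such as `-low` is ambiguous, since it yields 0 (a valid index) when the key belongs at the front.

```java
class BinarySearchSketch {
    // Searches sorted arr within [fromIndex, toIndex).
    // Hit: returns the index. Miss: returns -(insertionPoint) - 1, which is always < 0.
    static int binarySearch(long[] arr, int fromIndex, int toIndex, long key) {
        int low = fromIndex;
        int high = toIndex - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            long value = arr[mid];
            if (value < key) {
                low = mid + 1;
            } else if (value > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }
}
```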
Viktor Lofgren
a497e4c920
(crawler) Terminate crawler after a few hours of no progress
2023-10-26 12:49:28 +02:00
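A minimal sketch of a no-progress watchdog. The class name and the 3-hour timeout are hypothetical (the commit only says "a few hours"). Progress is measured as a change in a completed-task counter, and the crawler would be terminated once that counter has stalled for longer than the timeout.

```java
import java.time.Duration;
import java.time.Instant;

class ProgressWatchdog {
    private final Duration timeout;
    private Instant lastProgressAt;
    private long lastCompleted;

    ProgressWatchdog(Duration timeout, Instant start) {
        this.timeout = timeout;
        this.lastProgressAt = start;
        this.lastCompleted = 0;
    }

    // Call periodically with the number of completed tasks. Returns true once the
    // count has not changed for longer than the timeout.
    boolean shouldTerminate(long completed, Instant now) {
        if (completed != lastCompleted) {
            lastCompleted = completed;
            lastProgressAt = now;
            return false;
        }
        return Duration.between(lastProgressAt, now).compareTo(timeout) > 0;
    }
}
```

Passing the clock value in, rather than calling `Instant.now()` internally, keeps the check easy to test.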
Viktor Lofgren
0f637fb722
(logging) Better logging configurations
2023-10-26 12:48:10 +02:00
Viktor Lofgren
abbadc92a0
(executor) Prevent TriggerAdjacencyCalculationActor from showing up in the actions tab when it isn't running
2023-10-25 21:25:07 +02:00
Viktor Lofgren
97fcbdd6d9
(control) Move storage actions into the actions tab
* Also disable annoying CSS animations
2023-10-25 21:23:56 +02:00
Viktor Lofgren
d7686b665e
Refactoring
* Encyclopedia sideloader: permit providing a base URL.
* Storage base shows node id in GUI
* ProcessLivenessMonitorActor restarts automatically
* Clean-up of outbox code
2023-10-25 18:51:02 +02:00
Viktor Lofgren
5de41a3a7f
(search-service) Show node affinity in site info tab
2023-10-25 12:44:48 +02:00
Viktor Lofgren
84cdac83d6
(control) Move message queue monitor to control
2023-10-24 16:44:28 +02:00
Viktor Lofgren
436a55ee1e
(control) Render UUID tooltip with dashes.
2023-10-24 16:37:40 +02:00
Viktor Lofgren
313cc2965c
(index-creation) Print whether full or prio is created
Previously both were reported as "reverse index", which was pretty confusing.
2023-10-24 16:23:10 +02:00
Viktor Lofgren
95f74c5ea7
(control) Filter out heartbeats that are stopped
2023-10-24 16:09:28 +02:00
Viktor Lofgren
8d1c3c754d
Testing development flow with adding a ~tilde search filter
2023-10-24 15:35:15 +02:00
Viktor Lofgren
72152f9d80
Fix bug in handling js parameters
2023-10-24 15:10:02 +02:00
Viktor Lofgren
ebd365a128
Fix exception
2023-10-24 15:04:12 +02:00
Viktor Lofgren
0406e76889
(api) Remove logging cruft
2023-10-24 13:39:05 +02:00
Viktor Lofgren
c2b28c0f8d
(api) Trial streaming API
2023-10-24 13:26:46 +02:00