CatgirlIntelligenceAgency

Author	SHA1	Message	Date
Viktor Lofgren	c92f1b8df8	(geo-ip) Revert removal of ip2location logic We do both ip2location and ASN data. The change also adds some keywords based on autonomous system information, on a somewhat experimental basis. It would be neat to be able to e.g. exclude cloud services or just e.g. cloudflare from the search results.	2023-12-17 15:03:00 +01:00
Viktor Lofgren	bde68ba48b	Merge branch 'master' into asn-info	2023-12-17 14:00:23 +01:00
Viktor Lofgren	5ab2a22e88	(search) Fix result count back down to 1 per domain	2023-12-17 13:14:23 +01:00
Viktor Lofgren	d7bd540683	(*) Replace the ip2location IP geolocation data with ASN information from apnic.net. Doesn't really make sense to use ip2location as a middle man for information that is already freely available...	2023-12-16 21:55:04 +01:00
Viktor Lofgren	d715b1f9ca	(search) Improve error handling in search parameters parsing The code now intercepts and deals with potential exceptions during the parsing of search parameters. This is in response to constant bad requests from bots which were cluttering the logs. A catch clause is added that suppresses these errors and redirects to the base URL.	2023-12-16 18:42:13 +01:00
Viktor Lofgren	6f2bf38f0e	(index) Fix off-by-1 error in the domain count limiter	2023-12-16 16:57:05 +01:00
Viktor Lofgren	320882c34a	(site-info) Try to discover the scheme of the website with a site:-query The site info view can't blindly assume that every website supports https. To figure out which scheme to use when linking to a site, execute a single-result search for site:domain.name and then grab the scheme off the result. To allow this, a count parameter is introduced to doSiteSearch() in SearchOperator.	2023-12-16 16:34:53 +01:00
Viktor Lofgren	f655ec5a5c	(*) Refactor GeoIP-related code In this commit, GeoIP-related classes are refactored and relocated to a common library as they are shared across multiple services. The crawler is refactored to enable the GeoIpBlocklist to use the new GeoIpDictionary as the base of its decisions. The converter is modified to query this data to add a geoip:-keyword to documents to permit limiting a search to the country of the hosting server. The commit also adds due BY-SA attribution in the search engine footer for the source of the IP geolocation data.	2023-12-10 17:30:43 +01:00
Viktor Lofgren	91dd45cf64	(search) IP and IP geolocation in site info view This commit also fixes a bug in the loader where the IP field wouldn't always populate as intended, and refactors the DomainInformationService to use significantly fewer SQL queries.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	37af60254f	(search) Better recipe filter Tune the recipe filter to give better results, by using the 'popular' domains set along with excluding results with heavy tracking.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	f0e736d4ea	(search) Update the search profile 'Academia' to strictly filter on academic tlds The previous version used a personalized pagerank centering on a few academic domains, but this didn't work very well and most results were not very academia-centric.	2023-12-09 20:06:55 +01:00
Viktor Lofgren	e3ebb0c5bb	(*) Rename the search filter 'RETRO' into 'POPULAR' This will make the terminology more consistent between the GUI and the code. The rankings yaml still uses 'retro' though, to retain compatibility.	2023-12-09 20:06:54 +01:00
Viktor Lofgren	6382f779c3	(search) Revert back to using 'Popular' as the default search filter Unfiltered is a bit too ... unfiltered, and gives a bad first impression for many queries.	2023-12-09 16:34:12 +01:00
Viktor Lofgren	8ef34883a8	(search) Move site information out of the search service and into assistant. This reduces the impact of restarting the search service, as the site information takes a few minutes to load during which it's not available. It also permits exposing this information via API in the future if there is interest in this. The assistant service was also modified to do a late load of the suggestions trie, as this is a major contributor to its start-up time. Finally, some changes were made to the client library, a new get() method was added that takes a TypeToken to allow deserialization of generics such as List<Foo>, and the scheduler was also modified to use virtual threads.	2023-12-09 16:30:06 +01:00
Viktor Lofgren	156c067f79	(search) Fix mobile issues with browse feature	2023-12-05 21:28:50 +01:00
Viktor Lofgren	b33b013d41	(search) Fix broken script tag Apparently it can't be called suggestions.js...?	2023-12-05 20:29:13 +01:00
Viktor Lofgren	e74e2f705f	(search) Fix broken script tag suggestions.js became something else.	2023-12-05 20:20:07 +01:00
Viktor Lofgren	2e438847fc	(search) Optimize related domains queries In the future this logic probably needs to move into a separate service, as it's still quite slow to load. But this fixes response times and DOS potential of previous version.	2023-12-05 20:12:03 +01:00
Viktor Lofgren	9301c47d93	(search) Optimize related domains queries	2023-12-05 14:42:03 +01:00
Viktor Lofgren	20ec58b07f	(search) Remove layout-breakingly long URLs from the similar domains view. They're almost all .onion URLs anyway, not really the space we're looking to peer into.	2023-12-05 13:58:15 +01:00
Viktor Lofgren	98983c1015	(search) Hopefully fix race condition that leaves the response with no Content-type header	2023-12-05 13:52:36 +01:00
Viktor Lofgren	67195592c6	(search) Hopefully fix race condition that leaves the response with no Content-type header	2023-12-05 13:48:42 +01:00
Viktor Lofgren	d1e88df71e	(search) Cleaning up the code a bit	2023-12-05 13:26:05 +01:00
Viktor Lofgren	f36cfe34ab	(search) Hackery to get a more balanced view	2023-12-04 22:50:39 +01:00
Viktor Lofgren	8a1934008c	(search) Merge similar sites results with the info view. WIP: This commit needs to be cleaned up.	2023-12-04 22:10:24 +01:00
Viktor Lofgren	b41bb9cfcf	(search) Use a Ξ for mobile button title instead of "Filters". Makes it easier to distinguish from the search button.	2023-12-03 16:33:25 +01:00
Viktor Lofgren	d58324bbef	(search) Clean up filters menu a bit, improve accessibility.	2023-12-02 18:05:30 +01:00
Viktor Lofgren	cbbd45d3e5	(search) Clean up filters menu a bit, improve accessibility.	2023-12-02 18:01:03 +01:00
Viktor Lofgren	b89633ae4b	(search) Don't render a filter button on mobile when there are no filters to be presented.	2023-12-02 17:23:45 +01:00
Viktor Lofgren	96357e9bfd	(search) Fix typeahead suggestions, as well as improve mobile and desktop UX in small ways.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	d530c3096f	(search) GUI tweaks to make the new interface not fall apart on mobile/chrome	2023-12-02 17:06:40 +01:00
Viktor Lofgren	ae0c1c3f2d	(control) Adjust search result margins for better visual density	2023-12-02 17:06:40 +01:00
Viktor Lofgren	0cc2564380	(search) CSS tweaks	2023-12-02 17:06:40 +01:00
Viktor Lofgren	38d20022ad	(search) Fix script loading for mobile support	2023-12-02 17:06:40 +01:00
Viktor Lofgren	280132dad0	(search) Fix script loading for mobile support	2023-12-02 17:06:40 +01:00
Viktor Lofgren	61de4e2789	(search) Retain filter options when performing a new search from the input field	2023-12-02 17:06:40 +01:00
Viktor Lofgren	f9d3455320	(search) Reduce visual weight of search results	2023-12-02 17:06:40 +01:00
Viktor Lofgren	2ff64c3c12	(search) New toggle for reducing tracking	2023-12-02 17:06:40 +01:00
Viktor Lofgren	902f235b5b	(search) Integrate 'similar' tab in site info.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	97d43a6fa2	(search) Revamp browse results with new look.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	9bc65ff0ca	(search) Desaturate search result titles according to rank	2023-12-02 17:06:40 +01:00
Viktor Lofgren	6cd6a615fd	(search) Add data-filter to body as a data attribute For future shenanigans ;D	2023-12-02 17:06:40 +01:00
Viktor Lofgren	5639f0653d	(search) Rename SearchProfile.name into filterId Avoid foot-gun caused by name clash with the Enumeration method name(), which returns the Java name of the enumeration value.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	251174c9a2	(search) Update front page with new look	2023-12-02 17:06:40 +01:00
Viktor Lofgren	42ea87d637	(search) Update conversion results, error page, and dictionary results with new CSS.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	7c8a60b8cf	(search) Site info view is mostly done Also optimize the rendering a bit to avoid having to allocate huge string buffers, writing directly to Spark's response instead.	2023-12-02 17:06:40 +01:00
Viktor Lofgren	2f4500be5a	(search) New frontend look	2023-12-02 17:06:40 +01:00
Viktor Lofgren	fa7534a362	(search) Remove dead code	2023-12-02 17:06:40 +01:00
Viktor Lofgren	a258f0af7a	(search) Refactor search parameters to include query	2023-12-02 17:06:40 +01:00
Viktor Lofgren	01621c6344	(renderer) Make helpers configurable on a by-service basis.	2023-12-02 17:06:40 +01:00

70 Commits