(sideload) Just index based on first paragraph

This seems like it would make the Wikipedia search results worse, but it drastically improves result quality!

This is because Wikipedia has many articles that each touch on a lot of concepts only loosely related to their subject, and indexing the entire document means tangentially relevant results tend to displace the most relevant ones.
Viktor Lofgren 2024-01-01 16:19:38 +01:00
parent f6fa8bd722
commit faa50bf578


@@ -120,6 +120,7 @@ public class EncyclopediaMarginaliaNuSideloader implements SideloadSource, AutoC
fullHtml.append("<p>");
fullHtml.append(part);
fullHtml.append("</p>");
break; // Only take the first part, this improves accuracy a lot
}
fullHtml.append("</div></body></html>");
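
The effect of the added break can be sketched in isolation. This is a minimal, self-contained illustration and not the sideloader's actual code: the class name FirstParagraphSketch, the method buildHtml, its parameters, and the opening HTML markup are all assumptions made for the example. Only the loop body and the closing markup mirror the hunk above.

```java
import java.util.List;

class FirstParagraphSketch {
    // Hypothetical helper (not part of EncyclopediaMarginaliaNuSideloader):
    // builds an HTML document from an article's parts, keeping only the first one.
    static String buildHtml(String title, List<String> parts) {
        StringBuilder fullHtml = new StringBuilder();
        // Assumed opening markup; the hunk only shows the closing tags.
        fullHtml.append("<html><head><title>")
                .append(title)
                .append("</title></head><body><div>");
        for (String part : parts) {
            fullHtml.append("<p>");
            fullHtml.append(part);
            fullHtml.append("</p>");
            break; // Only take the first part, as in the commit
        }
        fullHtml.append("</div></body></html>");
        return fullHtml.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("Lead paragraph.", "History section.", "Trivia.");
        // Prints a document whose body contains only "<p>Lead paragraph.</p>"
        System.out.println(buildHtml("Example", parts));
    }
}
```

With the break in place, later parts such as "History section." never reach the indexed document, so they can no longer produce tangential matches. An empty list of parts simply yields an empty div.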