(converter) Refactor content type check in PlainTextDocumentProcessorPlugin

The `isApplicable` method in `PlainTextDocumentProcessorPlugin` was refactored to accept more than the exact content type "text/plain". It now also accepts any content type that starts with "text/plain;", to accommodate content types that append a charset parameter, such as "text/plain; charset=UTF-8".
Viktor Lofgren 2024-01-22 17:52:14 +01:00
parent 51cdf46645
commit 41d896ba3e

@@ -54,7 +54,14 @@ public class PlainTextDocumentProcessorPlugin extends AbstractDocumentProcessorP
     @Override
     public boolean isApplicable(CrawledDocument doc) {
-        return doc.contentType.equalsIgnoreCase("text/plain");
+        String contentType = doc.contentType.toLowerCase();
+        if (contentType.equals("text/plain"))
+            return true;
+        if (contentType.startsWith("text/plain;")) // charset=blabla
+            return true;
+        return false;
     }
     @Override
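
The check introduced by this commit can be tried in isolation with a small standalone sketch. The class `PlainTextContentTypeSketch` and the helper `isPlainText` are hypothetical names used only for illustration and are not part of the commit. The sketch also makes two assumptions that go beyond the diff: it returns false for a null content type, and it lowercases with `Locale.ROOT` so the result does not depend on the JVM's default locale.

```java
import java.util.Locale;

public class PlainTextContentTypeSketch {

    // Hypothetical helper, not part of the commit: mirrors the two checks
    // performed in isApplicable, applied to a bare content type string.
    static boolean isPlainText(String contentType) {
        // Assumption beyond the diff: treat a missing content type as not applicable
        if (contentType == null)
            return false;

        // Assumption beyond the diff: Locale.ROOT keeps lowercasing
        // independent of the JVM default locale
        String normalized = contentType.toLowerCase(Locale.ROOT);

        if (normalized.equals("text/plain"))
            return true;
        if (normalized.startsWith("text/plain;")) // e.g. "text/plain; charset=UTF-8"
            return true;

        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPlainText("text/plain"));                // true
        System.out.println(isPlainText("Text/Plain; charset=UTF-8")); // true
        System.out.println(isPlainText("text/plaintext"));            // false
        System.out.println(isPlainText("text/html"));                 // false
    }
}
```

Matching on the prefix "text/plain;" rather than "text/plain" is what keeps unrelated types that merely share the prefix, such as "text/plaintext", from being accepted.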