(warc) Update URL encoding in WarcProtocolReconstructor
The URI query string is now URL-encoded (UTF-8) in WarcProtocolReconstructor before it is appended to the reconstructed request line. This encodes special characters according to standard URL encoding rules and improves URL validity during crawling.
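To make the effect of the change concrete, here is a minimal sketch that builds the request line both ways. The class `QueryEncodingDemo`, its two helper methods, and the use of `uri.getRawPath()` in place of the commit's `encodedURL` variable are assumptions for illustration only; they are not taken from the repository. Note that `URLEncoder.encode` is applied to the whole query string, so the `=` and `&` separators are percent-encoded along with the values, and spaces become `+`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the commit.
class QueryEncodingDemo {

    // Mirrors the pre-change logic: the decoded query is appended as-is.
    // uri.getRawPath() stands in for the commit's encodedURL (an assumption).
    static String requestLineBefore(String method, URI uri) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(" ").append(uri.getRawPath());
        if (uri.getQuery() != null) {
            sb.append("?").append(uri.getQuery());
        }
        return sb.append(" HTTP/1.1").toString();
    }

    // Mirrors the post-change logic: the whole query string is passed
    // through URLEncoder.encode with UTF-8 before being appended.
    static String requestLineAfter(String method, URI uri) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(" ").append(uri.getRawPath());
        if (uri.getQuery() != null) {
            sb.append("?").append(URLEncoder.encode(uri.getQuery(), StandardCharsets.UTF_8));
        }
        return sb.append(" HTTP/1.1").toString();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/search?q=a%20b&lang=en");
        // getQuery() returns the decoded query: "q=a b&lang=en"
        System.out.println(requestLineBefore("GET", uri)); // GET /search?q=a b&lang=en HTTP/1.1
        System.out.println(requestLineAfter("GET", uri));  // GET /search?q%3Da+b%26lang%3Den HTTP/1.1
    }
}
```

With the old logic the decoded space ends up unescaped in the request line; with the new logic the line contains no raw spaces in the query, at the cost of also escaping the parameter separators.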
parent 68ac8d3e09
commit 0b112cb4d4
@@ -26,7 +26,7 @@ public class WarcProtocolReconstructor {
         requestStringBuilder.append(request.method()).append(" ").append(encodedURL);
 
         if (uri.getQuery() != null) {
-            requestStringBuilder.append("?").append(uri.getQuery());
+            requestStringBuilder.append("?").append(URLEncoder.encode(uri.getQuery(), StandardCharsets.UTF_8));
         }
         requestStringBuilder.append(" HTTP/1.1\r\n");
         requestStringBuilder.append("Host: ").append(uri.getHost()).append("\r\n");