WARC writer: use URI.toASCIIString() instead of URI.toString() #20

Description

@sebastian-nagel

The WARC writer should use URI.toASCIIString() instead of URI.toString(). Java's URI class deviates from RFC 2396 in that it allows non-control Unicode characters, and toString() leaves them unescaped. Many WARC tools require URIs that comply with RFC 2396. See commoncrawl/ia-web-commons#27 for how this bug was detected.
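
To illustrate the difference, here is a minimal sketch using only `java.net.URI`. The class `UriAsciiDemo`, the helper `toWarcUri`, and the example URL are hypothetical names chosen for this illustration; they are not taken from the WARC writer's code.

```java
import java.net.URI;

public class UriAsciiDemo {

    // Hypothetical helper: the string form a writer could emit in a WARC header.
    // toASCIIString() percent-encodes non-ASCII characters as UTF-8 octets.
    public static String toWarcUri(URI uri) {
        return uri.toASCIIString();
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; java.net.URI accepts it although RFC 2396 does not.
        URI uri = URI.create("http://example.com/caf\u00e9");

        // toString() keeps the non-ASCII character unescaped.
        System.out.println(uri.toString());

        // prints "http://example.com/caf%C3%A9"
        System.out.println(toWarcUri(uri));
    }
}
```

With this approach, URIs that contain only ASCII characters pass through unchanged, so switching methods should only affect records whose URIs contain non-ASCII characters.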
