
WaybackURLKeyMaker to keep non-utf8 percent encodings #6

@sebastian-nagel


WaybackURLKeyMaker.makeKey(url) replaces percent signs with %25 in percent-encoded URLs whose escaped bytes are not valid UTF-8 (i.e. URLs encoded in a legacy charset, as was common before RFC 3986):

http://www.aluroba.com/tags/%C3%CE%CA%C7%D1%E5%C7.htm
-> com,aluroba)/tags/%25c3%25ce%25ca%25c7%25d1%25e5%25c7.htm
https://1kr.ua/newslist.html?tag=%E4%EE%F8%EA%EE%EB%FC%ED%EE%E5
-> ua,1kr)/newslist.html?tag=%25e4%25ee%25f8%25ea%25ee%25eb%25fc%25ed%25ee%25e5
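The %25 in these keys is what you get when the already-escaped text is escaped a second time, so `%` itself becomes `%25`. The minimal Python sketch below contrasts that with a byte-level round trip (`%XX` to byte to `%XX`) that never decodes the bytes as text and therefore keeps legacy-charset escapes intact. The helper names `keep_escapes`, `double_escape` and `is_valid_utf8` and the `SAFE` character set are hypothetical; `double_escape` only reproduces the reported output and is not taken from WaybackURLKeyMaker's source.

```python
from urllib.parse import quote, quote_from_bytes, unquote_to_bytes

# Characters left unescaped in this sketch (an assumption chosen to cover the
# two example URLs); letters, digits and "_.-~" are always left unescaped.
SAFE = "/?=&"

def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes behind the escapes form valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def keep_escapes(path: str) -> str:
    """Byte-level round trip: %XX -> byte -> %XX, then lowercase.

    The bytes are never decoded as text, so escapes of bytes that are not
    valid UTF-8 survive unchanged apart from case.
    """
    raw = unquote_to_bytes(path)
    return quote_from_bytes(raw, safe=SAFE).lower()

def double_escape(path: str) -> str:
    """Escape the already-escaped text again, turning '%' into '%25'.

    Hypothetical: reproduces the reported symptom only.
    """
    return quote(path, safe=SAFE).lower()

path = "/tags/%C3%CE%CA%C7%D1%E5%C7.htm"
print(is_valid_utf8(unquote_to_bytes(path)))  # False
print(keep_escapes(path))   # /tags/%c3%ce%ca%c7%d1%e5%c7.htm
print(double_escape(path))  # /tags/%25c3%25ce%25ca%25c7%25d1%25e5%25c7.htm
```

A real canonicalizer would also have to decide which reserved characters stay escaped; this sketch, for instance, turns `%2F` into a literal `/`.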

Python's surt module behaves differently, which breaks look-ups in CDX files for such URLs.
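CDX files are sorted by their first field, the SURT key, so a look-up is a binary search for an exact key. The sketch below uses a made-up, simplified index (`urlkey timestamp original-url` only; real CDX lines carry more fields) written with the %25 form from WaybackURLKeyMaker, and queries it with a key that keeps the original escapes, as the issue title suggests the other side does. The function name `lookup` and the sample data are hypothetical.

```python
import bisect

def lookup(cdx_lines: list[str], urlkey: str) -> list[str]:
    """Return every CDX line whose first field (the SURT key) equals urlkey.

    Assumes cdx_lines is sorted lexicographically, so all lines sharing a
    key are contiguous and the first one can be found by binary search.
    """
    prefix = urlkey + " "
    start = bisect.bisect_left(cdx_lines, prefix)
    matches = []
    for line in cdx_lines[start:]:
        if not line.startswith(prefix):
            break
        matches.append(line)
    return matches

# Made-up capture, keyed the way WaybackURLKeyMaker is reported to key it.
INDEX = sorted([
    "com,aluroba)/tags/%25c3%25ce%25ca%25c7%25d1%25e5%25c7.htm 20200101000000 "
    "http://www.aluroba.com/tags/%C3%CE%CA%C7%D1%E5%C7.htm",
])

# The same URL keyed with its original escapes kept (lowercased).
query_key = "com,aluroba)/tags/%c3%ce%ca%c7%d1%e5%c7.htm"

print(lookup(INDEX, query_key))  # [] : the keys differ, so the capture is missed
```

Because `%25...` sorts before `%c3...`, the binary search lands past the stored line and the capture is never found, even though both keys describe the same URL.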
