IPv4-mapped / IPv4-compatible IPv6 addresses (e.g., ::ffff:192.0.2.128) in URLs are mangled by WaybackURLKeyMaker: the enclosing square brackets are not removed. Instead, they stay attached to the first and last parts of the host when it is split at dots and the parts are reversed:
jshell> import org.archive.url.WaybackURLKeyMaker;
jshell> var km = new WaybackURLKeyMaker();
jshell> km.makeKey("http://[::ffff:123.123.87.87]:8080/index.html")
$3 ==> "87],87,123,[::ffff:123:8080)/index.html"
For comparison, the Python surt module removes the square brackets before splitting at dots and reversing the parts:
$> pip3 show surt
Name: surt
Version: 0.3.1
Summary: Sort-friendly URI Reordering Transform (SURT) python package.
$> python3
Python 3.12.3 (main, Nov 6 2024, 18:32:19) [GCC 13.2.0] on linux
>>> from surt import surt
>>> surt("http://[::ffff:123.123.87.87]:8080/index.html")
'87,87,123,::ffff:123:8080)/index.html'
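The difference between the two outputs can be reproduced with a deliberately simplified model of the host reversal. This is only a sketch: `reversed_host_key` and its `strip_brackets` flag are hypothetical names, and the code is not the actual implementation of either WaybackURLKeyMaker or the surt module. It assumes the host, port and path have already been separated.

```python
def reversed_host_key(host, port, path, strip_brackets):
    """Hypothetical, simplified model of the host part of a SURT key."""
    # Optionally drop the square brackets around an IPv6 literal first.
    if strip_brackets and host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    # Split at dots and reverse the parts, as is done for domain names.
    parts = host.split(".")
    return ",".join(reversed(parts)) + ":" + port + ")" + path


host, port, path = "[::ffff:123.123.87.87]", "8080", "/index.html"

# Brackets kept: they end up attached to the first and last parts.
print(reversed_host_key(host, port, path, strip_brackets=False))

# Brackets removed before splitting.
print(reversed_host_key(host, port, path, strip_brackets=True))
```

With the brackets kept, the model yields the same string as the jshell session; with the brackets stripped, it yields the same string as the Python surt call.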
I'm not sure what the best representation is:
- normalize the IPv4-mapped representation, so that ::ffff:123.123.87.87 becomes
  - ::ffff:7b7b:5757, or
  - 123.123.87.87
- the double use of the colon, both inside IPv6 addresses and as the port separator, is troublesome, but may not be a problem: SURT keys are recall-oriented, so some ambiguity is acceptable. It would also be a separate issue.
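Both candidate normalizations can be derived with Python's standard `ipaddress` module. The following is a minimal sketch, not a proposal for the final behaviour: `ipv4_mapped_forms` is a hypothetical helper name, and it only handles IPv4-mapped addresses (::ffff:a.b.c.d), not the IPv4-compatible form (::a.b.c.d).

```python
import ipaddress


def ipv4_mapped_forms(host):
    """Hypothetical helper: return (hex IPv6 form, dotted IPv4 form)
    for an IPv4-mapped IPv6 literal, or None for any other IP address.

    Raises ValueError if the host is not an IP literal at all.
    """
    # Remove the square brackets that enclose IPv6 literals in URLs.
    addr = ipaddress.ip_address(host.strip("[]"))
    if not isinstance(addr, ipaddress.IPv6Address) or addr.ipv4_mapped is None:
        return None
    v4 = addr.ipv4_mapped
    n = int(v4)
    # Render the 32 bits of the embedded IPv4 address as two hex groups.
    hex_form = "::ffff:%x:%x" % (n >> 16, n & 0xFFFF)
    return hex_form, str(v4)


print(ipv4_mapped_forms("[::ffff:123.123.87.87]"))
# → ('::ffff:7b7b:5757', '123.123.87.87')
```

The hex form is built by hand rather than with `str(addr)` so that the result does not depend on how a given Python version prints IPv4-mapped addresses.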