Common Crawl's WarcWriter uses a version 4 (pseudo-randomly generated) UUID, generated by Java's UUID.randomUUID() method.
RFC 9562 (published in 2024 and obsoleting RFC 4122) defines a new version 7 UUID that combines a Unix timestamp (milliseconds since the epoch) with random bits. This would make it possible to encode the capture time (WARC-Date) in the UUID used in the WARC-Record-ID:
This adds usable information to the record ID (e.g., for verifying the WARC-Date).
It requires that the capture time, not the time the WARC record is created, is used as the timestamp.
A version 7 UUID holds a Unix timestamp in milliseconds in the most significant 48 bits, followed by the required version (4 bits) and variant (2 bits) fields; the remaining 74 bits are filled with random bits.
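The bit layout described above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class `UuidV7` and its methods `fromCaptureTime` and `captureTime` are hypothetical names, not part of Common Crawl's WarcWriter or of the JDK, and the capture time is assumed to be available as a `java.time.Instant`.

```java
import java.security.SecureRandom;
import java.time.Instant;
import java.util.UUID;

/** Hypothetical helper for illustration; not part of Common Crawl's WarcWriter. */
final class UuidV7 {
    private static final SecureRandom RANDOM = new SecureRandom();

    private UuidV7() {}

    /** Builds a version 7 UUID whose timestamp field holds the given capture time. */
    static UUID fromCaptureTime(Instant captureTime) {
        long millis = captureTime.toEpochMilli();
        if (millis < 0 || millis >= (1L << 48)) {
            throw new IllegalArgumentException("timestamp does not fit into 48 bits");
        }
        // Most significant 64 bits: 48-bit timestamp, 4-bit version (7), 12 random bits.
        long msb = (millis << 16) | (7L << 12) | (RANDOM.nextLong() & 0x0FFFL);
        // Least significant 64 bits: 2-bit variant (binary 10), 62 random bits.
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    /** Recovers the embedded timestamp, e.g. to compare it with the WARC-Date. */
    static Instant captureTime(UUID uuid) {
        if (uuid.version() != 7) {
            throw new IllegalArgumentException("not a version 7 UUID: " + uuid);
        }
        // The timestamp occupies the top 48 bits, so an unsigned shift drops the rest.
        return Instant.ofEpochMilli(uuid.getMostSignificantBits() >>> 16);
    }
}
```

With such a helper, a writer would call `UuidV7.fromCaptureTime(captureTime)` instead of `UUID.randomUUID()`, and a reader could check `UuidV7.captureTime(uuid)` against the WARC-Date of the record. Sub-millisecond precision of the capture time is dropped in this sketch.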
Notes and links: