Unreleased

3.0.2 (2025-11-14)

Fixes

  • Avoid relying on the default locale or charset. #128
  • BasicURLCanonicalizer: more efficient normalization of dots in host names. #129
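The first fix above addresses a common Java portability pitfall: methods such as `String.toLowerCase()` and `String.getBytes()` depend on the JVM's default locale and charset when called without arguments. The following is a minimal sketch of the general pattern only, not the code changed in #128; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration; not part of webarchive-commons.
final class ExplicitLocaleAndCharset {

    // Lowercase with a fixed locale. Under a Turkish default locale,
    // "TITLE".toLowerCase() would map 'I' to a dotless i (U+0131).
    static String lowerCaseAscii(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Encode with an explicit charset instead of the platform default.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset so round trips are stable.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseAscii("HTTP://EXAMPLE.COM/INDEX"));
        System.out.println(decode(encode("caf\u00e9")).equals("caf\u00e9"));
    }
}
```

Passing `Locale.ROOT` and `StandardCharsets.UTF_8` explicitly makes the result independent of the machine the code runs on.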

Dependency upgrades

  • commons-cli: 1.10.0 → 1.11.0
  • commons-codec: 1.19.0 → 1.20.0
  • commons-io: 2.20.0 → 2.21.0
  • junit-jupiter: 5.13.3 → 5.14.1
  • maven-release-plugin: 3.1.1 → 3.2.0

3.0.1 (2025-10-27)

Fixes

  • Fixed a file handle leak in FileUtils.pagedLines() and FileUtils.appendTo() that could occur during I/O errors.
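The release note does not say how the leak was fixed. As a hedged illustration of one common way to guarantee that a file handle is closed when an I/O error occurs, the sketch below uses try-with-resources. The class `ClosingOnError` and its methods are hypothetical and are not the actual FileUtils implementation.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the actual FileUtils code.
final class ClosingOnError {

    // Append one line; the writer is closed even if write() throws.
    static void appendLine(Path file, String line) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(line);
            out.newLine();
        }
    }

    // Read all lines; the reader is closed even if readLine() throws.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```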

Dependency upgrades

  • commons-codec: 1.18.0 → 1.19.0
  • commons-lang3: 3.18.0 → 3.19.0
  • commons-cli: 1.9.0 → 1.10.0
  • guava: 33.4.8-jre → 33.5.0-jre
  • hadoop: 3.4.1 → 3.4.2
  • pig: 0.17.0 → 0.18.0

3.0.0 (2025-07-21)

Changes

FileUtils.pagedLines() and FileUtils.expandRange() now return the Apache Commons Lang 3 version of LongRange. Users of these methods may need to make the following changes:

| Old | New |
|-----|-----|
| import org.apache.commons.lang.math.LongRange | import org.apache.commons.lang3.LongRange |
| new LongRange(min, max) | LongRange.of(min, max) |
| longRange.getMaximumLong() | longRange.getMaximum() |
| longRange.getMinimumLong() | longRange.getMinimum() |

Dependency upgrades

  • commons-io: 2.19.0 → 2.20.0
  • commons-lang: 2.6 → 3.18.0

2.0.2 (2025-07-15)

Fixes

  • Fixes for org.archive.net.PublicSuffixes #110
    • Updated to the latest version of the public suffix list.
    • Fixed parsing failures with newer list versions.
    • Moved effective_tld_names.dat to org/archive/effective_tld_names.dat to prevent conflict with crawler-commons.

2.0.1 (2025-05-21)

Changes

  • Re-added Reporter.shortReportLineTo(PrintWriter) as it turned out to be important to Heritrix.

2.0.0 (2025-05-21)

New features

  • Added RecordingInputStream.asOutputStream() for direct writing of recorded data without an input stream. #108

Removals

Removed Apache HttpClient 3.1

HTTPSeekableLineReaderFactory and ZipNumBlockLoader now default to HttpClient 4.3.

| Removed | Replacement |
|---------|-------------|
| org.apache.commons.httpclient.URIException | org.archive.url.URIException |
| org.apache.commons.httpclient.Header | org.archive.format.http.HttpHeader |
| org.archive.httpclient.HttpRecorderGetMethod | |
| org.archive.httpclient.HttpRecorderMethod | |
| org.archive.httpclient.HttpRecorderPostMethod | |
| org.archive.httpclient.SingleHttpConnectionManager | |
| org.archive.httpclient.ThreadLocalHttpConnectionManager | |

Removed deprecated versions of renamed classes

| Removed | Replacement |
|---------|-------------|
| org.archive.io.ArchiveFileConstants | org.archive.format.ArchiveFileConstants |
| org.archive.io.GzipHeader | org.archive.util.zip.GzipHeader |
| org.archive.io.GZIPMembersInputStream | org.archive.util.zip.GZIPMembersInputStream |
| org.archive.io.NoGzipMagicException | org.archive.util.zip.NoGzipMagicException |
| org.archive.io.arc.ARCConstants | org.archive.format.arc.ARCConstants |
| org.archive.io.warc.WARCConstants | org.archive.format.warc.WARCConstants |
| org.archive.url.DefaultIACanonicalizerRules | org.archive.url.AggressiveIACanonicalizerRules |
| org.archive.url.DefaultIAURLCanonicalizer | org.archive.url.AggressiveIAURLCanonicalizer |
| org.archive.url.GoogleURLCanonicalizer | org.archive.url.BasicURLCanonicalizer |

Removed deprecated methods

| Removed | Replacement |
|---------|-------------|
| ANVLRecord(int) | ANVLRecord() |
| DevUtils.betterPrintStack(RuntimeException) | Throwable.printStackTrace() |
| Recorder.getReplayCharSequence() | Recorder.getContentReplayCharSequence() |
| Reporter.shortReportLineTo(PrintWriter) | Reporter.reportTo(PrintWriter) |

Removed usages of constant interfaces

Static imports should be used instead.

  • ArchiveFileConstants is no longer implemented by:
    • ArchiveReader
    • ArchiveReaderFactory
    • WARCWriter
    • WriterPool
    • WriterPoolMember
  • ARCConstants is no longer implemented by:
    • ARCReader
    • ARCReaderFactory
    • ARCRecord
    • ARCRecordMetaData
    • ARCUtils
    • ARCWriter
  • WARCConstants is no longer implemented by:
    • WARCReader
    • WARCReaderFactory
    • WARCRecord
    • WARCWriter

Dependency upgrades

  • commons-io: 2.18.0 → 2.19.0
  • guava: 33.3.1-jre → 33.4.8-jre
  • json: 20240303 → 20250517
  • junit: 4.13.2 → 5.12.2

1.3.0 (2024-12-20)

URL Canonicalization Changed

The output of WaybackURLKeyMaker and other canonicalizers based on BasicURLCanonicalizer has changed for URLs that contain percent-encoded sequences that are not valid UTF-8. For example, a URL containing "%C3%23" is now normalised to "%c3%23", whereas previous releases produced "%25c3%23". This change brings webarchive-commons more in line with pywb, surt (Python), warcio.js and RFC 3986. CDX file compatibility with these newer tools should improve, but CDX files generated by the new release that contain such URLs may not work correctly with existing versions of OpenWayback that use an older webarchive-commons. #102
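To make the example concrete, the sketch below shows the one property visible in it: a percent escape is kept as written apart from lowercasing its hex digits, rather than having its `%` re-encoded as `%25`. This is an illustration only and not the BasicURLCanonicalizer algorithm, which performs many other steps; `PercentEscapeCase` and `lowerCaseEscapes` are hypothetical names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the library's canonicalizer.
final class PercentEscapeCase {

    // Matches a single percent escape such as %C3 or %2f.
    private static final Pattern ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    // Lowercase the hex digits of every percent escape and leave
    // everything else, including the escapes themselves, in place.
    static String lowerCaseEscapes(String url) {
        Matcher m = ESCAPE.matcher(url);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String lowered = m.group().toLowerCase(Locale.ROOT);
            m.appendReplacement(sb, Matcher.quoteReplacement(lowered));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the same escape pair used in the release note, lowercased.
        System.out.println(lowerCaseEscapes("http://example.com/%C3%23"));
    }
}
```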

Bug fixes

  • WAT: Fixed duplicated payload metadata values for "Actual-Content-Length" and "Trailing-Slop-Length". #103
  • ObjectPlusFilesOutputStream.hardlinkOrCopy now uses Files.createLink() instead of executing ln. This prevents the potential for security vulnerabilities from command line option injection and improves portability.
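The hard-link change above can be sketched as follows. This is a minimal illustration that assumes a fallback to copying when linking is unsupported or fails; it is not the actual ObjectPlusFilesOutputStream code, and `HardlinkOrCopySketch` is a hypothetical name. Only `Files.createLink()` and `Files.copy()` from the JDK are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not the library's implementation.
final class HardlinkOrCopySketch {

    // Try to create a hard link at target pointing to source.
    // Returns true if linked, false if it fell back to a copy.
    static boolean hardlinkOrCopy(Path source, Path target) throws IOException {
        try {
            Files.createLink(target, source);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Linking is unavailable (for example across file systems),
            // so copy instead. If target already exists, this copy also
            // fails and the exception propagates to the caller.
            Files.copy(source, target);
            return false;
        }
    }
}
```

Because no external process is started, a file name that begins with `-` cannot be misread as a command line option, and the code does not depend on an `ln` binary being installed.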

Dependency upgrades

  • fastutil removed
  • dsiutils removed

Deprecations

The following classes and enum members have been marked deprecated as a step towards removal of the dependency on Apache Commons HttpClient 3.1.

  • org.archive.httpclient.HttpRecorderGetMethod
  • org.archive.httpclient.HttpRecorderMethod
  • org.archive.httpclient.HttpRecorderPostMethod
  • org.archive.httpclient.SingleHttpConnectionManager
  • org.archive.httpclient.ThreadLocalHttpConnectionManager
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLR
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLRFactory
  • org.archive.util.binsearch.impl.http.HTTPSeekableLineReaderFactory.HttpLibs.APACHE_31

1.2.0 (2024-11-29)

New features

  • MetaData is now multivalued to support repeated WARC and HTTP headers. #98

Dependency upgrades

  • commons-io 2.18.0
  • commons-lang 2.6
  • guava 33.3.1-jre
  • hadoop 3.4.1
  • htmlparser 2.1
  • httpcore 4.4.16
  • json 20240303
  • junit 4.13.2

1.1.11 (2024-11-27)

Bug fixes

  • Fixed URLParser and WaybackURLKeyMaker failing on URLs with IPv6 address hostnames #100
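The release note does not describe the cause of the failure. As general background, IPv6 literal hosts are written in square brackets and contain colons, so a parser that splits host from port at the first colon will misread them. The sketch below uses the JDK's `java.net.URI`, not URLParser, to show what a correctly parsed IPv6 host looks like; `Ipv6HostExample` is a hypothetical name.

```java
import java.net.URI;

// Hypothetical illustration using the JDK parser, not URLParser.
final class Ipv6HostExample {

    // Host component; for IPv6 literals java.net.URI keeps the brackets.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Port component, or -1 if the URL has none.
    static int portOf(String url) {
        return URI.create(url).getPort();
    }

    public static void main(String[] args) {
        String url = "http://[2001:db8::1]:8080/path";
        System.out.println(hostOf(url) + " " + portOf(url));
    }
}
```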

1.1.10 (2024-10-15)

Bug fixes

Dependency upgrades

  • commons-collections 3.2.2
  • commons-io 2.7
  • dsiutils 2.2.8
  • guava 33.3.0-jre
  • hadoop 3.4.0 (now optional)
  • pig 0.17.0
  • org.json 20231013

Dependency Removals

  • joda-time (was unused)

1.1.9

1.1.8

1.1.7

1.1.6

1.1.5

1.1.4

1.1.3

1.1.2

1.1.1