- Strip empty port via URLParser
- Use CharsetDetector to guess encoding of HTML documents
- Fix last header being lost when headers are terminated by LF LF
- Make the regular expression that extracts URLs from CSS more restrictive
- Remove invalid constant PROFILE_REVISIT_URI_AGNOSTIC_IDENTICAL_DIGEST
- Allow canonicalizer to strip session id params even when they are the first params in the query string
- Store origin-code of ARC file header
- Flush output etc before tallying stats to fix sizeOnDisk calculation
- Remove broken, seemingly unnecessary escapeWhitespace() step of URI fixup
- Handle empty String argument in CharsetDetector.trimAttrValue
- WAT extractor: add information to the WAT warcinfo record
- WAT extractor: missing WARC format version
- WAT extractor: envelope structure does not conform to the WAT specification
- WAT extractor: WARC-Date in all records should be the WAT record generation date
- WAT extractor: WARC-Filename in the WAT warcinfo record should be the WAT filename itself
- WAT extractor: Entity-Trailing-Slop-Bytes should be called Entity-Trailing-Slop-Length
- Escape redirect URLs in RealCDXExtractorOutput
- Tests fail on Windows
- Test fails on Java 8
- RecordingOutputStream can affect tcp packets sent in an undesirable way