ARCRecord encapsulates streaming of record content, hardening against parsing mistakes. Unfortunately HTTP-headers are processed on the raw ARC-stream, allowing parsing of problematic headers to stream into other records, as demonstrated in issue #40. This results in aborted iteration of ARC records.
A more conservative approach would be to wrap the record block (HTTP-headers + content) as soon as possible; right after the ARC-entry-header-line has been parsed. Extraction of HTTP-headers would then take place within the hardened stream, isolating any parsing or data problems to the current record.
ARCRecord encapsulates streaming of record content, hardening against parsing mistakes. Unfortunately HTTP-headers are processed on the raw ARC-stream, allowing parsing of problematic headers to stream into other records, as demonstrated in issue #40. This results in aborted iteration of ARC records.
A more conservative approach would be to wrap the record block (HTTP-headers + content) as soon as possible; right after the ARC-entry-header-line has been parsed. Extraction of HTTP-headers would then take place within the hardened stream, isolating any parsing or data problems to the current record.