WAT extractor: WARC-Date in all records should be the WAT record generation date #43

@saraaubry

In the current implementation and according to the WAT specification (https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Metadata+File+Specification), WARC-Date is a 14-digit timestamp that represents the instant of data capture of the primary content.

The WARC ISO standard states WARC-Date is mandatory and corresponds to:
"A 14-digit UTC timestamp formatted according to YYYY-MM-DDThh:mm:ssZ, described in the W3C profile of ISO8601 [W3CDTF]. The timestamp shall represent the instant that data capture for record creation began. Multiple records written as part of a single capture event (see section 5.7) shall use the same WARC-Date, even though the times of their writing will not be exactly synchronized."

WAT records are metadata records that can be created long after the data capture, and with different kinds of processing tools. WARC-Date in all metadata records should therefore be the WAT record generation date, not the capture date of the primary content.
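To illustrate the proposed behaviour, here is a minimal Python sketch, not the extractor's actual code. The helper names `warc_date` and `wat_record_headers` are hypothetical, and using WARC-Refers-To to point back at the captured record is an assumption made for the example. The only point is that WARC-Date is taken from the generation instant rather than copied from the source record.

```python
from datetime import datetime, timezone


def warc_date(moment=None):
    """Format a timezone-aware instant as YYYY-MM-DDThh:mm:ssZ (UTC)."""
    if moment is None:
        # Default to "now": the instant the WAT record is generated.
        moment = datetime.now(timezone.utc)
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def wat_record_headers(captured_record_id, generated_at=None):
    """Build illustrative headers for a WAT metadata record (hypothetical helper).

    WARC-Date reflects when this metadata record is generated, not when
    the primary content was captured.
    """
    return {
        "WARC-Type": "metadata",
        "WARC-Date": warc_date(generated_at),
        # Assumption: the captured record is referenced by its record ID.
        "WARC-Refers-To": captured_record_id,
    }
```

Calling `wat_record_headers("<urn:uuid:...>")` with no explicit time stamps the record with the current UTC time, regardless of the WARC-Date carried by the record it describes.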
