According to the WAT specification (https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Metadata+File+Specification), the envelope structure should be:
"Envelope": {
"Format": "WARC",
"Payload-Metadata": {},
"WARC-Header-Length": "298",
"WARC-Header-Metadata": {}
}
In the WAT files generated with the extractor, we have the following structure:
Envelope: {
Format: "WARC",
WARC-Header-Length: "298",
Actual-Content-Length: "1343",
WARC-Header-Metadata: {},
Block-Digest: "sha1:XW7VSE74YCSE6AIJNT5AVSELMVBCIYYN",
Payload-Metadata: {}
}
Block-Digest and Actual-Content-Length are not supposed to be in this section according to the specification.
There are also an Actual-Content-Length and an Entity-Digest in the Payload-Metadata section.
The content of these four metadata fields, and how each one is computed, need to be clarified.
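
To make the discrepancy easy to reproduce, here is a minimal Python sketch that lists the top-level Envelope keys not present in the specification excerpt above. The names `SPEC_ENVELOPE_KEYS`, `unexpected_envelope_keys` and `sha1_base32` are hypothetical and are not part of the extractor. The digest helper assumes that the `sha1:` values are Base32-encoded SHA-1 digests, which matches the 32-character value shown above; which bytes the extractor actually hashes is exactly what remains to be clarified.

```python
import base64
import hashlib

# Top-level Envelope keys taken from the specification excerpt quoted above.
SPEC_ENVELOPE_KEYS = {
    "Format",
    "WARC-Header-Length",
    "WARC-Header-Metadata",
    "Payload-Metadata",
}


def unexpected_envelope_keys(envelope: dict) -> list[str]:
    """Return the Envelope keys that the specification excerpt does not list."""
    return sorted(set(envelope) - SPEC_ENVELOPE_KEYS)


def sha1_base32(data: bytes) -> str:
    """Format a SHA-1 digest as 'sha1:<Base32>' (assumed digest encoding)."""
    digest = hashlib.sha1(data).digest()
    return "sha1:" + base64.b32encode(digest).decode("ascii")


# Envelope as produced by the extractor (values copied from the report).
envelope = {
    "Format": "WARC",
    "WARC-Header-Length": "298",
    "Actual-Content-Length": "1343",
    "WARC-Header-Metadata": {},
    "Block-Digest": "sha1:XW7VSE74YCSE6AIJNT5AVSELMVBCIYYN",
    "Payload-Metadata": {},
}

print(unexpected_envelope_keys(envelope))
# ['Actual-Content-Length', 'Block-Digest']
```

Comparing `sha1_base32(...)` over the record block and over the payload bytes against the reported Block-Digest and Entity-Digest values would be one way to find out what each field covers.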