
Reduce log level of two classes called by the WAT/WET extractor to avoid that log files are flooded with multiple log messages per WARC record #33

Merged: jnioche merged 1 commit into `master` from `reduce-log-level-to-avoid-superflous-logging` on Oct 31, 2023

@sebastian-nagel

The logging of the classes `ExtractingResourceProducer` and `GZIPMemberSeries` is very verbose and produces multiple log messages for every transformed WARC record:

```
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries returnBytes
INFO: Returned (3165)bytes
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries read
INFO: read(8 bytes) bufferSize(3165)
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries getNextMember
INFO: getNextMember
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries read
INFO: read(3 bytes) bufferSize(3157)
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries getNextMember
INFO: AlignedResult:0
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries read
INFO: read(7 bytes) bufferSize(3154)
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries getNextMember
INFO: Read next GZip header...
Oct 07, 2023 5:41:49 PM org.archive.extract.ExtractingResourceProducer getNext
INFO: Extracting (class org.archive.resource.warc.WARCResource) with (class org.archive.resource.http.HTTPResponseResourceFactory)
```
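The two-line format above (timestamp, source class and method, then `LEVEL: message`) matches the default `SimpleFormatter` of `java.util.logging`, so the thresholds of these loggers can also be tuned individually. The sketch below is a hypothetical helper, not part of this PR: it raises the threshold of both classes to WARNING and assumes that each class logs through a logger named after its fully qualified class name, which the log output alone does not prove.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietArchiveLogging {
    // Assumption: logger names equal the fully qualified class names shown as log source
    static final String[] VERBOSE_LOGGERS = {
        "org.archive.format.gzip.GZIPMemberSeries",
        "org.archive.extract.ExtractingResourceProducer",
    };

    // Keep strong references: LogManager holds loggers only weakly, so a level
    // set on an otherwise unreferenced logger can be lost after garbage collection.
    private static final Logger[] HELD = new Logger[VERBOSE_LOGGERS.length];

    /** Hypothetical helper: drop INFO and FINE output of the two verbose classes. */
    static void quiet() {
        for (int i = 0; i < VERBOSE_LOGGERS.length; i++) {
            HELD[i] = Logger.getLogger(VERBOSE_LOGGERS[i]);
            HELD[i].setLevel(Level.WARNING);
        }
    }

    public static void main(String[] args) {
        quiet();
        Logger gzip = Logger.getLogger("org.archive.format.gzip.GZIPMemberSeries");
        System.out.println(gzip.isLoggable(Level.INFO));    // false
        System.out.println(gzip.isLoggable(Level.WARNING)); // true
    }
}
```

The same thresholds can be set without code through `<logger name>.level = WARNING` entries in a `logging.properties` file passed via `-Djava.util.logging.config.file`.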

These messages add up to 40+ MB of log output per WARC file (a WARC file is about 1 GiB in size). To keep log files from being flooded, this PR lowers the level of these messages from INFO to FINE. Messages that might point to the cause of an error keep their current level.
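A minimal sketch of the effect of this change, assuming `java.util.logging` (which the log format and the FINE level suggest). The logger name `demo.GZIPMemberSeries`, the `readMember` method and the `countPublished` helper are hypothetical stand-ins, not code from this repository. With the JDK's default configuration the root logger and the console handler are at INFO, so per-record messages logged at FINE are discarded, while a threshold of FINE brings them back for debugging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class FineLevelDemo {
    // Hypothetical stand-in for a per-record code path such as GZIPMemberSeries.read()
    private static final Logger LOG = Logger.getLogger("demo.GZIPMemberSeries");

    static void readMember(int bytes) {
        // Per-record tracing now goes to FINE instead of INFO
        LOG.fine(() -> "read(" + bytes + " bytes)");
        // Messages hinting at an error keep a higher level
        if (bytes < 0) {
            LOG.warning("negative read size");
        }
    }

    /** Counts the records that reach a handler when the logger threshold is {@code threshold}. */
    static int countPublished(Level threshold, int records) {
        List<LogRecord> seen = new ArrayList<>();
        Handler capture = new Handler() {
            @Override public void publish(LogRecord r) { if (isLoggable(r)) seen.add(r); }
            @Override public void flush() {}
            @Override public void close() {}
        };
        capture.setLevel(Level.ALL);
        LOG.setUseParentHandlers(false); // keep the demo off the console
        LOG.addHandler(capture);
        LOG.setLevel(threshold);
        try {
            for (int i = 0; i < records; i++) {
                readMember(8);
            }
        } finally {
            LOG.removeHandler(capture);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countPublished(Level.INFO, 1000)); // 0: default-like threshold
        System.out.println(countPublished(Level.FINE, 1000)); // 1000: debugging enabled
    }
}
```

Logging through a `Supplier` also means the message string is only built when FINE is enabled, so with the default configuration each record costs little more than a level check.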

@jnioche merged commit d75823f into `master` on Oct 31, 2023
@jnioche deleted the `reduce-log-level-to-avoid-superflous-logging` branch on October 31, 2023 at 09:58