According to [async-profiler](/jvm-profiling-tools/async-profiler), when writing WARC files the CPU time is mostly spent on:
- gzip compression
- SHA digests
- language detection
- charset detection

(interactive [SVG.zip](https://github.com/commoncrawl/nutch/files/3339651/fetcher.reduce.28673.async-prof.svg.zip))

Tasks to improve the performance:
- [x] improve the performance of charset detection, see #7
- [x] avoid unnecessary recoding from UTF-8 to UTF-8, verify validity instead
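
For the second task, a minimal sketch of what "verify validity instead" could look like, assuming the content is available as a `byte[]`. The class name `Utf8Check` and the method `isValidUtf8` are hypothetical and not taken from the Nutch code base; the actual change may be implemented differently. The idea is to run a strict decoder that reports malformed input, and to keep the original bytes unchanged if decoding succeeds, rather than decoding and re-encoding them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class Utf8Check {

    /** Returns true if the given bytes are a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] content) {
        // A fresh decoder per call: CharsetDecoder is not thread-safe.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Decoding throws on the first malformed byte sequence.
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

If `isValidUtf8(content)` returns true, the bytes can be written as they are. Note that this sketch still decodes into a temporary `CharBuffer`; it only skips the re-encoding step.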