diff --git a/README.md b/README.md
index 9d83065..134198b 100644
--- a/README.md
+++ b/README.md
@@ -31,9 +31,17 @@ uv run ccoa --help
 
 `ccoa classify-warc` streams WARC files from S3 (or any fsspec URL),
 extracts plain text from each response record with trafilatura, and
-applies a HuggingFace-hosted fasttext classifier. Per-record output is a
-CSV `URL,prediction_score,warc_filename,warc_record_index`; a one-shot score-distribution summary is
-logged at the end and written to a `.summary.csv` file.
+applies one or more HuggingFace-hosted fasttext classifiers in a single
+pass. Per-record output is a CSV with one `score_