Commit 158b0b6

feat: update README with acknowledgements and content details
Added acknowledgements section and updated contents list.
1 parent 512f05b commit 158b0b6

1 file changed

Lines changed: 5 additions & 0 deletions

README.md
@@ -5,5 +5,10 @@ Code and data accompanying the work described in [this blog post](https://common
 - `ArabicDomainQuality.xlsx`: The original data received from QCRI.
 - `arabic_seeds.ipynb`: A notebook detailing the data processing and analysis.
 - `crawl_lang_info.tsv`: Summarised language information for the domains found in the CC-MAIN-2026-{21,17,12} archives.
+- `DomainQuality_Dashboard.ipynb`: Additional analysis of the quality of the pre-filtered domains, carried out by researchers at QCRI.
 
 We use [uv](https://docs.astral.sh/uv/) to manage Python dependencies.
+
+## Acknowledgements
+
+Thank you to [Hamdy S. Hussein](https://elmi.hbku.edu.qa/en/persons/hamdy-soliman-mubarak-hussien/), [Dr. Kareem M. Darwish](https://kareemdarwish.com) and [Dr. Mohamed Ahmed Yassin Eltabakh](https://elmi.hbku.edu.qa/en/persons/mohamed-ahmed-yassin-eltabakh/) of the [Qatar Computing Research Institute](https://www.hbku.edu.qa/en/qcri) for providing the initial seed list, quality annotations and exploratory visualisations. These were created as part of the [Fanar Project](http://www.fanar.qa), an Arabic generative AI platform.
