MAFAND-MT (masakhane/mafand)
The largest news-domain machine translation benchmark for African languages, covering 21 languages with English or French as the source language. It contains 142,909 parallel sentences in Parquet format with train, dev, and test splits, and is hosted on Hugging Face. Licensed under CC BY-NC 4.0.
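Since the corpus is hosted on Hugging Face, one language pair can probably be fetched with the `datasets` library. The sketch below is illustrative only: the helper names `pair_config` and `load_pair` are hypothetical, and the `<source>-<target>` config naming (for example `en-hau`) and the split names are assumptions that should be checked against the dataset card.

```python
from typing import Any


def pair_config(source: str, target: str) -> str:
    """Build a '<source>-<target>' config name such as 'en-hau'.

    The naming scheme is an assumption, not confirmed by this listing.
    """
    source, target = source.strip().lower(), target.strip().lower()
    # The listing states that English or French is the source language.
    if source not in {"en", "fr"}:
        raise ValueError("source language is expected to be 'en' or 'fr'")
    if not target:
        raise ValueError("target language code must not be empty")
    return f"{source}-{target}"


def load_pair(source: str, target: str, split: str = "train") -> Any:
    """Load one language pair from the Hub.

    Requires `pip install datasets` and network access. The split name
    passed in (e.g. 'train', 'test') is assumed to exist for the pair.
    """
    # Imported lazily so pair_config() stays usable without the library.
    from datasets import load_dataset

    return load_dataset("masakhane/mafand", pair_config(source, target), split=split)
```

Under these assumptions, `load_pair("en", "hau", split="test")` would return the English-Hausa test split. Note that the CC BY-NC 4.0 license restricts use to non-commercial purposes.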
- Category: Datasets
- Pricing: Free / open (CC BY-NC 4.0)
- Country: 🌍 Pan-African
- Last verified: 5 Jul 2026
Related in Datasets
Hausa Visual Genome (HausaVG)
Multimodal Hausa-English dataset of 32,923 images with paired English/Hausa region descriptions (train/dev/test/challenge splits), post-edited by HausaNLP and Bayero University Kano translators for English-to-Hausa machine translation and image description.
AfriQA
Cross-lingual open-retrieval question-answering dataset with human-translated QA pairs for 10 African languages (incl. Hausa, Igbo, Yoruba), totaling 12,159 examples across train/validation/test splits. From the Masakhane initiative.
MasakhaNER 2.0
Largest high-quality named-entity-recognition corpus for 20 African languages (incl. Nigerian Pidgin, Hausa, Igbo, Yoruba) with PER/ORG/LOC/DATE tags over news-domain text, totaling ~152,786 rows. Built by the Masakhane community.
