AfriQA vs MAFAND-MT (masakhane/mafand)

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

| Field | AfriQA | MAFAND-MT (masakhane/mafand) |
| --- | --- | --- |
| Category | Datasets | Datasets |
| Type | Language / NLP | Language / NLP |
| Country | 🌍 Pan-African | 🌍 Pan-African |
| Docs status | Docs live | Docs live |
| Licensing / Pricing | Free / CC BY-SA 4.0 | Free / open (CC BY-NC 4.0) |
| Verified | Verified | Unverified |
| Last verified | 5 Jul 2026 | 5 Jul 2026 |
| Tags | nlp, african-languages, question-answering, cross-lingual, open-retrieval | nlp, african-languages, news, machine-translation, parquet |
Summary
AfriQA: Cross-lingual open-retrieval question-answering dataset with human-translated QA pairs for 10 African languages (including Hausa, Igbo, and Yoruba), totaling 12,159 examples across train, validation, and test splits. From the Masakhane initiative.
MAFAND-MT: Largest news-domain machine translation benchmark for African languages, covering 21 languages with English or French as the source language. It contains 142,909 parallel sentences in Parquet format with train, dev, and test splits, hosted on Hugging Face. Licensed under CC BY-NC 4.0.
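
The main practical difference between the two records is the license: CC BY-SA 4.0 permits commercial use but requires derivatives to be shared under the same terms, while CC BY-NC 4.0 rules out commercial use. The sketch below encodes the figures listed above and filters by that constraint. The names `DatasetRecord`, `LICENSE_TERMS`, and `usable_commercially` are hypothetical helpers for illustration, not part of Findra or either dataset, and the license table is a simplified reading of the Creative Commons terms.

```python
from dataclasses import dataclass

# Simplified view of what each Creative Commons 4.0 license implies.
# (Both licenses also require attribution, which is not modeled here.)
LICENSE_TERMS = {
    "CC BY-SA 4.0": {"commercial_use": True, "share_alike": True},
    "CC BY-NC 4.0": {"commercial_use": False, "share_alike": False},
}


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical container for one row of the comparison."""
    name: str
    task: str
    languages: int
    examples: int
    license: str


# Values copied from the comparison above.
RECORDS = [
    DatasetRecord("AfriQA", "question-answering", 10, 12_159, "CC BY-SA 4.0"),
    DatasetRecord("MAFAND-MT", "machine-translation", 21, 142_909, "CC BY-NC 4.0"),
]


def usable_commercially(record: DatasetRecord) -> bool:
    """Return True if the record's license allows commercial use."""
    return LICENSE_TERMS[record.license]["commercial_use"]


commercial_ok = [r.name for r in RECORDS if usable_commercially(r)]
print(commercial_ok)  # ['AfriQA']
```

For a commercial project, only AfriQA passes this filter, and anything derived from it would need to carry the same share-alike license.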