AfriQA
VerifiedCross-lingual open-retrieval question-answering dataset with human-translated QA pairs for 10 African languages (incl. Hausa, Igbo, Yoruba), totaling 12,159 examples across train/validation/test splits. From the Masakhane initiative.
- Category
- Datasets
- Pricing
- Free / CC-BY-SA 4.0
- Country
- 🌍 Pan-African
- Last verified
- 24 Jun 2026
Tags
Compare AfriQA
Side-by-side, verified specs against its closest language / nlp alternatives.
Related in Datasets
MasakhaNER 2.0
Largest high-quality named-entity-recognition corpus for 20 African languages (incl. Nigerian Pidgin, Hausa, Igbo, Yoruba) with PER/ORG/LOC/DATE tags over news-domain text, totaling ~152,786 rows. Built by the Masakhane community.
MasakhaNEWS
News-topic-classification dataset for 16 widely spoken African languages (incl. Hausa, Igbo, Yoruba, Nigerian Pidgin), ~31,088 rows in CSV/Parquet with train/val/test splits across seven topic categories. Built by the Masakhane community.
MasakhaPOS
Part-of-speech tagging dataset for 20 African languages (incl. Nigerian Pidgin, Hausa, Igbo, Yoruba) using Universal Dependencies tags, with per-language train/validation/test splits. Built by the Masakhane community.
