AfriQA vs MasakhaPOS
A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.
| Field | AfriQA | MasakhaPOS |
|---|---|---|
| Category | Datasets | Datasets |
| Type | Language / NLP | Language / NLP |
| Country | Pan-African | Pan-African |
| Docs status | Docs live | Docs live |
| Licensing / Pricing | Free / CC-BY-SA 4.0 | Free / CC-BY-NC 4.0 |
| Verified | Verified | Verified |
| Last verified | 24 Jun 2026 | 24 Jun 2026 |
| Tags | nlp, african-languages, question-answering, cross-lingual, open-retrieval | nlp, african-languages, token-classification, pos-tagging, ud-tags |
Summary
AfriQA: Cross-lingual open-retrieval question-answering dataset with human-translated QA pairs for 10 African languages (including Hausa, Igbo, and Yoruba), totaling 12,159 examples across train/validation/test splits. From the Masakhane initiative.
MasakhaPOS: Part-of-speech tagging dataset for 20 African languages (including Nigerian Pidgin, Hausa, Igbo, and Yoruba) using Universal Dependencies tags, with per-language train/validation/test splits. Built by the Masakhane community.
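
To make the token-classification format concrete, here is a minimal sketch of what a POS-tagged record could look like and how its tags might be checked against the 17 Universal Dependencies UPOS tags. The field names (`tokens`, `upos`), the helper `invalid_tags`, and the English example sentence are assumptions for illustration; they are not taken from MasakhaPOS itself, whose actual schema should be checked in its documentation.

```python
# Illustrative sketch: field names and the example sentence are hypothetical,
# not copied from MasakhaPOS.

# The 17 universal part-of-speech (UPOS) tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def invalid_tags(record):
    """Return the (token, tag) pairs whose tag is not a UPOS tag."""
    tokens, tags = record["tokens"], record["upos"]
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [(tok, tag) for tok, tag in zip(tokens, tags) if tag not in UPOS_TAGS]


# One tag per token: the defining shape of a token-classification example.
example = {
    "tokens": ["She", "reads", "books", "."],
    "upos": ["PRON", "VERB", "NOUN", "PUNCT"],
}

print(invalid_tags(example))  # prints []
```

A check like this is a cheap sanity test to run on each split before training a tagger, since a single misaligned or misspelled tag would otherwise surface only as a confusing label-mapping error later.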