MasakhaPOS

Verified | Language / NLP | Docs live

Part-of-speech (POS) tagging dataset covering 20 African languages, including Nigerian Pidgin, Hausa, Igbo, and Yoruba. Tokens are annotated with Universal Dependencies (UD) POS tags, and each language has its own train/validation/test splits. Built by the Masakhane community.
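
To make the "UD tags" claim concrete, here is a minimal sketch of how a tagged split could be sanity-checked against the 17 universal POS tags defined by Universal Dependencies. The two-column "token, whitespace, tag" layout with blank lines between sentences is an assumption, not something this listing confirms, and the sample text is an invented toy input rather than data from MasakhaPOS. The helper names `parse_tagged_sentences` and `unknown_tags` are hypothetical.

```python
# The 17 universal POS (UPOS) tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def parse_tagged_sentences(text):
    """Parse 'token<whitespace>tag' lines into lists of (token, tag) pairs.

    Assumed layout (hypothetical): one token per line, tag in the last
    column, and a blank line marking the end of each sentence.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last run of whitespace so the tag is the final column.
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def unknown_tags(sentences):
    """Return the set of tags that are not part of the UPOS inventory."""
    return {tag for sent in sentences for _, tag in sent if tag not in UPOS_TAGS}


# Invented toy input for illustration only; not taken from the dataset.
SAMPLE = "Ada\tPROPN\nreads\tVERB\nbooks\tNOUN\n.\tPUNCT\n\nShe\tPRON\nsmiles\tVERB\n"

sentences = parse_tagged_sentences(SAMPLE)
print(len(sentences), "sentences; unexpected tags:", unknown_tags(sentences))
```

Running the same check over each of the train, validation, and test files for a language would flag any tag outside the UPOS inventory before the data is fed to a token-classification model.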

Category: Datasets
Pricing: Free / CC-BY-NC 4.0
Country: 🌍 Pan-African
Last verified: 24 Jun 2026

Tags: nlp, african-languages, token-classification, pos-tagging, ud-tags
