MasakhaNER 2.0

The largest high-quality named-entity recognition (NER) corpus for African languages, covering 20 languages including Nigerian Pidgin, Hausa, Igbo, and Yoruba. News-domain text is annotated with PER, ORG, LOC, and DATE entity tags, for a total of approximately 152,786 rows. Built by the Masakhane community.
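
To make the tag set concrete, here is a minimal sketch of how the four entity types might be turned into a token-classification label inventory and how tagged tokens could be grouped back into entity spans. It assumes the common BIO scheme (`B-` for the first token of an entity, `I-` for continuation tokens, `O` for everything else), which this listing does not state explicitly. The names `LABELS`, `LABEL2ID`, and `extract_spans` are hypothetical helpers for illustration, and the example sentence is invented English text, not a row from the corpus.

```python
from typing import List, Tuple

# Entity types named in the listing above.
ENTITY_TYPES = ["PER", "ORG", "LOC", "DATE"]

# Assumption: BIO labelling, i.e. "O" plus a B-/I- pair per entity type.
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the span collected so far, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # A B- tag (or a stray I- tag of a different type) starts a new span.
            flush()
            current_type, current_tokens = etype, [token]
        elif prefix == "I":
            # Continuation of the current span.
            current_tokens.append(token)
        else:
            # "O" closes any open span.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Invented example sentence (not taken from the dataset).
tokens = ["Ngozi", "Okonjo-Iweala", "visited", "Lagos", "on", "Monday"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE"]
print(extract_spans(tokens, tags))
```

Under the BIO assumption, four entity types yield nine labels, which is the output size a token-classification head would need. Check the dataset's own label list before relying on this ordering.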

Category: Datasets
Pricing: Free / CC-BY-NC 4.0
Country: 🌍 Pan-African
Last verified: 24 Jun 2026

Tags

nlp
ner
named-entity-recognition
african-languages
token-classification
