MasakhaNEWS

Verified
Datasets
Language / NLP
Docs live

News topic classification dataset covering 16 widely spoken African languages, including Hausa, Igbo, Yoruba, and Nigerian Pidgin. It contains about 31,088 rows in CSV/Parquet format, with train/validation/test splits and seven topic categories. Built by the Masakhane community.
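
As a rough illustration of working with one split in CSV form, the sketch below counts how many rows fall under each topic label. The column name `category`, the helper `count_labels`, and the sample rows are assumptions made for this example and are not taken from the dataset itself; check the actual file headers before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical label column name; the real files may use a different header.
LABEL_COLUMN = "category"


def count_labels(csv_file, label_column=LABEL_COLUMN):
    """Return a Counter of topic labels found in one split's CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Placeholder rows for illustration only, not real MasakhaNEWS data.
sample_split = io.StringIO(
    "category,headline\n"
    "sports,placeholder headline 1\n"
    "health,placeholder headline 2\n"
    "sports,placeholder headline 3\n"
)

label_counts = count_labels(sample_split)
print(label_counts.most_common())
```

To run the same check on a downloaded split, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) in place of the in-memory sample.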

Category: Datasets
Pricing: Free / CC-BY-NC 4.0
Country: 🌍 Pan-African
Last verified: 24 Jun 2026

Tags: nlp, african-languages, text-classification, news, topic-classification
