AI Resources

NaijaVoices Dataset

Speech Dataset
Docs gated
Approval required

NaijaVoices is a large-scale speech dataset of about 1,800 hours of audio from more than 5,000 speakers, with expert-curated transcripts in Igbo, Hausa and Yoruba (roughly 600 hours per language). It is designed for building automatic speech recognition (ASR) and other speech AI systems for Nigerian languages, and fine-tuning Whisper and MMS on it is reported to improve their performance on these languages. The dataset is hosted on Hugging Face behind a free registration, and it powers models such as AfriHuBERT and SBPN.
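Because access is gated, loading the data typically requires an access token from an approved Hugging Face account. The sketch below shows one way this might look with the `datasets` library. The repository id, the per-language configuration names, and the split name are placeholders assumed for illustration, not taken from the dataset's documentation; check the dataset page for the real values.

```python
import os

# Hypothetical placeholders: replace with the repository id and language
# configuration names shown on the dataset's Hugging Face page.
REPO_ID = "<org>/<naijavoices-repo>"
LANGUAGES = ("igbo", "hausa", "yoruba")


def build_load_kwargs(language, split="train"):
    """Build keyword arguments for datasets.load_dataset for one language."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "path": REPO_ID,
        "name": language,   # assumed: one configuration per language
        "split": split,     # assumed split name
        "streaming": True,  # iterate lazily instead of downloading everything
        "token": os.environ.get("HF_TOKEN"),  # token from an approved account
    }


def load_language(language, split="train"):
    """Open one language subset as a streaming dataset."""
    # Imported lazily so build_load_kwargs works without the library installed.
    from datasets import load_dataset
    return load_dataset(**build_load_kwargs(language, split))
```

With `HF_TOKEN` set in the environment and access approved, `load_language("yoruba")` would return an iterable dataset. Streaming is used here because each language subset is roughly 600 hours of audio.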

Category
AI Resources
Pricing
Free for non-commercial research (registration required)
Country
🇳🇬 Nigeria
Last verified
5 Jul 2026

Tags

yoruba
hausa
igbo
nigerian-languages
speech-dataset