NaijaVoices Dataset
Speech Dataset
Docs gated
Approval required
NaijaVoices is a large-scale speech dataset of about 1,800 hours of audio from more than 5,000 speakers, with expert-curated transcripts in Igbo, Hausa, and Yoruba (roughly 600 hours per language). It is designed for building automatic speech recognition (ASR) and other speech AI systems for Nigerian languages, and fine-tuning Whisper and MMS on it improves their performance. The dataset is available on HuggingFace behind a free registration, and it powers models such as AfriHuBERT and SBPN.
- Category: AI Resources
- Pricing: Free for non-commercial research (registration required)
- Country: 🇳🇬 Nigeria
- Last verified: 5 Jul 2026
Tags
yoruba
hausa
igbo
nigerian-languages
speech-dataset