Nigerian Pidgin ASR (nigerian-pidgin-1.0)
Verified
Speech-to-text corpus for Nigerian Pidgin English: 4,277 quality-filtered 16 kHz WAV recordings with sentence-level transcriptions from 10 native speakers, split into train 2,710 / val 677 / test 892 (~956 MB).
- Category: Datasets
- Pricing: Free / CC-BY 4.0
- Country: 🇳🇬 Nigeria
- Last verified: 24 Jun 2026
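Because the listing describes the recordings as 16 kHz WAV files, a quick sanity check after download can confirm the audio matches that spec before training. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV. The function names and the directory you pass in are hypothetical, since this listing does not document the corpus's folder layout.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 16_000  # sample rate stated in the listing


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def find_unexpected_rates(root, expected_rate=EXPECTED_RATE_HZ):
    """List (path, rate) for WAV files under `root` whose rate differs from `expected_rate`."""
    mismatches = []
    # Assumption: files use a lowercase ".wav" extension somewhere below `root`.
    for path in sorted(Path(root).rglob("*.wav")):
        rate, _, _ = wav_info(path)
        if rate != expected_rate:
            mismatches.append((path, rate))
    return mismatches
```

Calling `find_unexpected_rates` on the directory where the corpus was extracted returns an empty list when every file is at the expected rate. Any entries it does return are candidates for resampling or exclusion.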
Related in Datasets
AfriSpeech-200
Pan-African accented English speech corpus of ~200 hours covering 120 African accents from 13 countries and 2,463 speakers across clinical and general domains, with per-accent configs. Released by Intron Health.
Yoruba Speech-Text Parallel Corpus
Large Yoruba parallel speech-text corpus of 1,647,022 audio-text pairs (~21.5 GB, WAV) aligned with the MMS-300M Forced Aligner for ASR and TTS, with clips of 0.04-12 seconds.
AfriSpeech-Dialog
Conversational African-accented speech corpus (~6 hours) of 50 two-speaker dialogues across 11 accents (Hausa, Yoruba, Igbo, Swahili, Sesotho and others) from Nigeria, Kenya and South Africa, for ASR and speaker diarization. By Intron Health.
