
Yoruba Speech-Text Parallel Corpus


A large Yoruba speech-text parallel corpus for ASR and TTS: 1,647,022 audio-text pairs (~21.5 GB, WAV), aligned with the MMS-300M Forced Aligner. Clip durations range from 0.04 to 12 seconds.
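
Because the clip durations run as low as 0.04 seconds, a training pipeline may want to drop very short clips before use. The following is a minimal sketch using only Python's standard `wave` module; it assumes the files are uncompressed PCM WAV, and the function names and the 0.5-second lower threshold are hypothetical choices for illustration, not part of the dataset's own tooling.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        # Duration = number of audio frames / frames per second.
        return wf.getnframes() / wf.getframerate()


def filter_by_duration(paths, min_s=0.5, max_s=12.0):
    """Keep only the paths whose duration lies in [min_s, max_s].

    min_s=0.5 is an illustrative threshold (assumption); max_s=12.0
    matches the upper bound stated in the listing.
    """
    return [p for p in paths if min_s <= wav_duration_seconds(p) <= max_s]
```

The right lower bound depends on the task: word-level ASR may tolerate shorter clips than TTS, so the threshold is worth tuning against a sample of the data.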

Category: Datasets
Pricing: Free / CC-BY 4.0
Country: 🇳🇬 Nigeria
Last verified: 24 Jun 2026

Tags: speech, tts, asr, yoruba, parallel-corpus
