AfriHuBERT
AfriHuBERT is a compact self-supervised speech representation model built on mHuBERT-147 and continually pretrained, via multilingual adaptive finetuning, on over 10,000 hours of speech spanning more than 1,200 African languages and varieties. It improves on its base model in spoken language identification and automatic speech recognition (ASR), and can serve as an encoder for downstream African speech tasks. Its training data was aggregated from sources including BibleTTS, Kallaama, NaijaVoices and NCHLT.
- Category: AI Resources
- Pricing: Open weights
- Country: 🌍 Pan-African
- Last verified: 5 Jul 2026
Related in AI Resources
Kallaama
A 125-hour transcribed speech dataset in Wolof, Pulaar and Sereer (the three most widely spoken languages of Senegal) focused on agriculture, built for ASR development. Led by Jokalante with Orange Innovation and Ecole Polytechnique de Thies, funded by Lacuna Fund.
African-Whisper
An open-source framework (PyPI: africanwhisper) for fine-tuning OpenAI's Whisper on multilingual African-language audio datasets such as Common Voice and FLEURS, with optimized inference, diarization and deployment. Created by Kevin Kibe.
EqualyzAI
EqualyzAI is a voice-first agentic AI company building speech recognition, text-to-speech and voice agents for African languages and dialects (Yoruba, Igbo, Hausa, Pidgin with code-switching). Its products include VoiceMaker, VoiceAgent and VoiceBridge, alongside datasets and APIs. It operates from Lagos, Nigeria, and Washington, DC.
