AfriHuBERT vs BibleTTS

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

AfriHuBERT BibleTTS

Tags

AfriHuBERT: speech, african-languages, self-supervised, hubert, speech-encoder

BibleTTS: dataset, speech, tts, african-languages, masakhane

Summary

AfriHuBERT is a compact self-supervised speech representation model based on mHuBERT-147, continually pretrained via multilingual adaptive finetuning on over 10,000 hours of speech spanning more than 1,200 African languages and varieties. It improves spoken language identification and ASR over its base model and acts as an encoder for downstream African speech tasks. Its training data was aggregated from sources including BibleTTS, Kallaama, NaijaVoices and NCHLT.

BibleTTS is a large, high-fidelity open text-to-speech corpus covering ten Sub-Saharan African languages (Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, Yoruba), with per-language totals reaching over 80 hours of studio-quality, 48 kHz, single-speaker recordings. It was built by Masakhane and Coqui.
