Pan-African
11 verified AI resources for building across the Pan-African region.
AfriBERTa
AfriBERTa is a multilingual masked language model (XLM-RoBERTa architecture, ~126M params) pretrained from scratch on 11 African languages, including Amharic, Hausa, Igbo, Swahili, and Yoruba. Built by the Castorini lab (University of Waterloo) for text classification and named entity recognition (NER) on low-resource African languages.
AfriTeVa V2
An improved T5 v1.1 model (428M params) pretrained on the Wura corpus covering 16 African languages, with gains on classification, translation, summarization and cross-lingual QA. Published at EMNLP 2023 by the Castorini group with African lead authors.
AfroLID
A neural language identification toolkit that identifies which of 517 African languages and varieties, spanning 14 language families, a text is written in, reaching 97.41 macro-F1 after fine-tuning on SERENGETI. Developed by the UBC Deep Learning and NLP Lab and published at EMNLP 2022.
AfroLM
A multilingual masked language model pretrained from scratch on 23 African languages using a self-active learning framework, outperforming AfriBERTa, mBERT and XLM-R base on NER and sentiment tasks. Created by Bonaventure Dossou and collaborators, and published at the SustaiNLP workshop at EMNLP 2022.
AfroXLMR
AfroXLMR is an XLM-R-large model (0.6B params) adapted to African languages via multilingual adaptive fine-tuning, covering 17 African languages plus Arabic, French, and English. Created by David Adelani (Davlan) and published at COLING 2022, it targets cross-lingual transfer tasks such as NER.
BibleTTS
BibleTTS is a large, high-fidelity open text-to-speech corpus with up to 86 hours of studio-quality 48 kHz single-speaker recordings per language across ten Sub-Saharan African languages (Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, Yoruba). Built by Masakhane/Coqui.
Cheetah
A massively multilingual natural language generation model supporting 517 African languages, outperforming baselines on five of seven AfroNLG tasks, which include summarization and translation. Developed by the UBC Deep Learning and NLP Lab and published at ACL 2024.
Deep Learning Indaba
A Pan-African grassroots movement that strengthens machine learning and AI across the continent through an annual conference, locally run IndabaX events, mentorship and an Ideathon. The 2026 Indaba is hosted at Pan-Atlantic University in Lagos, Nigeria.
SERENGETI
A massively multilingual masked language model covering 517 African languages and varieties across five scripts, achieving state-of-the-art results on the AfroNLU benchmark. Developed by the UBC Deep Learning and NLP Lab as an Afrocentric resource.
Toucan
An Afrocentric many-to-many machine translation model (1.2B params, mT5) fine-tuned from Cheetah to support 156 African language pairs, evaluated on the AfroLingu-MT benchmark. Developed by the UBC Deep Learning and NLP Lab and published at ACL 2024.
Zindi
Africa's largest data science and AI competition platform, where organizations host real-world ML challenges and a community of builders competes to solve them. It offers competitions, learning courses, jobs and leaderboards, with partners including Microsoft, Google, AWS and Google DeepMind.
