AI Resources
African language models, speech & text datasets, and AI infrastructure.
NaijaML
Production-ready NLP library (v0.2.1, Mar 2026) covering four major Nigerian languages. It runs in 4 GB of RAM with no GPU and includes PII masking for National Identification Numbers (NIN) and phone numbers. One of the most practical Nigerian AI resources for developers.
YarnGPT2b
Nigerian-accented English TTS and ASR model (Jan 2025) with 11 voices, trained on Nollywood and podcast audio.
AfriBERTa
AfriBERTa is a multilingual masked language model (XLM-RoBERTa architecture, ~126M params) pretrained from scratch on 11 African languages including Amharic, Hausa, Igbo, Swahili, and Yoruba. Built by the Castorini lab (University of Waterloo) for text classification and named entity recognition (NER) on low-resource African languages.
AfriTeVa V2
An improved T5 v1.1 model (428M params) pretrained on the Wura corpus covering 16 African languages, with gains on classification, translation, summarization and cross-lingual QA. Published at EMNLP 2023 by the Castorini group with African lead authors.
Africa GPU Hub
First Nigerian GPU rental marketplace (Udutech, Lagos), offering GPU cloud compute at under $1/hr.
African-Whisper
An open-source framework (PyPI: africanwhisper) for fine-tuning OpenAI's Whisper on multilingual African-language audio datasets such as Common Voice and FLEURS, with optimized inference, diarization and deployment. Created by Kevin Kibe.
AfroLID
A neural language identification toolkit that detects which of 517 African languages and varieties a text belongs to across 14 language families, reaching 97.41 macro-F1 after fine-tuning on SERENGETI. Developed by the UBC Deep Learning and NLP Lab and published at EMNLP 2022.
AfroLM
A multilingual masked language model pretrained from scratch on 23 African languages using a self-active learning framework, outperforming AfriBERTa, mBERT and XLM-R base on NER and sentiment tasks. Created by Bonaventure Dossou and collaborators, published at the SustaiNLP workshop at EMNLP 2022.
AfroXLMR
AfroXLMR is an XLM-R-large model (0.6B params) adapted to African languages via multilingual adaptive fine-tuning, covering 17 African languages plus Arabic, French, and English. Created by David Adelani (Davlan) and published at COLING 2022 for cross-lingual transfer tasks like NER.
Awarri
Awarri is a Lagos-based AI and robotics company that built N-ATLaS, Nigeria's first government-backed open-source multilingual LLM (Llama-3 8B fine-tuned on ~392M tokens of English, Hausa, Igbo and Yoruba), in partnership with NCAIR and the Federal Ministry of Communications, Innovation & Digital Economy. It also operates the LangEasy data-collection platform.
BibleTTS
BibleTTS is a large, high-fidelity open text-to-speech corpus with up to 86 hours of studio-quality 48 kHz single-speaker recordings per language across ten Sub-Saharan African languages (Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, Yoruba), built by Masakhane and Coqui.
Cassava Technologies AI Factory
Pan-African NVIDIA Cloud Partner deploying GPU-as-a-service and AI-as-a-service from data centers across South Africa, Nigeria, Kenya, Egypt and Morocco. It has procured 12,000 NVIDIA GPUs, and its first cluster is operational in Cape Town. It offers localized model support via its CAIMEx multi-model exchange.
Cheetah
A massively multilingual natural language generation model supporting 517 African languages, outperforming baselines on five of seven AfroNLG tasks like summarization and translation. Developed by the UBC Deep Learning and NLP Lab and published at ACL 2024.
Deep Learning Indaba
Pan-African grassroots movement strengthening machine learning and AI across the continent through an annual conference, locally run IndabaX events, mentorship and an Ideathon. The 2026 Indaba is hosted at Pan-Atlantic University in Lagos, Nigeria.
EqualyzAI
EqualyzAI is a voice-first agentic AI company building speech recognition, text-to-speech and voice agents for African languages and dialects (Yoruba, Igbo, Hausa, Pidgin with code-switching), with products including VoiceMaker, VoiceAgent and VoiceBridge plus datasets/APIs. Operates from Lagos, Nigeria and Washington DC.
GhanaNLP ABENA
ABENA (A BERT Now in Akan) is a family of BERT, DistilBERT and RoBERTa language models for the Twi/Akan language covering both Asante and Akuapem dialects, released by the open-source GhanaNLP initiative. Distinct from GhanaNLP's Khaya translation product.
GhanaNLP Khaya
GhanaNLP is an open-source Ghana-based NLP initiative whose Khaya product offers machine translation, text-to-speech and automatic speech recognition for 10+ Ghanaian languages (Twi, Ewe, Ga, Dagbani, Frafra and others) via web/Android/iOS apps and a REST API with Python and JavaScript SDKs.
InkubaLM
InkubaLM-0.4B is a 400M-parameter open-weights small language model built from scratch by Lelapa AI for five low-resource African languages (isiZulu, Yoruba, Swahili, isiXhosa and Hausa), with English and French also covered. It uses a LLaMA-style architecture and was trained on 2.4B tokens.
InstaDeep
African-founded (Tunis, 2014) enterprise AI company delivering deep and reinforcement learning decision-making systems across biology (DeepChain), logistics (DeepPack) and electronics (DeepPCB). Acquired by BioNTech in 2023 and retains African offices in Tunis, Lagos, Cape Town and Kigali.
Intron Health
Intron Health is a Nigerian voice-AI company whose Sahara-v2 models deliver clinical/medical speech recognition (and TTS) optimized natively for African accents and dialects, trained on Africa's largest clinical speech database (millions of clips across 200+ accents). Serves healthcare, call-centre, legal and biometrics use cases via STT/TTS/voice-bot APIs.
Kallaama
A 125-hour transcribed speech dataset in Wolof, Pulaar and Sereer (the three most widely spoken languages of Senegal) focused on agriculture, built for ASR development. Led by Jokalante with Orange Innovation and Ecole Polytechnique de Thies, funded by Lacuna Fund.
Lanfrica
Lanfrica is a catalog/registry mapping African language resources (datasets, models, papers and policies) via its African AI Atlas, positioning itself as 'the evidence layer for African AI.' Built by Lanfrica Labs with partners including Meta, Mozilla, Masakhane and Lacuna Fund.
LangEasy
LangEasy is Awarri's crowdsourced data-collection platform (smartphone app) that lets anyone contribute voice and text in Nigerian languages (Yoruba, Hausa, Igbo, Ibibio, Pidgin and accented English) to build the training data behind Nigeria's national LLM, N-ATLaS.
Lesan AI
Machine translation service for Ethiopian and Eritrean languages including Amharic, Tigrinya, Oromo and Somali, offering document translation and a translation API. Founded 2019, it reports outperforming Google Translate on its supported pairs.
