AI registry
AI Resources
African language models, speech & text datasets, and AI infrastructure.
5 results in LLM
AfroLM
A multilingual masked language model pretrained from scratch on 23 African languages using a self-active learning framework. It outperforms AfriBERTa, mBERT and XLMR-base on NER and sentiment tasks. Created by Bonaventure Dossou and collaborators and published at the SustaiNLP workshop at EMNLP 2022.
InkubaLM
InkubaLM-0.4B is a 400M-parameter open-weights small language model built from scratch by Lelapa AI for five low-resource African languages (isiZulu, Yoruba, Swahili, isiXhosa and Hausa), with English and French also included in the training data. It uses a LLaMA-style architecture and was trained on 2.4B tokens.
N-ATLaS
Nigeria's first government-backed multilingual LLM (September 2025): a Llama-3 8B model fine-tuned on more than 400M tokens across four Nigerian languages. Produced by NCAIR/NITDA and Awarri.
SERENGETI
A massively multilingual masked language model covering 517 African languages and varieties across five scripts, achieving state-of-the-art results on the AfroNLU benchmark. Developed by the UBC Deep Learning and NLP Lab as an Afrocentric resource.
UlizaLlama (Jacaranda Health)
UlizaLlama is a 7B-parameter Swahili-and-English LLM from Jacaranda Health in Kenya, derived from Meta's Llama 2 through continual pretraining on ~321M Swahili tokens and fine-tuning. It was built to power Swahili maternal-health SMS support for low-income expectant mothers in East Africa.
