Pan-African
31 verified resources for building across Africa.
AfriBERTa
AfriBERTa is a multilingual masked language model (XLM-RoBERTa architecture, ~126M params) pretrained from scratch on 11 African languages including Amharic, Hausa, Igbo, Swahili, and Yoruba. Built by the Castorini lab (University of Waterloo) for text classification and Named Entity Recognition on low-resource African languages.
AfriQA
Cross-lingual open-retrieval question-answering dataset with human-translated QA pairs for 10 African languages (incl. Hausa, Igbo, Yoruba), totaling 12,159 examples across train/validation/test splits. From the Masakhane initiative.
AfriSpeech-200
Pan-African accented English speech corpus of ~200 hours covering 120 African accents from 13 countries and 2,463 speakers across clinical and general domains, with per-accent configs. Released by Intron Health.
AfriSpeech-Dialog
Conversational African-accented speech corpus (~6 hours) of 50 two-speaker dialogues across 11 accents (Hausa, Yoruba, Igbo, Swahili, Sesotho and others) from Nigeria, Kenya and South Africa, for ASR and speaker diarization. By Intron Health.
AfriTeVa V2
An improved T5 v1.1 model (428M params) pretrained on the Wura corpus covering 16 African languages, with gains on classification, translation, summarization and cross-lingual QA. Published at EMNLP 2023 by the Castorini group with African lead authors.
AfroLID
A neural language identification toolkit that detects which of 517 African languages and varieties a text belongs to across 14 language families, reaching 97.41 macro-F1 after fine-tuning on SERENGETI. Developed by the UBC Deep Learning and NLP Lab and published at EMNLP 2022.
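Macro-F1, the metric quoted above, is the unweighted mean of per-class F1 scores, so a rarely seen language counts as much as a common one. The following is a minimal sketch in plain Python. The labels are made-up illustrative codes, not AfroLID output, and `macro_f1` is a hypothetical helper rather than part of the toolkit.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    classes = set(gold) | set(pred)
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only (assumed language codes).
gold = ["hau", "hau", "yor", "ibo", "swa", "swa"]
pred = ["hau", "yor", "yor", "ibo", "swa", "hau"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

Because every class contributes equally, a single misclassified low-resource language can move the score noticeably, which is why the metric suits language identification over hundreds of classes.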
AfroLM
A multilingual masked language model pretrained from scratch on 23 African languages using a self-active learning framework, outperforming AfriBERTa, mBERT and XLM-R-base on NER and sentiment tasks. Created by Bonaventure Dossou and collaborators, published at SustaiNLP/EMNLP 2022.
AfroXLMR
AfroXLMR is an XLM-R-large model (0.6B params) adapted to African languages via multilingual adaptive fine-tuning, covering 17 African languages plus Arabic, French, and English. Created by David Adelani (Davlan) and published at COLING 2022 for cross-lingual transfer tasks like NER.
BibleTTS
BibleTTS is a large, high-fidelity open text-to-speech corpus covering ten Sub-Saharan African languages (Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, Yoruba). Per-language recordings are studio-quality, 48 kHz and single-speaker, and run to over 80 hours at the top end. Built by Masakhane/Coqui.
Cheetah
A massively multilingual natural language generation model supporting 517 African languages, outperforming baselines on five of seven AfroNLG tasks like summarization and translation. Developed by the UBC Deep Learning and NLP Lab and published at ACL 2024.
CinetPay MCP
The official CinetPay MCP server lets AI assistants initialize payments, check payment and transfer status, check balances and send money transfers across ten Francophone African countries. It exposes tools such as initialize_payment, check_payment_status, create_transfer and list_payment_methods.
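MCP clients invoke a server's tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the `check_payment_status` tool named above. The `transaction_id` argument and its value are hypothetical: the tool's real input schema is not given here and should be read from the server's `tools/list` response. `build_tool_call` is likewise an illustrative helper, not part of any SDK.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Return an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument name and value, for illustration only.
request = build_tool_call(
    1, "check_payment_status", {"transaction_id": "TXN-EXAMPLE-001"}
)
payload = json.dumps(request)
print(payload)
```

In practice an MCP client library handles this framing and the transport; the sketch only shows the shape of the message that reaches the server.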
DPO Group (DPO Pay)
Direct Pay Online (now DPO Pay by Network) is a pan-African payment service provider in 20+ countries, offering the DPO Pay API v6 (REST), email payment links, and plugins for WooCommerce, Shopify, Magento and Odoo. Sandbox credentials and Postman collections are provided.
Deep Learning Indaba
Pan-African grassroots movement strengthening machine learning and AI across the continent through an annual conference, locally-run IndabaX events, mentorship and an Ideathon. The 2026 Indaba is hosted at Pan-Atlantic University in Lagos, Nigeria.
MasakhaNER 2.0
Largest high-quality named-entity-recognition corpus for 20 African languages (incl. Nigerian Pidgin, Hausa, Igbo, Yoruba) with PER/ORG/LOC/DATE tags over news-domain text, totaling ~152,786 rows. Built by the Masakhane community.
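NER corpora of this kind are commonly distributed as token-level tags, so a frequent first step is to collapse those tags into entity spans. The sketch below assumes a BIO scheme (`B-PER`, `I-PER`, `O`, and so on) over the four entity types listed above. The sentence is invented for illustration and is not taken from the corpus, and `bio_to_spans` is a hypothetical helper.

```python
def bio_to_spans(tokens, tags):
    """Collapse BIO tags into (label, text) spans, in order of appearance."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if the label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the currently open span, if there is one.
        if label is not None:
            spans.append((label, " ".join(tokens[start:i])))
            start, label = None, None
        # B- opens a new span; a stray I- is treated as a begin as well.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, " ".join(tokens[start:])))
    return spans


# Invented example sentence (assumption), tagged with PER/LOC/DATE.
tokens = ["Wole", "Soyinka", "visited", "Lagos", "on", "12", "May"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE", "I-DATE"]
entities = bio_to_spans(tokens, tags)
print(entities)
```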
MasakhaNEWS
News-topic-classification dataset for 16 widely spoken African languages (incl. Hausa, Igbo, Yoruba, Nigerian Pidgin), with ~31,088 rows labeled across seven topic categories and distributed as CSV/Parquet with train/val/test splits. Built by the Masakhane community.
MasakhaPOS
Part-of-speech tagging dataset for 20 African languages (incl. Nigerian Pidgin, Hausa, Igbo, Yoruba) using Universal Dependencies tags, with per-language train/validation/test splits. Built by the Masakhane community.
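Universal Dependencies defines a closed set of 17 universal POS tags, which makes it cheap to sanity-check tagged data before training. The following is a minimal sketch; the example sentence and the `invalid_tags` helper are illustrative assumptions and do not come from the dataset.

```python
# The 17 universal POS tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def invalid_tags(tagged_sentence):
    """Return the (token, tag) pairs whose tag is not a valid UPOS tag."""
    return [(tok, tag) for tok, tag in tagged_sentence if tag not in UPOS_TAGS]


# Invented example with one deliberately wrong tag ("NOUNS").
sentence = [("Ada", "PROPN"), ("reads", "VERB"), ("books", "NOUNS"), (".", "PUNCT")]
problems = invalid_tags(sentence)
print(problems)
```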
Onafriq
Formerly MFS Africa, Onafriq is a pan-African cross-border digital payments network connecting businesses to ~1 billion mobile wallets across 40+ African markets, offering collections, disbursements, card issuing/processing and agent banking. The developer portal is the former Beyonic API documentation.
PawaPay
Mobile money aggregator offering a single REST API and dashboard to collect deposits, send payouts and process refunds across ~20 African countries, covering roughly 85% of mobile money on the continent. Asynchronous financial APIs with sandbox and live environments plus a Postman collection.
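With an asynchronous payments API, the initial request is only accepted for processing and the final outcome arrives later, so a client must wait for a terminal status. The sketch below shows a generic polling loop under stated assumptions: the status names, `fetch_status` and `wait_for_final_status` are all hypothetical and are not PawaPay's actual API. A fake fetcher stands in for the network call so the example runs offline.

```python
import time

# Hypothetical terminal states; a real API defines its own status vocabulary.
FINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_final_status(fetch_status, payment_id, attempts=5, delay_s=0.0,
                          sleep=time.sleep):
    """Poll fetch_status(payment_id) until a terminal state or attempts run out."""
    for attempt in range(attempts):
        status = fetch_status(payment_id)
        if status in FINAL_STATES:
            return status
        if attempt < attempts - 1:
            sleep(delay_s)  # back off before the next poll
    raise TimeoutError(f"{payment_id} not final after {attempts} attempts")


# Fake fetcher standing in for an HTTP status lookup (assumption).
_fake_statuses = iter(["ACCEPTED", "SUBMITTED", "COMPLETED"])


def fake_fetch_status(payment_id):
    return next(_fake_statuses)


result = wait_for_final_status(fake_fetch_status, "demo-deposit-1")
print(result)
```

Polling is shown here only because it is easy to run locally; where a provider offers callbacks, those usually avoid repeated requests.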
Pesapal
East African payment gateway (Kenya, Uganda, Tanzania and beyond) offering API 3.0, a REST/JSON API for online payments, with sample code repositories, plugins and a developer forum. The older XML-based API 2.0 is deprecated. PCI DSS compliant, with sandbox test credentials.
Pngme
Pngme is a financial data infrastructure platform aggregating bank, loan and airtime data across emerging markets including Nigeria, Kenya, Ghana, Uganda and Zambia. Its API delivers machine-learning-ready financial features, alert labels and a synthetic data store for credit scoring and risk assessment.
SERENGETI
A massively multilingual masked language model covering 517 African languages and varieties across five scripts, achieving state-of-the-art results on the AfroNLU benchmark. Developed by the UBC Deep Learning and NLP Lab as an Afrocentric resource.
SendByte MCP
The official SendByte MCP server lets AI agents send email and manage sending domains, templates, deliverability, analytics and content checks through SendByte, an African email infrastructure provider. Published to npm as @sendbyte/mcp.
Tingg by Cellulant
Cellulant's Tingg is a single-API payments platform for collections and payouts across 24+ African markets, supporting mobile money, cards and bank transfers, plus engagement APIs for transactional SMS/OTP. Developers register on the Tingg developer portal for sandbox and production access.
Toucan
An Afrocentric many-to-many machine translation model (1.2B params, mT5) fine-tuned from Cheetah to support 156 African language pairs, evaluated on the AfroLingu-MT benchmark. Developed by the UBC Deep Learning and NLP Lab and published at ACL 2024.



