Zabantu-XLM-Roberta
Zabantu is a family of XLM-RoBERTa masked language models (roughly 80M to 250M parameters) trained from scratch on South African Bantu languages, including Tshivenda, Zulu, Xhosa, Swati, Northern Sotho, Southern Sotho, Setswana, and Xitsonga. It serves as a benchmark for low-resource Bantu language NLP and was built by the Data Science for Social Impact group at the University of Pretoria.
- Category: AI Resources
- Pricing: Open weights
- Country: 🇿🇦 South Africa
- Last verified: 5 Jul 2026
Related in AI Resources
AfriBERTa
AfriBERTa is a multilingual masked language model (XLM-RoBERTa architecture, ~126M parameters) pretrained from scratch on 11 African languages, including Amharic, Hausa, Igbo, Swahili, and Yoruba. It was built by the Castorini lab (University of Waterloo) for text classification and named entity recognition on low-resource African languages.
Cheetah
A massively multilingual natural language generation model supporting 517 African languages, outperforming baselines on five of seven AfroNLG tasks such as summarization and translation. Developed by the UBC Deep Learning and NLP Lab and published at ACL 2024.
AfriTeVa V2
An improved T5 v1.1 model (428M parameters) pretrained on the Wura corpus covering 16 African languages, with gains on classification, translation, summarization, and cross-lingual QA. Published at EMNLP 2023 by the Castorini group with African lead authors.
