Zabantu-XLM-Roberta

NLP Model

Docs live

Zabantu is a family of XLM-RoBERTa masked language models (roughly 80M to 250M parameters) trained from scratch on South African Bantu languages, including Tshivenda, Zulu, Xhosa, Swati, Northern Sotho, Southern Sotho, Setswana and Xitsonga. It serves as a benchmark for low-resource Bantu language NLP and was built by the Data Science for Social Impact group at the University of Pretoria.
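
As a rough illustration of what querying a masked language model looks like, the sketch below masks one word in a sentence and asks a fill-mask pipeline for likely replacements. It assumes the Hugging Face `transformers` library and the `<mask>` token used by XLM-RoBERTa-style tokenizers. The helper names (`mask_word`, `top_predictions`, `load_fill_mask`) and the model ID in the comments are hypothetical placeholders, not taken from the Zabantu documentation.

```python
from typing import Callable, Dict, List

MASK_TOKEN = "<mask>"  # mask token used by XLM-RoBERTa-style tokenizers


def mask_word(sentence: str, index: int) -> str:
    """Replace the whitespace-separated word at `index` with the mask token."""
    words = sentence.split()
    if not 0 <= index < len(words):
        raise IndexError(f"word index {index} out of range for {len(words)} words")
    words[index] = MASK_TOKEN
    return " ".join(words)


def top_predictions(fill_mask: Callable[..., List[Dict]], masked: str, k: int = 3) -> List[str]:
    """Return the k most likely fillers for a sentence with a single mask."""
    results = fill_mask(masked, top_k=k)
    return [r["token_str"].strip() for r in results]


def load_fill_mask(model_id: str):
    """Build a fill-mask pipeline (needs `transformers` and network access)."""
    # Imported lazily so the helpers above can be used without the library.
    from transformers import pipeline
    return pipeline("fill-mask", model=model_id)


# Usage sketch. The model ID is a HYPOTHETICAL placeholder; use the
# checkpoint name given in the project's own documentation.
#   fill_mask = load_fill_mask("your-org/zabantu-checkpoint")
#   masked = mask_word("Ngiyabonga kakhulu", 1)   # "Ngiyabonga <mask>"
#   print(top_predictions(fill_mask, masked))
```

Keeping the pipeline behind a plain callable means the masking and ranking helpers can be exercised with a stub before any weights are downloaded.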

Category: AI Resources
Pricing: Open weights
Country: 🇿🇦 South Africa
Last verified: 5 Jul 2026

Compare Zabantu-XLM-Roberta

Side-by-side, verified specs against its closest NLP model alternatives.

Zabantu-XLM-Roberta vs AfriBERTa

Zabantu-XLM-Roberta vs AfriTeVa V2

Zabantu-XLM-Roberta vs AfroLID

Zabantu-XLM-Roberta vs AfroXLMR

See all verified AI resources in South Africa

Related in AI Resources

AfriBERTa

AfriBERTa is a multilingual masked language model (XLM-RoBERTa architecture, ~126M parameters) pretrained from scratch on 11 African languages, including Amharic, Hausa, Igbo, Swahili, and Yoruba. It was built by the Castorini lab (University of Waterloo) for text classification and named entity recognition on low-resource African languages.

Docs live

NLP Model

Verified Jul 2026

Cheetah

A massively multilingual natural language generation model supporting 517 African languages. It outperforms baselines on five of seven AfroNLG tasks, which include summarization and translation. Developed by the UBC Deep Learning and NLP Lab and published at ACL 2024.

Docs live

Institutional only

NLP Model

Verified Jul 2026

Free for research use

AfriTeVa V2

An improved T5 v1.1 model (428M parameters) pretrained on the Wura corpus covering 16 African languages, with gains on classification, translation, summarization and cross-lingual QA. Published at EMNLP 2023 by the Castorini group with African lead authors.

Docs live

NLP Model

Verified Jul 2026

Free / open weights
