AfroLID vs Zabantu-XLM-Roberta

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

Tags

AfroLID: nlp, african-languages, 517-languages, ubc-nlp, language-identification

Zabantu-XLM-Roberta: south-africa, xlm-roberta, bantu-languages, tshivenda, zulu

Links

AfroLID: Website, Docs, GitHub

Zabantu-XLM-Roberta: Website

Summary

AfroLID is a neural language identification toolkit that determines which of 517 African languages and varieties, spanning 14 language families, a text is written in. It reaches 97.41 macro-F1 after fine-tuning on SERENGETI. It was developed by the UBC Deep Learning and NLP Lab and published at EMNLP 2022.
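The 97.41 figure is a macro-averaged F1 score. As an illustration only (this is not AfroLID's evaluation code), the sketch below computes macro-F1 from gold and predicted labels: F1 is computed per language first and then averaged without weighting, so a rare language counts as much as a common one. The function name `macro_f1` and the ISO 639-3 style labels in the usage lines are hypothetical choices for this sketch.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 on a 0-100 scale: per-label F1, then an unweighted mean.

    Labels are taken from the union of gold and predicted labels; a label
    with no true positives contributes F1 = 0.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pred_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / pred_count if pred_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return 100 * sum(scores) / len(scores)


# Hypothetical example: ISO 639-3 codes for Zulu, Xhosa and Tshivenda.
gold = ["zul", "zul", "xho", "ven"]
pred = ["zul", "xho", "xho", "ven"]
print(round(macro_f1(gold, pred), 2))  # 77.78
```

Here plain accuracy would be 75 (3 of 4 correct), while macro-F1 is about 77.78 because each of the three languages contributes one third of the score regardless of how many examples it has. Across 517 languages with very different amounts of data, that equal weighting keeps rare languages from being drowned out by well-resourced ones.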

Zabantu is a family of XLM-RoBERTa masked language models (roughly 80M to 250M parameters) trained from scratch on South African Bantu languages, including Tshivenda, Zulu, Xhosa, Swati, Northern Sotho, Southern Sotho, Setswana and Xitsonga. It is intended as a benchmark for low-resource Bantu-language NLP and was built by the Data Science for Social Impact group at the University of Pretoria.
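As masked language models, the Zabantu models learn by predicting hidden tokens. The sketch below shows the generic BERT/RoBERTa-style corruption step, assuming the commonly used defaults: select about 15% of positions, replace 80% of those with a mask token, swap 10% for a random token, and leave 10% unchanged. Zabantu's actual masking settings are not stated in this comparison, and `mask_tokens`, the token IDs, and the use of -100 as the "ignore this position" label (a PyTorch and Hugging Face convention) are assumptions of this sketch. For brevity it does not exclude special tokens such as sequence-start markers, which a real pipeline would.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=None):
    """BERT/RoBERTa-style masking sketch (assumed defaults, not Zabantu's config).

    Returns (inputs, labels): labels hold the original token at selected
    positions and -100 (ignored by the loss) everywhere else.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80% of selected: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


# Hypothetical token IDs; mask_id and vocab_size are placeholders.
tokens = [5, 17, 42, 8, 99, 23, 61, 7]
inputs, labels = mask_tokens(tokens, mask_id=4, vocab_size=1000, seed=0)
print(inputs)
print(labels)
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only mask-token positions matter, which is the usual rationale for the 80/10/10 split.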
