AfriBERTa vs AfriTeVa V2

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

Tags

AfriBERTa: nlp, multilingual, african-languages, low-resource, masked-language-model

AfriTeVa V2: nlp, african-languages, t5, text-to-text, wura-corpus

Summary

AfriBERTa is a multilingual masked language model (XLM-RoBERTa architecture, ~126M params) pretrained from scratch on 11 African languages, including Amharic, Hausa, Igbo, Swahili, and Yoruba. It was built by the Castorini lab (University of Waterloo) for text classification and named entity recognition (NER) in low-resource African languages.

AfriTeVa V2 is an improved T5 v1.1 model (428M params) pretrained on the Wura corpus, which covers 16 African languages, with reported gains on classification, translation, summarization, and cross-lingual QA. It was published at EMNLP 2023 by the Castorini group with African lead authors.
