AfriBERTa vs AfroLID

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

AfriBERTa AfroLID

Tags

AfriBERTa: nlp, multilingual, african-languages, low-resource, masked-language-model

AfroLID: nlp, african-languages, 517-languages, ubc-nlp, language-identification

Summary

AfriBERTa is a multilingual masked language model (XLM-RoBERTa architecture, approximately 126M parameters) pretrained from scratch on 11 African languages, including Amharic, Hausa, Igbo, Swahili, and Yoruba. It was built by the Castorini lab (University of Waterloo) for text classification and named entity recognition on low-resource African languages.

AfroLID is a neural language identification toolkit that determines which of 517 African languages and varieties, spanning 14 language families, a text is written in. It reaches a macro-F1 of 97.41 after fine-tuning on SERENGETI. It was developed by the UBC Deep Learning and NLP Lab and published at EMNLP 2022.
