PuoBERTa
PuoBERTa is a RoBERTa-based masked language model built specifically for Setswana, trained on the PuoData corpus by the Data Science for Social Impact group. It ships with example scripts for fill-mask, news classification, named-entity recognition (NER) and part-of-speech (POS) tagging via HuggingFace Transformers.
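As a rough illustration of the fill-mask use case mentioned above, the sketch below shows how such a model is typically queried through the HuggingFace Transformers `pipeline` API. It is not taken from the project's own example scripts. The Hub identifier `dsfsi/PuoBERTa` and the `<mask>` token are assumptions that should be checked against the model card, and the helper names `mask_word` and `top_predictions` are hypothetical.

```python
from __future__ import annotations

# Assumed Hub identifier; confirm it on the model card before use.
MODEL_ID = "dsfsi/PuoBERTa"


def mask_word(sentence: str, target: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `target` with the mask token.

    "<mask>" is the usual RoBERTa mask token and is assumed here; the
    loaded tokenizer's `mask_token` attribute is the authoritative value.
    """
    if target not in sentence:
        raise ValueError(f"{target!r} not found in sentence")
    return sentence.replace(target, mask_token, 1)


def top_predictions(masked_sentence: str, k: int = 5) -> list[tuple[str, float]]:
    """Return up to k (token, score) candidates for a single masked position."""
    # Imported lazily so mask_word() works without transformers installed.
    from transformers import pipeline

    # Downloads the model on first use, so this needs network access.
    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    results = fill_mask(masked_sentence, top_k=k)
    return [(r["token_str"].strip(), r["score"]) for r in results]
```

Calling `top_predictions(mask_word(sentence, word))` on a Setswana sentence would return the model's candidate tokens for the masked word together with their scores. The sentence should contain exactly one mask, since the pipeline returns a nested list when several are present.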
- Category: Developer Tools
- Pricing: Free / open-source
- Country: 🇿🇦 South Africa
- Last verified: 24 Jun 2026
Related in Developer Tools
etnltk
The Ethiopian Natural Language Toolkit, a spaCy/NLTK-inspired Python library (PyPI: etnltk) for Amharic and other Ethiopian languages, providing text normalization, short-form expansion and word/sentence tokenization. Maintained by robeleq.
iranlowo
A Python utility library (PyPI: iranlowo) for analysing and preprocessing Yoruba text: diacritic stripping/restoration via pretrained models, text normalization, character verification and corpus tools. Maintained by the Niger-Volta-LTI organization.
Hausa-NLP
A community resource hub for Hausa NLP providing Hausa corpora, sentiment lexicons (including translated lexicons) and resources for sentiment analysis, hate-speech detection and machine translation. Maintained by the HausaNLP community.
