amseg vs PuoBERTa

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

| Field | amseg | PuoBERTa |
| --- | --- | --- |
| Category | Developer Tools | Developer Tools |
| Type | NLP Library | NLP Library |
| Country | 🇪🇹 Ethiopia | 🇿🇦 South Africa |
| Docs status | Docs live | Docs live |
| Licensing / pricing | Free / open-source | Free / open-source |
| Verified | Verified | Verified |
| Last verified | 24 Jun 2026 | 24 Jun 2026 |
| Tags | nlp, python, amharic, ethiopic, tokenization | south-africa, python, setswana, roberta, language-model |

Summary
amseg is an Amharic document segmentation and normalization tool that splits Ethiopic text into sentences and tokens, normalizes character variants, and transliterates between Latin and Fidel. It is maintained under the University of Hamburg Semantic Models for Amharic project.
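
To make the amseg description concrete, here is a minimal standard-library sketch of the kind of processing it names: sentence splitting on Ethiopic punctuation, tokenization, and folding of homophone character variants. It does not use or reproduce amseg's own API. The function names and the small variant map are assumptions made for illustration, the map covers only a few base characters, and transliteration is not shown.

```python
import re

# Illustrative and incomplete homophone map (base forms only).
# This is an assumption for the sketch, not amseg's normalization table.
VARIANTS = str.maketrans({"ሐ": "ሀ", "ኀ": "ሀ", "ሠ": "ሰ", "ዐ": "አ", "ፀ": "ጸ"})

WORD_SEP = "፡"  # Ethiopic wordspace (U+1361)


def normalize(text):
    """Fold a few homophone character variants onto one canonical form."""
    return text.translate(VARIANTS)


def split_sentences(text):
    """Split on the Ethiopic full stop and question mark, and on ASCII ? and !.

    Each sentence keeps its terminator.
    """
    parts = re.findall(r"[^።፧?!]+[።፧?!]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Split a sentence into word tokens and single-character punctuation tokens."""
    # Treat the Ethiopic wordspace like ordinary whitespace.
    sentence = sentence.replace(WORD_SEP, " ")
    return re.findall(r"[።፧፣፤፥፦?!,.]|[^\s።፧፣፤፥፦?!,.]+", sentence)
```

For example, `split_sentences("አበበ በሶ በላ። ጫላ ጩቤ ጨበጠ።")` yields two sentences, and `tokenize` on the first one separates the three words from the final `።`.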
PuoBERTa is a RoBERTa-based masked language model purpose-built for Setswana, trained on the PuoData corpus by the Data Science for Social Impact group. It ships with example scripts for fill-mask, news classification, NER and POS tagging via HuggingFace Transformers.
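
As a rough illustration of the fill-mask use mentioned for PuoBERTa, the sketch below goes through the HuggingFace Transformers `pipeline` API. The model identifier `dsfsi/PuoBERTa`, the helper names, and the sample Setswana sentence are assumptions and should be checked against the project's own example scripts. The Transformers import is deferred so that the ranking helper can be used without the library installed.

```python
def load_fill_mask(model_id="dsfsi/PuoBERTa"):
    """Build a fill-mask pipeline (model_id is an assumed Hub identifier)."""
    # Lazy import: needs `transformers` and a backend such as PyTorch.
    from transformers import pipeline
    return pipeline("fill-mask", model=model_id)


def top_tokens(predictions, k=3):
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 3)) for p in ranked[:k]]


def demo():
    """Download the model and print candidate fillers for one masked sentence."""
    unmasker = load_fill_mask()
    mask = unmasker.tokenizer.mask_token  # use the tokenizer's own mask string
    sentence = f"Ke rata go bala {mask}."  # illustrative sentence, an assumption
    print(top_tokens(unmasker(sentence)))
```

Calling `demo()` downloads the model weights on first use, so it needs network access. `top_tokens` only reorders and trims the list of dictionaries that the pipeline returns for a single masked position.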