Developer Tools
SDKs, libraries, validators and utilities that solve problems specific to African markets.
9 results in NLP Library
Hausa-NLP
A community resource hub for Hausa NLP that provides Hausa corpora, sentiment lexicons (including translated lexicons) and resources for sentiment analysis, hate-speech detection and machine translation. Maintained by the HausaNLP community.
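Sentiment lexicons are usually applied by summing the polarity of each word in a text. The sketch below shows that scoring scheme in plain Python. It does not use any HausaNLP file or API: `TOY_LEXICON`, `lexicon_score` and `label` are hypothetical names, and the two entries are illustrative only.

```python
import re

# Hypothetical two-entry lexicon for illustration; the real HausaNLP
# lexicons are much larger and their file format may differ.
TOY_LEXICON = {
    "kyau": 1,      # "good"
    "mummuna": -1,  # "bad"
}

def tokenize(text: str) -> list[str]:
    # Lowercase, then take runs of Unicode word characters
    # (this keeps Hausa letters such as ɓ, ɗ and ƙ intact).
    return re.findall(r"\w+", text.lower())

def lexicon_score(text: str, lexicon: dict[str, int]) -> int:
    # Sum the polarity of every token found in the lexicon; unknown tokens count 0.
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))

def label(score: int) -> str:
    # Map the summed score to a coarse sentiment label.
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `label(lexicon_score("yana da kyau", TOY_LEXICON))` gives `"positive"`. A plain sum ignores negation and context, which is why lexicons are often combined with trained classifiers.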
HornMorpho
HornMorpho is a Python program for morphological analysis and generation of Amharic, Oromo and Tigrinya words: it breaks words into their constituent morphemes and generates words from roots and grammatical features. It originated in the L3 Project at Indiana University.
PuoBERTa
PuoBERTa is a RoBERTa-based masked language model purpose-built for Setswana, trained on the PuoData corpus by the Data Science for Social Impact group. It ships with example scripts for fill-mask, news classification, NER and POS tagging via HuggingFace Transformers.
SOMALI_NLP
SOMALI_NLP is a Python NLP toolkit for the Somali language. It provides stop-word lists, stemmers for morphological analysis, tokenizers, collocation analysis, string-distance measures and spelling models, and draws on a companion Somali Wikipedia corpus.
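String-distance spelling models commonly rank dictionary words by edit distance to a misspelled input. The following is a generic Levenshtein sketch, not SOMALI_NLP's implementation; `levenshtein`, `suggest` and the three-word vocabulary are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def suggest(word: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    # Return vocabulary words within max_distance edits, closest first.
    scored = sorted((levenshtein(word, v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance]

# Hypothetical toy vocabulary (Somali: city, house, water).
VOCAB = ["magaalo", "guri", "biyo"]
```

With this vocabulary, `suggest("magalo", VOCAB)` returns `["magaalo"]`, one insertion away.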
amseg
amseg is an Amharic document segmentation and normalization tool that splits Ethiopic text into sentences and tokens, normalizes character variants and transliterates between Latin and Fidel. Maintained under the University of Hamburg Semantic Models for Amharic project.
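A common form of Amharic character normalization maps homophonous Fidel series (for example ሐ and ኀ, which are pronounced like ሀ) onto one canonical series. The sketch below assumes that mapping and relies on the Unicode Ethiopic block placing each series in 8 consecutive code points with the seven basic vowel orders first. It is not amseg's actual table, and `normalize` is a hypothetical name.

```python
# Assumed homophone merges: first code point of a variant series ->
# first code point of its canonical series (not amseg's own table).
HOMOPHONE_SERIES = {
    0x1210: 0x1200,  # ሐ series -> ሀ series
    0x1280: 0x1200,  # ኀ series -> ሀ series
    0x1220: 0x1230,  # ሠ series -> ሰ series
    0x12D0: 0x12A0,  # ዐ series -> አ series
    0x1340: 0x1338,  # ፀ series -> ጸ series
}

def build_table() -> dict[int, int]:
    # Map each of the seven basic vowel orders of a variant series
    # onto the same order in the canonical series.
    table = {}
    for variant, canonical in HOMOPHONE_SERIES.items():
        for order in range(7):
            table[variant + order] = canonical + order
    return table

NORMALIZE_TABLE = build_table()

def normalize(text: str) -> str:
    # str.translate accepts a dict of code point -> code point.
    return text.translate(NORMALIZE_TABLE)
```

For instance, `normalize("ሠላም")` yields `"ሰላም"`, so both spellings of "peace" index the same way.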
etnltk
The Ethiopian Natural Language Toolkit (PyPI: etnltk) is a spaCy/NLTK-inspired Python library for Amharic and other Ethiopian languages, providing text normalization, short-form expansion, and word and sentence tokenization. Maintained by robeleq.
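Sentence and word tokenization for Amharic largely comes down to Ethiopic punctuation: ። (full stop, U+1362), ፧ (question mark, U+1367) and the traditional wordspace ፡ (U+1361). Here is a regex sketch of that idea. The function names and the exact punctuation set are assumptions, not etnltk's API.

```python
import re

# Assumed sentence-final marks: Ethiopic full stop, question mark,
# paragraph separator, plus Latin ? and !.
SENTENCE_END = "።፧፨?!"
END_CLASS = f"[{re.escape(SENTENCE_END)}]"
# Words are separated by whitespace or the Ethiopic wordspace ፡.
WORD_SEP = re.compile(r"[\s፡]+")

def sentences(text: str) -> list[str]:
    # Split right after each sentence-final mark, keeping the mark.
    parts = re.split(f"(?<={END_CLASS})", text)
    return [p.strip() for p in parts if p.strip()]

def words(sentence: str) -> list[str]:
    # Detach sentence-final marks, then split on whitespace and ፡.
    spaced = re.sub(f"({END_CLASS})", r" \1 ", sentence)
    return [w for w in WORD_SEP.split(spaced) if w]
```

For example, `sentences("ሰላም ነው። እንዴት ነህ?")` gives `["ሰላም ነው።", "እንዴት ነህ?"]`, and `words("ሰላም፡ነው።")` gives `["ሰላም", "ነው", "።"]`.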
iranlowo
A Python utility library (PyPI: iranlowo) for analysing and preprocessing Yoruba text: diacritic stripping, diacritic restoration with pretrained models, text normalization, character verification and corpus tools. Maintained by the Niger-Volta-LTI organization.
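Diacritic stripping for Yoruba can be done with Unicode decomposition: NFD splits a letter such as ọ̀ into a base letter plus combining marks, which can then be filtered out. The sketch below is a generic standard-library approach, not iranlowo's code. The `keep_underdots` option is a hypothetical parameter that removes only tone marks and keeps the dot below on ẹ, ọ and ṣ.

```python
import unicodedata

# Combining tone marks: grave (low tone), acute (high tone), macron (mid tone).
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}

def strip_diacritics(text: str, keep_underdots: bool = False) -> str:
    # Decompose so every diacritic becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", text)
    if keep_underdots:
        # Drop tone marks only; U+0323 (dot below) survives.
        kept = (ch for ch in decomposed if ch not in TONE_MARKS)
    else:
        # Drop every nonspacing combining mark (category Mn).
        kept = (ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose so remaining letters like ọ return to precomposed form.
    return unicodedata.normalize("NFC", "".join(kept))
```

For example, `strip_diacritics("Yorùbá")` returns `"Yoruba"`. Restoration is the harder inverse problem, which is why it relies on trained models.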
stopwords-sw
A comprehensive Swahili (sw) stopword collection, distributed in JSON and plain-text formats (npm/bower package stopwords-sw), for text preprocessing in NLP pipelines. Maintained by the stopwords-iso project.
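A stopword list is typically loaded once into a set and used to filter tokens. This sketch assumes the JSON file is a flat array of strings. `SAMPLE_JSON` is an inline stand-in with a hypothetical five-word subset, not the contents of stopwords-sw.

```python
import json

# Hypothetical stand-in for the package's JSON file, assumed to be a
# flat array of strings; these words are illustrative, not the real list.
SAMPLE_JSON = '["na", "ya", "wa", "kwa", "ni"]'

def load_stopwords(raw_json: str) -> frozenset[str]:
    # A frozenset gives O(1) membership tests during filtering.
    return frozenset(json.loads(raw_json))

def remove_stopwords(tokens: list[str], stopwords: frozenset[str]) -> list[str]:
    # Compare case-insensitively but keep the original token casing.
    return [t for t in tokens if t.lower() not in stopwords]
```

For example, filtering `"Mimi na wewe ni marafiki".split()` ("you and I are friends") leaves `["Mimi", "wewe", "marafiki"]`.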
uroman
uroman is a universal romanizer that converts text in virtually any script to the Latin alphabet, with dedicated handling for Amharic and the Ge'ez (Ethiopic) script. It also offers initial support for Coptic and handles script-native numerals.
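Ethiopic is an abugida: each code point combines a consonant series with one of seven vowel orders. That makes a table-driven romanizer straightforward to sketch, although this is only a toy and not uroman's method or output. The consonant subset, the SERA-like vowel spellings and the name `romanize` are all assumptions. Characters outside the table, including labialized forms, pass through unchanged.

```python
ETHIOPIC_START = 0x1200

# Hypothetical subset: first code point of each consonant series -> Latin onset.
CONSONANTS = {
    0x1200: "h", 0x1208: "l", 0x1218: "m", 0x1228: "r", 0x1230: "s",
    0x1260: "b", 0x1270: "t", 0x1290: "n", 0x12A0: "", 0x12A8: "k",
    0x12C8: "w", 0x12F0: "d", 0x1308: "g",
}
# Vowel for orders 0..6 within a series (SERA-like; order 5 has no vowel here).
VOWELS = ["e", "u", "i", "a", "E", "", "o"]

def romanize(text: str) -> str:
    out = []
    for ch in text:
        offset = ord(ch) - ETHIOPIC_START
        # Each series occupies 8 consecutive code points.
        base, order = ETHIOPIC_START + offset - offset % 8, offset % 8
        if offset >= 0 and base in CONSONANTS and order < 7:
            out.append(CONSONANTS[base] + VOWELS[order])
        else:
            out.append(ch)  # outside the toy table: keep as-is
    return "".join(out)
```

With this table, `romanize("ሰላም")` produces `"selam"`. A full romanizer also has to handle every series, labialized forms, punctuation and the Ethiopic numerals.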
