AfriHate Hate Speech Datasets vs MasakhaPOS

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

AfriHate Hate Speech Datasets MasakhaPOS

Tags

AfriHate Hate Speech Datasets: nlp, african-languages, hate-speech, abusive-language, twitter

MasakhaPOS: nlp, african-languages, token-classification, pos-tagging, ud-tags

Links

AfriHate Hate Speech Datasets: Website, Docs

MasakhaPOS: Website, Docs, GitHub

Summary

AfriHate Hate Speech Datasets: Multilingual collection of hate speech and abusive language datasets covering 15 African languages, built from tweets annotated by native speakers. Each instance carries labels from 3 to 4 annotators, who are identified only by anonymous annotator IDs. The datasets are downloadable from Hugging Face. Published at NAACL 2025.

MasakhaPOS: Part-of-speech tagging dataset for 20 African languages (including Nigerian Pidgin, Hausa, Igbo, and Yoruba) that uses Universal Dependencies tags, with per-language train/validation/test splits. Built by the Masakhane community.
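
Because each AfriHate instance carries labels from 3 to 4 annotators, anyone training on it has to decide how to collapse those labels into one. The sketch below shows one common option, a strict majority vote. It is an illustration only, not the aggregation used by the dataset authors. The field names (`text`, `annotations`), the annotator IDs, and the label values are hypothetical and do not reflect the actual schema.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when there are no labels or when no label is chosen
    by more than half of the annotators (e.g. a 2-2 split among four).
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Hypothetical instance: field names, annotator IDs, and label values
# are placeholders, not the dataset's real schema.
instance = {
    "text": "example tweet",
    "annotations": {
        "ann_01": "abusive",
        "ann_02": "abusive",
        "ann_03": "neutral",
    },
}

print(majority_label(list(instance["annotations"].values())))
```

Returning `None` on ties keeps disagreements visible, so ambiguous instances can be filtered out or reviewed rather than silently assigned a label.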

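
For a token-classification dataset such as MasakhaPOS, a quick sanity check is to confirm that every token has exactly one tag and that each tag belongs to the Universal Dependencies tag set. The sketch below assumes the tags are the 17 universal POS (UPOS) categories. The example sentence and its tags are hand-written for illustration and are not taken from the dataset, and `invalid_tags` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

# The 17 universal POS tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def invalid_tags(
    tokens: Sequence[str], tags: Sequence[str]
) -> List[Tuple[int, str, str]]:
    """Return (index, token, tag) for every tag outside the UPOS set.

    Raises ValueError if tokens and tags differ in length, since POS
    tagging requires exactly one tag per token.
    """
    if len(tokens) != len(tags):
        raise ValueError(
            f"{len(tokens)} tokens but {len(tags)} tags; expected one tag per token"
        )
    return [
        (i, token, tag)
        for i, (token, tag) in enumerate(zip(tokens, tags))
        if tag not in UPOS_TAGS
    ]


# Hand-written illustrative example with one deliberately misspelled tag.
tokens = ["I", "dey", "come", "."]
tags = ["PRON", "AUX", "VRB", "PUNCT"]

print(invalid_tags(tokens, tags))
```

Running a check like this over each language's train, validation, and test split catches misaligned or misspelled tags before they reach a model.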