AfriHate Hate Speech Datasets vs MasakhaNER 2.0

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

AfriHate Hate Speech Datasets | MasakhaNER 2.0

Tags

AfriHate Hate Speech Datasets: nlp, african-languages, hate-speech, abusive-language, twitter

MasakhaNER 2.0: nlp, ner, named-entity-recognition, african-languages, token-classification

Links

AfriHate Hate Speech Datasets: Website, Docs

MasakhaNER 2.0: Website, Docs, GitHub

Summary

AfriHate Hate Speech Datasets: Multilingual collection of hate speech and abusive language datasets covering 15 African languages, built from tweets annotated by native speakers. Each instance carries labels from 3 to 4 annotators, identified by anonymous annotator IDs. The data is downloadable from Hugging Face. Published at NAACL 2025.

MasakhaNER 2.0: Largest high-quality named-entity-recognition corpus for 20 African languages (including Nigerian Pidgin, Hausa, Igbo, and Yoruba), with PER/ORG/LOC/DATE tags over news-domain text, totaling ~152,786 rows. Built by the Masakhane community.
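The two summaries describe different label structures: several annotator labels per tweet in AfriHate, and per-token entity tags in MasakhaNER 2.0. The Python sketch below illustrates one way each might be consumed. It is not taken from either project. The label names ("Hate", "Abuse", "Normal"), the BIO-style tag prefixes ("B-", "I-", "O"), the sample sentence, and the helper names `majority_label` and `extract_entities` are all assumptions made for illustration; check each dataset's documentation for the actual schema.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among annotator votes.

    Returns None for an empty list or when the top two counts are tied,
    which can happen with 4 annotators.
    """
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def extract_entities(tokens, tags):
    """Group tokens into (entity_type, text) spans.

    Assumes BIO-style tags such as "B-PER", "I-PER", "O" (hypothetical here).
    """
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # Start a new span, closing any span that was open.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = etype, [token]
        elif prefix == "I":
            # Continue the open span of the same type.
            current_tokens.append(token)
        else:
            # "O" or an unexpected tag closes any open span.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical inputs, not real rows from either dataset.
print(majority_label(["Hate", "Hate", "Normal"]))
print(extract_entities(
    ["Wole", "Soyinka", "visited", "Lagos", "in", "July"],
    ["B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE"],
))
```

Returning None on a tie keeps disagreement visible instead of silently picking a label, which matters when an instance has an even number of annotators.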
