AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages vs IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

Tags

AfriHate: african-languages, hate-speech, nlp-benchmark, content-moderation

IrokoBench: african-languages, masakhane, nlp-benchmark, llm-evaluation

Summary

AfriHate is a multilingual benchmark of hate speech and abusive language datasets covering 15 African languages, annotated by native speakers. The paper contributes classification baselines and hate speech and offensive language lexicons, and analyses why keyword-based moderation fails for low-resource African languages. It was released on arXiv in January 2025.

IrokoBench is a human-translated evaluation benchmark covering 17 typologically diverse low-resource African languages across three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM) and knowledge-based multiple-choice QA (AfriMMLU). The paper evaluates open and proprietary LLMs and documents a large gap between high-resource languages and African languages, with the best open model reaching about 63 percent of GPT-4o performance. It was published at NAACL 2025.
