Hausa Visual Genome (HausaVG) vs MasakhaNER 2.0

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

| Field | Hausa Visual Genome (HausaVG) | MasakhaNER 2.0 |
|---|---|---|
| Category | Datasets | Datasets |
| Type | Language / NLP | Language / NLP |
| Country | 🇳🇬 Nigeria | 🌍 Pan-African |
| Docs status | Docs live | Docs live |
| Licensing / Pricing | Free / CC-BY-NC-SA 4.0 | Free / CC-BY-NC 4.0 |
| Verified | Verified | Verified |
| Last verified | 24 Jun 2026 | 24 Jun 2026 |
| Tags | nlp, hausa, machine-translation, multimodal, image-captioning | nlp, ner, named-entity-recognition, african-languages, token-classification |
Summary

Hausa Visual Genome (HausaVG): Multimodal Hausa-English dataset of 32,923 images with paired English/Hausa region descriptions (train/dev/test/challenge splits), post-edited by HausaNLP and Bayero University Kano translators for English-to-Hausa machine translation and image description.

MasakhaNER 2.0: Largest high-quality named-entity-recognition corpus for 20 African languages (incl. Nigerian Pidgin, Hausa, Igbo, Yoruba) with PER/ORG/LOC/DATE tags over news-domain text, totaling ~152,786 rows. Built by the Masakhane community.
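
To make the PER/ORG/LOC/DATE tagging concrete, here is a minimal Python sketch of how token-level tags can be grouped into entity spans. It assumes the tags follow the common BIO convention (B-PER, I-PER, O, and so on), which the listing above does not state. The function name `bio_to_spans` and the sample sentence are invented for illustration and are not taken from the dataset.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Assumes BIO-style tags such as "B-PER", "I-PER" and "O".
    A stray "I-X" with no matching open span is treated as a new span.
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the span collected so far, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            flush()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-"):
            # Continuation of the currently open span.
            current_tokens.append(token)
        else:
            # "O" (outside any entity) closes the open span.
            flush()
            current_type = None
            current_tokens = []

    flush()
    return spans


# Invented example sentence, not a row from MasakhaNER 2.0.
example_tokens = ["Aisha", "Bello", "visited", "Kano", "on", "12", "May"]
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE", "I-DATE"]
example_spans = bio_to_spans(example_tokens, example_tags)
print(example_spans)
```

If the released files use a different scheme (for example integer label IDs), the tags would first need to be mapped to strings before a grouping step like this applies.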