Hausa Visual Genome (HausaVG) vs Kencorpus Kenyan Language Corpus

A verified, side-by-side comparison. Both records are status-checked by Findra, so you are comparing what each actually offers today, not a stale listing.

Hausa Visual Genome (HausaVG) | Kencorpus Kenyan Language Corpus

Tags

Hausa Visual Genome (HausaVG): nlp, hausa, machine-translation, multimodal, image-captioning

Kencorpus Kenyan Language Corpus: nlp, corpus, swahili, dholuo, luhya

Summary

Hausa Visual Genome (HausaVG): Multimodal Hausa-English dataset of 32,923 images with paired English and Hausa region descriptions, divided into train, dev, test and challenge splits. The Hausa text was post-edited by translators from HausaNLP and Bayero University Kano. The dataset is intended for English-to-Hausa machine translation and image description.

Kencorpus Kenyan Language Corpus: Text and speech corpus for three Kenyan languages (Swahili, Dholuo and Luhya), containing 4,442 texts (5.6 million words) and 1,152 speech files (177 hours). It also includes derived NLP datasets: POS-tagged Dholuo and Luhya text, 7,537 Swahili question-answer pairs and 13,400 translated sentences. Released in 2022 and downloadable from Harvard Dataverse.
