Visual Question Answering

What is VQA?

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

Dataset

Details on downloading the latest dataset may be found on the download webpage.

Balanced Real Images

Balanced Binary Abstract Scenes

Real Images

Abstract Scenes


Papers

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (CVPR 2017)

Download the paper

BibTeX

@InProceedings{balanced_vqa_v2,
author = {Yash Goyal and Tejas Khot and Douglas Summers{-}Stay and Dhruv Batra and Devi Parikh},
title = {Making the {V} in {VQA} Matter: Elevating the Role of Image Understanding in {V}isual {Q}uestion {A}nswering},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2017},
}

Yin and Yang: Balancing and Answering Binary Visual Questions (CVPR 2016)

Download the paper

BibTeX

@InProceedings{balanced_binary_vqa,
author = {Peng Zhang and Yash Goyal and Douglas Summers{-}Stay and Dhruv Batra and Devi Parikh},
title = {{Y}in and {Y}ang: Balancing and Answering Binary Visual Questions},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2016},
}

VQA: Visual Question Answering (ICCV 2015)

Download the paper

BibTeX

@InProceedings{VQA,
author = {Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh},
title = {{VQA}: {V}isual {Q}uestion {A}nswering},
booktitle = {International Conference on Computer Vision (ICCV)},
year = {2015},
}
