Visual Question Answering
What is VQA?
VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
- 265,016 images (COCO and abstract scenes)
- At least 3 questions (5.4 questions on average) per image
- 10 ground truth answers per question
- 3 plausible (but likely incorrect) answers per question
- Automatic evaluation metric (sketched below)
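The metric scores a predicted answer against the 10 human answers: an answer counts as 100% accurate if at least 3 annotators gave it, with partial credit otherwise, averaged over all 10 leave-one-out subsets of 9 answers. The following is an illustrative Python sketch of that formula (the function name is ours; the official evaluation code additionally normalizes answers, e.g. lowercasing and stripping punctuation and articles):

def vqa_accuracy(predicted, gt_answers):
    # An answer is fully correct if at least 3 of the 10 annotators
    # gave it; partial credit is min(#matches / 3, 1). The official
    # metric averages this over all 10 subsets of 9 human answers,
    # each with one annotator held out.
    accuracies = []
    for i in range(len(gt_answers)):
        # Score against the other 9 answers, holding out annotator i.
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(a == predicted for a in others)
        accuracies.append(min(matches / 3.0, 1.0))
    return sum(accuracies) / len(accuracies)

For example, if only 2 of the 10 annotators answered "yes", predicting "yes" scores 0.6 under this averaging; with 3 or more matching annotators the score reaches 1.0.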
Dataset
Details on downloading the latest dataset may be found on the download webpage.
April 2017: Full release (v2.0)
Balanced Real Images
- 204,721 COCO images (all of current train/val/test)
- 1,105,904 questions
- 11,059,040 ground truth answers
March 2017: Beta v1.9 release
Balanced Real Images
- 123,287 COCO images (only train/val)
- 658,111 questions
- 6,581,110 ground truth answers
- 1,974,333 plausible answers
Balanced Binary Abstract Scenes
- 31,325 abstract scenes (only train/val)
- 33,383 questions
- 333,830 ground truth answers
October 2015: Full release (v1.0)
Real Images
- 204,721 COCO images (all of current train/val/test)
- 614,163 questions
- 6,141,630 ground truth answers
- 1,842,489 plausible answers
Abstract Scenes
- 50,000 abstract scenes
- 150,000 questions
- 1,500,000 ground truth answers
- 450,000 plausible answers
- 250,000 captions
July 2015: Beta v0.9 release
- 123,287 COCO images (all of train/val)
- 369,861 questions
- 3,698,610 ground truth answers
- 1,109,583 plausible answers
June 2015: Beta v0.1 release
- 10,000 COCO images (from train)
- 30,000 questions
- 300,000 ground truth answers
- 90,000 plausible answers
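Once downloaded, the questions and ground truth answers are plain JSON files. Below is a minimal loading sketch; the file names follow the v2.0 real-image release convention, but treat them as an assumption and adjust to the files you actually downloaded:

import json

# Assumed v2.0 file names; adjust paths to your download.
with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = json.load(f)["questions"]
with open("v2_mscoco_train2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]

# Each question record carries (image_id, question, question_id);
# each annotation record carries the 10 human answers for one
# question_id, so the two files can be joined on question_id.
answers_by_qid = {a["question_id"]: a for a in annotations}
for q in questions[:3]:
    ann = answers_by_qid[q["question_id"]]
    gt = [ans["answer"] for ans in ann["answers"]]
    print(q["question"], "->", gt)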
Papers
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (CVPR 2017)
Download the paper
@InProceedings{balanced_vqa_v2,
author = {Yash Goyal and Tejas Khot and Douglas Summers{-}Stay and Dhruv Batra and Devi Parikh},
title = {Making the {V} in {VQA} Matter: Elevating the Role of Image Understanding in {V}isual {Q}uestion {A}nswering},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2017},
}
Yin and Yang: Balancing and Answering Binary Visual Questions (CVPR 2016)
Download the paper
@InProceedings{balanced_binary_vqa,
author = {Peng Zhang and Yash Goyal and Douglas Summers{-}Stay and Dhruv Batra and Devi Parikh},
title = {{Y}in and {Y}ang: Balancing and Answering Binary Visual Questions},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2016},
}
VQA: Visual Question Answering (ICCV 2015)
Download the paper
@InProceedings{VQA,
author = {Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh},
title = {{VQA}: {V}isual {Q}uestion {A}nswering},
booktitle = {International Conference on Computer Vision (ICCV)},
year = {2015},
}