IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

VQA-LOL: Visual Question Answering under the Lens of Logic

Pratyay Banerjee

ArXiv, 2020

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

Esin Durmus

arXiv (Cornell University), 2023

Coarse-to-Fine Reasoning for Visual Question Answering

Quang Tran

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

Zhecan Wang

2022

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Zhecan Wang

Findings of the Association for Computational Linguistics: ACL 2023

LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering

Aishwarya N Reganti

arXiv (Cornell University), 2020

VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges

Zaharaddeen Karami Lawal

arXiv (Cornell University), 2022

cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation

Devansh Gautam

Cornell University - arXiv, 2022

Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference

Rita Kusriastuti

2019

Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations

Stefan Wermter

arXiv (Cornell University), 2022

Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models

Pranav Raj

arXiv (Cornell University), 2023

Finetuned Language Models Are Zero-Shot Learners

Brian Lester

2021

REFINER: Reasoning Feedback on Intermediate Representations

Debjit Paul

arXiv (Cornell University), 2023

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

Ahmed Kholy

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Christopher D Manning

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

Shalini Ghosh

2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019

FiLM: Visual Reasoning with a General Conditioning Layer

Andrea Santana

2017

CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions

Yutong Bai

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks

Akarshan Sajja

2021

Visual Entailment: A Novel Task for Fine-Grained Image Understanding

Farley Lai

ArXiv, 2019

Scene Graph Reasoning for Visual Question Answering

Rajat Koner

ArXiv, 2020

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Peng Jin

Visual question answering with modules and language modeling

Vardaan Pahuja

2019

Lightweight Visual Question Answering using Scene Graphs

Ramraj Chandradevan

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

Jiawei Han

arXiv (Cornell University), 2022

Attention over learned object embeddings enables complex visual reasoning

Malcolm Reynolds, David Ding

2020

Dual Recurrent Attention Units for Visual Question Answering

Ahmed Osman

2018

COVR: A Test-Bed for Visually Grounded Compositional Generalization with Real Images

Shivanshu Gupta

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Towards Zero Shot Commonsense Reasoning with Self Supervised Refinement of Language Models

Salsabil Rachmawati Syarif

arXiv: Computation and Language, 2021

IQ-VQA: Intelligent Visual Question Answering

Mohit Chandak

2020

Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering

Aman Jain

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Graphhopper: Multi-hop Scene Graph Reasoning for Visual Question Answering

Rajat Koner

The Semantic Web – ISWC 2021

Flamingo: a Visual Language Model for Few-Shot Learning

Malcolm Reynolds

arXiv (Cornell University), 2022

Ensemble of Streamlined Bilinear Visual Question Answering Models for the ImageCLEF2019 Challenge in the Medical Domain

Minh Hieu Vu

2019

KAT: A Knowledge Augmented Transformer for Vision-and-Language

Alex Hauptmann

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Related papers