IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Related papers
VQA-LOL: Visual Question Answering under the Lens of Logic
arXiv, 2020
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
arXiv, 2023
Coarse-to-Fine Reasoning for Visual Question Answering
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
2022
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Findings of the Association for Computational Linguistics: ACL 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
arXiv, 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
arXiv, 2022
Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference
2019
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
arXiv, 2022
Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models
arXiv, 2023
Finetuned Language Models Are Zero-Shot Learners
2021
REFINER: Reasoning Feedback on Intermediate Representations
arXiv, 2023
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
FiLM: Visual Reasoning with a General Conditioning Layer
2017
CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
arXiv, 2019
Scene Graph Reasoning for Visual Question Answering
arXiv, 2020
Visual question answering with modules and language modeling
2019
Lightweight Visual Question Answering using Scene Graphs
Proceedings of the 30th ACM International Conference on Information & Knowledge Management
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
arXiv, 2022
Attention over learned object embeddings enables complex visual reasoning
2020
Dual Recurrent Attention Units for Visual Question Answering
2018
COVR: A Test-Bed for Visually Grounded Compositional Generalization with Real Images
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Towards Zero Shot Commonsense Reasoning with Self Supervised Refinement of Language Models
arXiv, 2021
IQ-VQA: Intelligent Visual Question Answering
2020
Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021
Graphhopper: Multi-hop Scene Graph Reasoning for Visual Question Answering
The Semantic Web – ISWC 2021
Flamingo: a Visual Language Model for Few-Shot Learning
arXiv, 2022
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies