2024 Poster Papers: "visual question answering"

19 papers found

Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

Cheng Tan, Jingxuan Wei, Zhangyang Gao et al.

ECCV 2024 poster · arXiv:2311.14109
29 citations

Compositional Substitutivity of Visual Reasoning for Visual Question Answering

Chuanhao Li, Zhen Li, Chenchen Jing et al.

ECCV 2024 poster
1 citation

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers

Dachuan Shi, Chaofan Tao, Anyi Rao et al.

ICML 2024 poster · arXiv:2305.17455

Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following

Qiaomu Miao, Alexandros Graikos, Jingwei Zhang et al.

ECCV 2024 poster · arXiv:2406.02774
1 citation

Extracting Training Data From Document-Based VQA Models

Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.

ICML 2024 poster · arXiv:2407.08707

GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering

Yifeng Zhang, Ming Jiang, Qi Zhao

ECCV 2024 poster

Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning

Wei Li, Hehe Fan, Yongkang Wong et al.

ICML 2024 poster

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024 poster · arXiv:2403.14624
473 citations

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Kaining Ying, Fanqing Meng, Jin Wang et al.

ICML 2024 poster · arXiv:2404.16006

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

Didi Zhu, Zhongyi Sun, Zexi Li et al.

ICML 2024 poster · arXiv:2402.12048

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024 poster · arXiv:2402.07872

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna et al.

ICML 2024 poster · arXiv:2402.07865

Recursive Visual Programming

Jiaxin Ge, Sanjay Subramanian, Baifeng Shi et al.

ECCV 2024 poster · arXiv:2312.02249
10 citations

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu et al.

ICML 2024 poster · arXiv:2401.02137

Take A Step Back: Rethinking the Two Stages in Visual Reasoning

Mingyu Zhang, Jiting Cai, Mingyu Liu et al.

ECCV 2024 poster · arXiv:2407.19666
9 citations

TrojVLM: Backdoor Attack Against Vision Language Models

Weimin Lyu, Lu Pang, Tengfei Ma et al.

ECCV 2024 poster · arXiv:2409.19232
23 citations

View Selection for 3D Captioning via Diffusion Ranking

Tiange Luo, Justin Johnson, Honglak Lee

ECCV 2024 poster · arXiv:2404.07984
29 citations

VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

Yibo Liu, Zheyuan Yang, Guile Wu et al.

ECCV 2024 poster · arXiv:2407.06516
10 citations

WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

Pingyi Chen, Chenglu Zhu, Sunyi Zheng et al.

ECCV 2024 poster · arXiv:2407.05603
24 citations