2024 Poster "visual question answering" Papers
19 papers found
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan, Jingxuan Wei, Zhangyang Gao et al.
Compositional Substitutivity of Visual Reasoning for Visual Question Answering
Chuanhao Li, Zhen Li, Chenchen Jing et al.
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi, Chaofan Tao, Anyi Rao et al.
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao, Alexandros Graikos, Jingwei Zhang et al.
Extracting Training Data From Document-Based VQA Models
Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.
GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering
Yifeng Zhang, Ming Jiang, Qi Zhao
Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
Wei Li, Hehe Fan, Yongkang Wong et al.
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying, Fanqing Meng, Jin Wang et al.
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu, Zhongyi Sun, Zexi Li et al.
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna et al.
Recursive Visual Programming
Jiaxin Ge, Sanjay Subramanian, Baifeng Shi et al.
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma, Furong Xu, Jian Liu et al.
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang, Jiting Cai, Mingyu Liu et al.
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma et al.
View Selection for 3D Captioning via Diffusion Ranking
Tiange Luo, Justin Johnson, Honglak Lee
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu, Zheyuan Yang, Guile Wu et al.
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
Pingyi Chen, Chenglu Zhu, Sunyi Zheng et al.