"multi-modal reasoning" Papers

11 papers found

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

Shaoyuan Xie, Lingdong Kong, Yuhao Dong et al.

ICCV 2025 · arXiv:2501.04003
74 citations

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.

ICCV 2025 · arXiv:2501.02135
10 citations

ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models

Ke Niu, Haiyang Yu, Mengyang Zhao et al.

ICCV 2025 · arXiv:2502.19958
8 citations

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Zihui Cheng, Qiguang Chen, Jin Zhang et al.

AAAI 2025 · arXiv:2412.12932
30 citations

DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding

Yue Jiang, Jichu Li, Yang Liu et al.

NeurIPS 2025 (Oral) · arXiv:2505.18411
5 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NeurIPS 2025

InstructHOI: Context-Aware Instruction for Multi-Modal Reasoning in Human-Object Interaction Detection

Jinguo Luo, Weihong Ren, Quanlong Zheng et al.

NeurIPS 2025 (Spotlight)

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

Wei Suo, Lijun Zhang, Mengyang Sun et al.

CVPR 2025 (Highlight) · arXiv:2503.00361
16 citations

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Minheng Ni, Yutao Fan, Lei Zhang et al.

ICLR 2025 · arXiv:2410.03321
20 citations

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024 · arXiv:2403.14624
498 citations

Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

Liqi He, Zuchao Li, Xiantao Cai et al.

AAAI 2024 · arXiv:2312.08762
36 citations