"multi-modal reasoning" Papers
11 papers found
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
Shaoyuan Xie, Lingdong Kong, Yuhao Dong et al.
ICCV 2025 · arXiv:2501.04003
74 citations
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
ICCV 2025 · arXiv:2501.02135
10 citations
ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
Ke Niu, Haiyang Yu, Mengyang Zhao et al.
ICCV 2025 · arXiv:2502.19958
8 citations
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng, Qiguang Chen, Jin Zhang et al.
AAAI 2025 · arXiv:2412.12932
30 citations
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
Yue Jiang, Jichu Li, Yang Liu et al.
NEURIPS 2025 (oral) · arXiv:2505.18411
5 citations
EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution
Zhebei Shen, Qifan Yu, Juncheng Li et al.
NEURIPS 2025
InstructHOI: Context-Aware Instruction for Multi-Modal Reasoning in Human-Object Interaction Detection
Jinguo Luo, Weihong Ren, Quanlong Zheng et al.
NEURIPS 2025 (spotlight)
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo, Lijun Zhang, Mengyang Sun et al.
CVPR 2025 (highlight) · arXiv:2503.00361
16 citations
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni, YuTao Fan, Lei Zhang et al.
ICLR 2025 · arXiv:2410.03321
20 citations
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.
ECCV 2024 · arXiv:2403.14624
498 citations
Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
Liqi He, Zuchao Li, Xiantao Cai et al.
AAAI 2024 · arXiv:2312.08762
36 citations