"multimodal reasoning" Papers

12 papers found

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Qiong Wu, Wenhao Lin, Yiyi Zhou et al.

NeurIPS 2025posterarXiv:2411.19628
5
citations

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025posterarXiv:2505.23766
19
citations

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.

ICLR 2025posterarXiv:2410.18325
17
citations

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025posterarXiv:2411.18203
30
citations

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.

ICLR 2025posterarXiv:2410.17385
40
citations

NL-Eye: Abductive NLI For Images

Mor Ventura, Michael Toker, Nitay Calderon et al.

ICLR 2025posterarXiv:2410.02613
3
citations

Temporal Reasoning Transfer from Text to Video

Lei Li, Yuanxin Liu, Linli Yao et al.

ICLR 2025oralarXiv:2410.06166
20
citations

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation

Kaining Ying, Henghui Ding, Guangquan Jie et al.

ICCV 2025posterarXiv:2507.22886
5
citations

Image Content Generation with Causal Reasoning

Xiaochuan Li, Baoyu Fan, Run Zhang et al.

AAAI 2024paperarXiv:2312.07132

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024paperarXiv:2401.12863
78
citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024posterarXiv:2309.00616
82
citations

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Yao Mu, Junting Chen, Qing-Long Zhang et al.

ICML 2024poster