"multimodal reasoning" Papers
12 papers found
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Qiong Wu, Wenhao Lin, Yiyi Zhou et al.
NeurIPS 2025 | poster | arXiv:2411.19628
5 citations
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu et al.
CVPR 2025 | poster | arXiv:2505.23766
19 citations
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.
ICLR 2025 | poster | arXiv:2410.18325
17 citations
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
CVPR 2025 | poster | arXiv:2411.18203
30 citations
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.
ICLR 2025 | poster | arXiv:2410.17385
40 citations
NL-Eye: Abductive NLI For Images
Mor Ventura, Michael Toker, Nitay Calderon et al.
ICLR 2025 | poster | arXiv:2410.02613
3 citations
Temporal Reasoning Transfer from Text to Video
Lei Li, Yuanxin Liu, Linli Yao et al.
ICLR 2025 | oral | arXiv:2410.06166
20 citations
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying, Henghui Ding, Guangquan Jie et al.
ICCV 2025 | poster | arXiv:2507.22886
5 citations
Image Content Generation with Causal Reasoning
Xiaochuan Li, Baoyu Fan, Run Zhang et al.
AAAI 2024 | paper | arXiv:2312.07132
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.
AAAI 2024 | paper | arXiv:2401.12863
78 citations
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen et al.
ECCV 2024 | poster | arXiv:2309.00616
82 citations
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu, Junting Chen, Qing-Long Zhang et al.
ICML 2024 | poster