Poster "multimodal reasoning" Papers

15 papers found

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Qiong Wu, Wenhao Lin, Yiyi Zhou et al.

NeurIPS 2025posterarXiv:2411.19628
5
citations

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025posterarXiv:2505.23766
19
citations

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.

ICLR 2025posterarXiv:2410.18325
17
citations

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Han Lin, Jaemin Cho, Amir Zadeh et al.

NeurIPS 2025posterarXiv:2508.05954
6
citations

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025posterarXiv:2411.18203
30
citations

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.

ICLR 2025posterarXiv:2410.17385
40
citations

DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning

Qi Cao, Ruiyi Wang, Ruiyi Zhang et al.

NeurIPS 2025posterarXiv:2505.20241
5
citations

Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning

Hao Chen, Jiaming Liu, Chenyang Gu et al.

NeurIPS 2025poster
27
citations

MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning

Tianhong Gao, Yannian Fu, Weiqun Wu et al.

ICCV 2025posterarXiv:2507.21924
1
citations

NL-Eye: Abductive NLI For Images

Mor Ventura, Michael Toker, Nitay Calderon et al.

ICLR 2025posterarXiv:2410.02613
3
citations

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Longtian Qiu, Shan Ning, Jiaxuan Sun et al.

NeurIPS 2025posterarXiv:2510.21122

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation

Kaining Ying, Henghui Ding, Guangquan Jie et al.

ICCV 2025posterarXiv:2507.22886
5
citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024posterarXiv:2309.00616
82
citations

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Yao Mu, Junting Chen, Qing-Long Zhang et al.

ICML 2024poster

Vamos: Versatile Action Models for Video Understanding

Shijie Wang, Qi Zhao, Minh Quan et al.

ECCV 2024posterarXiv:2311.13627
36
citations