2025 "visual reasoning" Papers

16 papers found

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

Minghe Gao, Xuqi Liu, Zhongqi Yue et al.

ICCV 2025, poster, arXiv:2504.06606
10 citations

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025, poster, arXiv:2402.04236
33 citations

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu et al.

NeurIPS 2025, oral, arXiv:2504.13837
483 citations

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Fucai Ke, Vijay Kumar B G, Xingjian Leng et al.

ICCV 2025, poster, arXiv:2503.19263
6 citations

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

Tianyi Bai, Yuxuan Fan, Qiu Jiantao et al.

NeurIPS 2025, poster, arXiv:2506.07227
2 citations

Latent Chain-of-Thought for Visual Reasoning

Guohao Sun, Hang Hua, Jian Wang et al.

NeurIPS 2025, poster, arXiv:2510.23925
7 citations

Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning

Oleh Kolner, Thomas Ortner, Stanisław Woźniak et al.

ICLR 2025, poster, arXiv:2409.20213

Neurosymbolic Diffusion Models

Emile van Krieken, Pasquale Minervini, Edoardo Maria Ponti et al.

NeurIPS 2025, poster, arXiv:2505.13138
3 citations

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Yana Wei, Liang Zhao, Jianjian Sun et al.

NeurIPS 2025, poster, arXiv:2507.05255
14 citations

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

Yihe Deng, Hritik Bansal, Fan Yin et al.

NeurIPS 2025, poster, arXiv:2503.17352
15 citations

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

Ye Liu, Zongyang Ma, Junfu Pu et al.

NeurIPS 2025, poster, arXiv:2509.18094
4 citations

VideoAds for Fast-Paced Video Understanding

Zheyuan Zhang, Wanying Dou, Linkai Peng et al.

ICCV 2025, poster, arXiv:2504.09282
2 citations

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang, Xiufeng Song, Heng Zhou et al.

NeurIPS 2025, poster, arXiv:2506.09049
8 citations

Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting

Anand Bhattad, Konpat Preechakul, Alexei Efros

NeurIPS 2025, poster, arXiv:2503.21770
8 citations

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NeurIPS 2025, spotlight, arXiv:2505.14460
30 citations

Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs

Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.

NeurIPS 2025, poster
1 citation