"visual reasoning" Papers
22 papers found
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Ji Qi, Ming Ding, Weihan Wang et al.
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue, Zhiqi Chen, Rui Lu et al.
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke, Vijay Kumar b g, Xingjian Leng et al.
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Tianyi Bai, Yuxuan Fan, Qiu Jiantao et al.
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun, Hang Hua, Jian Wang et al.
Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning
Oleh Kolner, Thomas Ortner, Stanisław Woźniak et al.
Neurosymbolic Diffusion Models
Emile van Krieken, Pasquale Minervini, Edoardo Maria Ponti et al.
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei, Liang Zhao, Jianjian Sun et al.
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Yihe Deng, Hritik Bansal, Fan Yin et al.
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu, Zongyang Ma, Junfu Pu et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Tianhe Wu, Jian Zou, Jie Liang et al.
Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs
Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Ian Huang, Guandao Yang, Leonidas Guibas
Compositional Substitutivity of Visual Reasoning for Visual Question Answering
Chuanhao Li, Zhen Li, Chenchen Jing et al.
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang et al.
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying, Fanqing Meng, Jin Wang et al.
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang, Jiting Cai, Mingyu Liu et al.
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
yunxin li, Baotian Hu, Haoyuan Shi et al.