NeurIPS 2025 "visual grounding" Papers
7 papers found
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang, Lingling Zhang, Jie Ma et al.
NeurIPS 2025 (poster) · arXiv:2505.19076 · 5 citations
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria, Adinath Dukre, Feilong Tang et al.
NeurIPS 2025 (oral) · arXiv:2506.15649
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
Jiani Huang, Amish Sethi, Matthew Kuo et al.
NeurIPS 2025 (oral) · arXiv:2510.15963
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang, Changle Zhou, Jiawei Kong et al.
NeurIPS 2025 (poster) · arXiv:2505.19678 · 6 citations
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.
NeurIPS 2025 (poster) · arXiv:2506.01946 · 17 citations
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni, Zhengyuan Yang, Linjie Li et al.
NeurIPS 2025 (poster) · arXiv:2505.19702 · 12 citations
Vision Function Layer in Multimodal LLMs
Cheng Shi, Yizhou Yu, Sibei Yang
NeurIPS 2025 (poster) · arXiv:2509.24791 · 4 citations