NeurIPS 2025 "visual grounding" Papers
4 papers found
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria, Adinath Dukre, Feilong Tang et al.
NeurIPS 2025, oral, arXiv:2506.15649
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
Jiani Huang, Amish Sethi, Matthew Kuo et al.
NeurIPS 2025, oral, arXiv:2510.15963
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang, Changle Zhou, Jiawei Kong et al.
NeurIPS 2025, poster, arXiv:2505.19678
6 citations
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.
NeurIPS 2025, poster, arXiv:2506.01946
17 citations