"visual grounding" Papers

23 papers found

Acknowledging Focus Ambiguity in Visual Questions

Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.

ICCV 2025posterarXiv:2501.02201
2
citations

Controlling Multimodal LLMs via Reward-guided Decoding

Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.

ICCV 2025posterarXiv:2508.11616

COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts

Jiansheng Li, Xingxuan Zhang, Hao Zou et al.

CVPR 2025highlightarXiv:2504.10158
1
citations

DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models

Kaishen Wang, Hengrui Gu, Meijun Gao et al.

ICLR 2025poster
7
citations

ESCA: Contextualizing Embodied Agents via Scene-Graph Generation

Jiani Huang, Amish Sethi, Matthew Kuo et al.

NeurIPS 2025oralarXiv:2510.15963

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025posterarXiv:2406.05821
21
citations

Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang, Changle Zhou, Jiawei Kong et al.

NeurIPS 2025posterarXiv:2505.19678
6
citations

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NeurIPS 2025posterarXiv:2506.01946
17
citations

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

ZIYU ZHU, Xilin Wang, Yixuan Li et al.

ICCV 2025highlightarXiv:2507.04047
24
citations

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Cong Chen, Mingyu Liu, Chenchen Jing et al.

ICLR 2025posterarXiv:2503.06486
25
citations

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection

Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.

CVPR 2025highlightarXiv:2504.02823
2
citations

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

seil kang, Jinyeong Kim, Junhyeok Kim et al.

CVPR 2025highlightarXiv:2503.06287
31
citations

Cycle-Consistency Learning for Captioning and Grounding

Ning Wang, Jiajun Deng, Mingbo Jia

AAAI 2024paperarXiv:2312.15162
13
citations

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024posterarXiv:2403.12488
47
citations

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

Danni Yang, Ruohan Dong, Jiayi Ji et al.

ECCV 2024posterarXiv:2407.05352
9
citations

GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.

AAAI 2024paperarXiv:2312.15043

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Hao Zhang, Hongyang Li, Feng Li et al.

ECCV 2024posterarXiv:2312.02949
114
citations

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.

ECCV 2024posterarXiv:2312.03766
17
citations

NExT-Chat: An LMM for Chat, Detection and Segmentation

Ao Zhang, Yuan Yao, Wei Ji et al.

ICML 2024poster

Parallel Vertex Diffusion for Unified Visual Grounding

Authors: Zesen Cheng, Kehan Li, Peng Jin et al.

AAAI 2024paperarXiv:2303.07216

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

Weitai Kang, Gaowen Liu, Shah Mubarak et al.

ECCV 2024posterarXiv:2407.03200
19
citations

Unifying Visual and Vision-Language Tracking via Contrastive Learning

AAAI 2024paperarXiv:2401.11228

Visual Grounding for Object-Level Generalization in Reinforcement Learning

Haobin Jiang, Zongqing Lu

ECCV 2024posterarXiv:2408.01942
3
citations