2024 "visual grounding" Papers
17 papers found
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
Wei Chen, Long Chen, Yu Wu
ECCV 2024posterarXiv:2408.01120
16
citations
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang, Jiajun Deng, Mingbo Jia
AAAI 2024paperarXiv:2312.15162
13
citations
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Yixuan Wu, Yizhou Wang, Shixiang Tang et al.
ECCV 2024posterarXiv:2403.12488
47
citations
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang, Ruohan Dong, Jiayi Ji et al.
ECCV 2024posterarXiv:2407.05352
9
citations
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan, Yousong Zhu, Zhiyang Chen et al.
ECCV 2024posterarXiv:2311.14552
30
citations
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma, Yi Jiang, Jiannan Wu et al.
ECCV 2024posterarXiv:2404.13013
107
citations
GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.
AAAI 2024paperarXiv:2312.15043
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
ECCV 2024posterarXiv:2312.02949
114
citations
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal, Seoyoung Ahn, Zhibo Yang et al.
ECCV 2024posterarXiv:2407.19605
3
citations
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.
ECCV 2024posterarXiv:2312.03766
17
citations
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang, Yuan Yao, Wei Ji et al.
ICML 2024posterarXiv:2311.04498
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Xiaoyu Zhu, Hao Zhou, Pengfei Xing et al.
ECCV 2024posterarXiv:2407.13642
11
citations
Parallel Vertex Diffusion for Unified Visual Grounding
Authors: Zesen Cheng, Kehan Li, Peng Jin et al.
AAAI 2024paperarXiv:2303.07216
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li, Junfeng Wu, Weizhi Zhao et al.
ECCV 2024posterarXiv:2407.16696
13
citations
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang, Gaowen Liu, Shah Mubarak et al.
ECCV 2024posterarXiv:2407.03200
19
citations
Unifying Visual and Vision-Language Tracking via Contrastive Learning
AAAI 2024paperarXiv:2401.11228
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang, Zongqing Lu
ECCV 2024posterarXiv:2408.01942
3
citations