2025 Poster "referring expression comprehension" Papers
6 papers found
Fine-grained Spatiotemporal Grounding on Egocentric Videos
Shuo LIANG, Yiwu Zhong, Zi-Yuan Hu et al.
ICCV 2025, poster, arXiv:2508.00518
5 citations
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.
NeurIPS 2025, poster, arXiv:2506.04897
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong, Qin ZHANG, DONGSHENG An et al.
CVPR 2025, poster, arXiv:2505.13788
3 citations
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
CVPR 2025, poster, arXiv:2501.08326
9 citations
Referring to Any Person
Qing Jiang, Lin Wu, Zhaoyang Zeng et al.
ICCV 2025, poster, arXiv:2503.08507
13 citations
Vision-Language Models Can't See the Obvious
YASSER ABDELAZIZ DAHOU DJILALI, Ngoc Huynh, Phúc Lê Khắc et al.
ICCV 2025, poster, arXiv:2507.04741
7 citations