2025 Poster "multi-modal understanding" Papers
4 papers found
Audio-Visual Instance Segmentation
Ruohao Guo, Xianghua Ying, Yaru Chen et al.
CVPR 2025 poster | arXiv:2310.18709 | 11 citations
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
Zhantao Yang, Ruili Feng, Keyu Yan et al.
CVPR 2025 poster | arXiv:2407.03314 | 3 citations
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu, Ming Ma, Xiaomin Yu et al.
NeurIPS 2025 poster | arXiv:2505.12448 | 19 citations
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.
ICCV 2025 poster | arXiv:2508.08237 | 4 citations