ICLR 2025 "vision-language models" Papers
14 papers found
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.
ICLR 2025 poster · arXiv:2407.18134
12 citations
Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.
ICLR 2025 poster · arXiv:2411.08923
3 citations
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Ji Qi, Ming Ding, Weihan Wang et al.
ICLR 2025 poster · arXiv:2402.04236
33 citations
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci et al.
ICLR 2025 poster · arXiv:2502.04263
15 citations
DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models
Kaishen Wang, Hengrui Gu, Meijun Gao et al.
ICLR 2025 poster
7 citations
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.
ICLR 2025 poster · arXiv:2410.17385
40 citations
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
Yucheng Shi, Quanzheng Li, Jin Sun et al.
ICLR 2025 poster · arXiv:2502.14044
6 citations
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
ICLR 2025 poster · arXiv:2410.17637
19 citations
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.
ICLR 2025 poster · arXiv:2410.08182
29 citations
RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
Youngjun Lee, Doyoung Kim, Junhyeok Kang et al.
ICLR 2025 poster
5 citations
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Jihyo Kim, Seulbi Lee, Sangheum Hwang
ICLR 2025 poster · arXiv:2410.14975
3 citations
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.
ICLR 2025 oral
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
Somesh Singh, Harini S I, Yaman Singla et al.
ICLR 2025 poster
2 citations
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
ICLR 2025 poster · arXiv:2410.10594
121 citations