"image-text retrieval" Papers
9 papers found
AmorLIP: Efficient Language-Image Pretraining via Amortization
Haotian Sun, Yitong Li, Yuchen Zhuang et al.
NeurIPS 2025posterarXiv:2505.18983
2
citations
Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
Mayug Maniparambil, Raiymbek Akshulakov, YASSER ABDELAZIZ DAHOU DJILALI et al.
CVPR 2025posterarXiv:2409.19425
2
citations
Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency
Kai Gan, Bo Ye, Min-Ling Zhang et al.
ICLR 2025poster
3
citations
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, ALEXANDROS XENOS et al.
CVPR 2025posterarXiv:2412.04378
11
citations
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi, Chaofan Tao, Anyi Rao et al.
ICML 2024poster
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jia Sun et al.
AAAI 2024paperarXiv:2308.11971
20
citations
Multi-Label Cluster Discrimination for Visual Representation Learning
Xiang An, Kaicheng Yang, Xiangzi Dai et al.
ECCV 2024posterarXiv:2407.17331
12
citations
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.
ICML 2024poster
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma, Furong Xu, Jian liu et al.
ICML 2024poster