Papers matching "vision-language pre-training"
6 papers found
An Empirical Study of CLIP for Text-Based Person Search
Min Cao, Yang Bai, Ziyin Zeng et al.
AAAI 2024 · arXiv:2308.10045
94 citations
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
Bang Yang, Yong Dai, Xuxin Cheng et al.
AAAI 2024 · arXiv:2401.17186
9 citations
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jia Sun et al.
AAAI 2024 · arXiv:2308.11971
20 citations
GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.
AAAI 2024 · arXiv:2312.15043
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
Yufeng Huang, Jiji Tang, Zhuo Chen et al.
AAAI 2024 · arXiv:2305.06152
49 citations
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang, Wei Ye, Haiyang Xu et al.
AAAI 2024 · arXiv:2312.08846
6 citations