2024 "vision-language pre-training" Papers
10 papers found
An Empirical Study of CLIP for Text-Based Person Search
Min Cao, Yang Bai, Ziyin Zeng et al.
AAAI 2024 (paper) · arXiv:2308.10045 · 94 citations
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao, Xiaojun Jia, Xuhong Ren et al.
ECCV 2024 (poster) · arXiv:2403.12445 · 31 citations
Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
Linlan Huang, Xusheng Cao, Haori Lu et al.
ECCV 2024 (poster) · arXiv:2407.14143 · 41 citations
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.
ECCV 2024 (poster) · arXiv:2407.09781 · 11 citations
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
Bang Yang, Yong Dai, Xuxin Cheng et al.
AAAI 2024 (paper) · arXiv:2401.17186 · 9 citations
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jia Sun et al.
AAAI 2024 (paper) · arXiv:2308.11971 · 20 citations
GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.
AAAI 2024 (paper) · arXiv:2312.15043
Online Zero-Shot Classification with CLIP
Qi Qian, Juhua Hu
ECCV 2024 (poster) · arXiv:2408.13320 · 21 citations
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
Yufeng Huang, Jiji Tang, Zhuo Chen et al.
AAAI 2024 (paper) · arXiv:2305.06152 · 49 citations
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang, Wei Ye, Haiyang Xu et al.
AAAI 2024 (paper) · arXiv:2312.08846 · 6 citations