Highlight "multimodal learning" Papers
3 papers found
Multimodal Autoregressive Pre-training of Large Vision Encoders
Enrico Fini, Mustafa Shukor, Xiujun Li et al.
CVPR 2025 | Highlight | arXiv:2411.14402 | 77 citations
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Shaoan Xie, Lingjing Kong, Yujia Zheng et al.
CVPR 2025 | Highlight | arXiv:2507.22264 | 4 citations
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin, Shihao Zou, Yuxuan Ge et al.
CVPR 2024 | Highlight | arXiv:2403.00691 | 15 citations