"multimodal pre-training" Papers
4 papers found
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
CVPR 2025, poster, arXiv:2504.16030
4 citations
Should VLMs be Pre-trained with Image Data?
Sedrick Keh, Jean Mercat, Samir Yitzhak Gadre et al.
ICLR 2025, poster, arXiv:2503.07603
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su, Xiulong Liu, Eli Shlizerman
ICML 2024, poster
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception
Xiao Wang, Wentao Wu, Chenglong Li et al.
AAAI 2024, paper, arXiv:2312.09812
7 citations