2025 "vision-language integration" Papers
3 papers found
Crafting Dynamic Virtual Activities with Advanced Multimodal Models
Changyang Li, Qingan Yan, Minyoung Kim et al.
ISMAR 2025, paper, arXiv:2406.17582
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
ICCV 2025, highlight, arXiv:2502.06788
18 citations
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye, Haiyang Xu, Haowei Liu et al.
ICLR 2025, poster, arXiv:2408.04840
237 citations