"vision-language fusion" Papers
2 papers found
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Qiong Wu, Wenhao Lin, Yiyi Zhou et al.
NeurIPS 2025, poster, arXiv:2411.19628
5 citations
Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
Liqi He, Zuchao Li, Xiantao Cai et al.
AAAI 2024, paper, arXiv:2312.08762
34 citations