2025 Oral "modality fusion" Papers
2 papers found
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.
ICLR 2025, oral, arXiv:2408.06072
1355 citations
Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
Zinuo Li, Xian Zhang, Yongxin Guo et al.
NeurIPS 2025, oral, arXiv:2505.18110
3 citations