"cross-modal generation" Papers
4 papers found
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
CVPR 2025 | highlight | arXiv:2412.15213
10 citations
HMVLM: Human Motion-Vision-Language Model via MoE LoRA
Lei Hu, Yongjing Ye, Shihong Xia
NeurIPS 2025 | poster
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
ICLR 2025 | poster | arXiv:2410.10594
121 citations
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual et al.
AAAI 2024 | paper | arXiv:2308.09300
74 citations