CVPR 2025 "multimodal understanding" Papers
3 papers found
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang et al.
CVPR 2025, highlight, arXiv:2406.10462
12 citations
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
CVPR 2025, poster, arXiv:2411.18499
19 citations
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Liao Qu, Huichao Zhang, Yiheng Liu et al.
CVPR 2025, poster, arXiv:2412.03069
120 citations