"multimodal large language model" Papers
5 papers found
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin, Zixu Lin, Kunjie Lin et al.
NeurIPS 2025, poster, arXiv:2506.17612
9 citations
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Kai Liu, Jungang Li, Yuchong Sun et al.
NeurIPS 2025, oral, arXiv:2512.22905
4 citations
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
NeurIPS 2025, poster, arXiv:2505.13031
18 citations
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
CVPR 2025, poster, arXiv:2501.08326
9 citations
Referring to Any Person
Qing Jiang, Lin Wu, Zhaoyang Zeng et al.
ICCV 2025, poster, arXiv:2503.08507
13 citations