Poster "multimodal large language model" Papers
5 papers found
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li, Keen You, Haotian Zhang et al.
ICLR 2025, poster, arXiv:2410.18967
43 citations
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin, Zixu Lin, Kunjie Lin et al.
NeurIPS 2025, poster, arXiv:2506.17612
9 citations
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
NeurIPS 2025, poster, arXiv:2505.13031
18 citations
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
CVPR 2025, poster, arXiv:2501.08326
9 citations
Referring to Any Person
Qing Jiang, Lin Wu, Zhaoyang Zeng et al.
ICCV 2025, poster, arXiv:2503.08507
13 citations