Zhe Gan
5 Papers
107 Total Citations
Papers (5)
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
ICLR 2025, arXiv
43
citations
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
ICLR 2025
41
citations
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
CVPR 2025
23
citations
Multimodal Autoregressive Pre-training of Large Vision Encoders
CVPR 2025
0
citations
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
ICCV 2025
0
citations