CVPR Poster "multimodal large language model" Papers
2 papers found
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
CVPR 2025, poster, arXiv:2501.08326
9 citations
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
CVPR 2025, poster, arXiv:2502.19902
21 citations