ICLR 2025 "multimodal interaction" Papers
4 papers found
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An et al.
ICLR 2025 · poster · arXiv:2403.20271
86 citations
Lightweight Neural App Control
Filippos Christianos, Georgios Papoudakis, Thomas Coste et al.
ICLR 2025 · poster · arXiv:2410.17883
10 citations
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Jinyang Li, En Yu, Sijia Chen et al.
ICLR 2025 · poster · arXiv:2503.10616
7 citations
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Wei Zhao, Pengxiang Ding, Zhang Min et al.
ICLR 2025 · poster · arXiv:2502.13508
37 citations