Xiaojian Ma
7 Papers
145 Total Citations
Papers (7)
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
CVPR 2024
45
citations
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
ICLR 2025
37
citations
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
ICCV 2025, arXiv
24
citations
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
ICLR 2024
17
citations
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
CVPR 2025
11
citations
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
ICCV 2025
11
citations
An Embodied Generalist Agent in 3D World
ICML 2024
0
citations