Xiaojian Ma

Papers: 7

Total Citations: 145

Papers (7)

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding

An Embodied Generalist Agent in 3D World