Xiaojian Ma
Papers: 12
Total Citations: 145
Papers (12)
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
CVPR 2024
45
citations
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
ICLR 2025
37
citations
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
ICCV 2025
24
citations
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
ICLR 2024
17
citations
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
CVPR 2025
11
citations
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
ICCV 2025
11
citations
An Embodied Generalist Agent in 3D World
ICML 2024
0
citations
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
CVPR 2022
0
citations
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
CVPR 2023
0
citations
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
ICCV 2023
0
citations
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
NeurIPS 2019
0
citations
Unsupervised Foreground Extraction via Deep Region Competition
NeurIPS 2021
0
citations