Ruimao Zhang
12 papers · 1,136 total citations
Papers (12)
WorldSimBench: Towards Video Generation Models as World Simulators
ICML 2025
806 citations
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
CVPR 2024
139 citations
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
CVPR 2024
76 citations
Open-World Human-Object Interaction Detection via Multi-modal Prompts
CVPR 2024
31 citations
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
CVPR 2025
24 citations
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
ECCV 2024
22 citations
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer
AAAI 2024 · arXiv
11 citations
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
ICCV 2025
11 citations
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
CVPR 2025
10 citations
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions
CVPR 2024
6 citations
SEED-Bench: Benchmarking Multimodal Large Language Models
CVPR 2024
0 citations
HumanTOMATO: Text-aligned Whole-body Motion Generation
ICML 2024
0 citations