Ruimao Zhang

Papers: 12

Total Citations: 1,136

Papers (12)

WorldSimBench: Towards Video Generation Models as World Simulators

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Open-World Human-Object Interaction Detection via Multi-modal Prompts

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions

SEED-Bench: Benchmarking Multimodal Large Language Models

HumanTOMATO: Text-aligned Whole-body Motion Generation