Siyuan Huang

27
Papers
364
Total Citations

Papers (27)

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

ICCV 2025
96
citations

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

CVPR 2024
78
citations

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

NeurIPS 2025
34
citations

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

ICLR 2025
26
citations

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

ICCV 2025
24
citations

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

ECCV 2024
22
citations

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

CVPR 2025
18
citations

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

CVPR 2025
17
citations

Neural-Symbolic Recursive Machine for Systematic Generalization

ICLR 2024
14
citations

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

ICCV 2025
9
citations

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

NeurIPS 2025
8
citations

Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing

ICCV 2025
7
citations

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing

CVPR 2025
6
citations

Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

CVPR 2025
4
citations

PrimHOI: Compositional Human-Object Interaction via Reusable Primitives

ICCV 2025
1
citation

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

CVPR 2025
0
citations

Scaling Up Dynamic Human-Scene Interaction Modeling

CVPR 2024
0
citations

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

CVPR 2024
0
citations

An Embodied Generalist Agent in 3D World

ICML 2024
0
citations

ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning

CVPR 2025
0
citations

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

ICML 2024
0
citations

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill

CVPR 2025
0
citations

Dynamic Motion Blending for Versatile Motion Editing

CVPR 2025
0
citations

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation

ICCV 2025
0
citations

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

CVPR 2025
0
citations

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes

CVPR 2025
0
citations

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

CVPR 2024
0
citations