Siyuan Huang
51 Papers
364 Total Citations
Papers (51)
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
96
citations
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
CVPR 2024
78
citations
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
NeurIPS 2025
34
citations
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025
26
citations
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
ICCV 2025 (arXiv)
24
citations
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
ECCV 2024
22
citations
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
CVPR 2025
18
citations
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
CVPR 2025
17
citations
Neural-Symbolic Recursive Machine for Systematic Generalization
ICLR 2024
14
citations
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
ICCV 2025
9
citations
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
NeurIPS 2025
8
citations
Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing
ICCV 2025
7
citations
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
CVPR 2025
6
citations
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025 (arXiv)
4
citations
PrimHOI: Compositional Human-Object Interaction via Reusable Primitives
ICCV 2025
1
citation
Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World
CVPR 2022 (arXiv)
0
citations
Adversarial Texture for Fooling Person Detectors in the Physical World
CVPR 2022 (arXiv)
0
citations
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
CVPR 2023 (arXiv)
0
citations
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
CVPR 2023 (arXiv)
0
citations
Diffusion-Based Generation, Optimization, and Planning in 3D Scenes
CVPR 2023 (arXiv)
0
citations
Predicting Human Activities Using Stochastic Grammar
ICCV 2017 (arXiv)
0
citations
Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning
ICCV 2019
0
citations
Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense
ICCV 2019
0
citations
YouRefIt: Embodied Reference Understanding With Language and Gesture
ICCV 2021 (arXiv)
0
citations
VLGrammar: Grounded Grammar Induction of Vision and Language
ICCV 2021 (arXiv)
0
citations
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
ICCV 2023
0
citations
ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes
ICCV 2023 (arXiv)
0
citations
Full-Body Articulated Human-Object Interaction
ICCV 2023 (arXiv)
0
citations
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
ECCV 2020
0
citations
LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities
ECCV 2020
0
citations
Spatio-Temporal Self-Supervised Representation Learning for 3D Point Clouds
ICCV 2021 (arXiv)
0
citations
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
CVPR 2025
0
citations
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
CVPR 2025
0
citations
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans
CVPR 2025
0
citations
Dynamic Motion Blending for Versatile Motion Editing
CVPR 2025
0
citations
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
CVPR 2025
0
citations
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
CVPR 2025
0
citations
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
ICCV 2025
0
citations
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
CVPR 2024
0
citations
Scaling Up Dynamic Human-Scene Interaction Modeling
CVPR 2024
0
citations
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
CVPR 2024
0
citations
An Embodied Generalist Agent in 3D World
ICML 2024
0
citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
0
citations
Human-Centric Indoor Scene Synthesis Using Stochastic Grammar
CVPR 2018 (arXiv)
0
citations
Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis
CVPR 2021 (arXiv)
0
citations
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation
NeurIPS 2018
0
citations
PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
NeurIPS 2019
0
citations
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
NeurIPS 2022
0
citations
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
NeurIPS 2022
0
citations
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
NeurIPS 2023
0
citations
Tailoring Self-Attention for Graph via Rooted Subtrees
NeurIPS 2023
0
citations