Hengshuang Zhao

31
Papers
527
Total Citations

Papers (31)

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

ECCV 2024
96
citations

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

CVPR 2024
77
citations

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

CVPR 2025
70
citations

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

CVPR 2024
62
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025
44
citations

Sonata: Self-Supervised Learning of Reliable Point Representations

CVPR 2025
39
citations

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

ICML 2025
28
citations

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

CVPR 2024
25
citations

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

CVPR 2024
19
citations

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

CVPR 2025
17
citations

ViLLa: Video Reasoning Segmentation with Large Language Model

ICCV 2025
16
citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

CVPR 2025
13
citations

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding

ICML 2025
6
citations

ROSE: Remove Objects with Side Effects in Videos

NeurIPS 2025
4
citations

Empowering Large Language Models with 3D Situation Awareness

CVPR 2025
3
citations

PlayerOne: Egocentric World Simulator

NeurIPS 2025
3
citations

LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans

NeurIPS 2025
3
citations

BOOD: Boundary-based Out-Of-Distribution Data Generation

ICML 2025
2
citations

UniMODE: Unified Monocular 3D Object Detection

CVPR 2024
0
citations

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

CVPR 2024
0
citations

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

CVPR 2024
0
citations

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

ICCV 2025
0
citations

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

ICCV 2025
0
citations

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

ICCV 2025
0
citations

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

ICCV 2025
0
citations

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

CVPR 2025
0
citations

AnyDoor: Zero-shot Object-level Image Customization

CVPR 2024
0
citations

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

CVPR 2025
0
citations

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

CVPR 2024
0
citations

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

CVPR 2024
0
citations

Point Transformer V3: Simpler Faster Stronger

CVPR 2024
0
citations