Hengshuang Zhao

Papers: 31

Total Citations: 527

Papers (31)

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sonata: Self-Supervised Learning of Reliable Point Representations

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

ViLLa: Video Reasoning Segmentation with Large Language Model

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding

ROSE: Remove Objects with Side Effects in Videos

Empowering Large Language Models with 3D Situation Awareness

PlayerOne: Egocentric World Simulator

LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans

BOOD: Boundary-based Out-Of-Distribution Data Generation

UniMODE: Unified Monocular 3D Object Detection

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

AnyDoor: Zero-shot Object-level Image Customization

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

Point Transformer V3: Simpler, Faster, Stronger