Hengshuang Zhao

65
Papers
527
Total Citations

Papers (65)

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

ECCV 2024
96
citations

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

CVPR 2024
77
citations

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

CVPR 2025
70
citations

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

CVPR 2024
62
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025
44
citations

Sonata: Self-Supervised Learning of Reliable Point Representations

CVPR 2025
39
citations

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

ICML 2025
28
citations

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

CVPR 2024
25
citations

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

CVPR 2024
19
citations

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

CVPR 2025
17
citations

ViLLa: Video Reasoning Segmentation with Large Language Model

ICCV 2025
16
citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

CVPR 2025
13
citations

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding

ICML 2025
6
citations

ROSE: Remove Objects with Side Effects in Videos

NeurIPS 2025
4
citations

LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans

NeurIPS 2025
3
citations

Empowering Large Language Models with 3D Situation Awareness

CVPR 2025
3
citations

PlayerOne: Egocentric World Simulator

NeurIPS 2025
3
citations

BOOD: Boundary-based Out-Of-Distribution Data Generation

ICML 2025
2
citations

Exploring Self-Attention for Image Recognition

CVPR 2020
0
citations

Distilling Knowledge via Knowledge Review

CVPR 2021
0
citations

Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency

CVPR 2021
0
citations

PAConv: Position Adaptive Convolution With Dynamic Kernel Assembling on Point Clouds

CVPR 2021
0
citations

Fully Convolutional Networks for Panoptic Segmentation

CVPR 2021
0
citations

Bidirectional Projection Network for Cross Dimension Scene Understanding

CVPR 2021
0
citations

Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers

CVPR 2021
0
citations

FocalClick: Towards Practical Interactive Image Segmentation

CVPR 2022
0
citations

Generalized Few-Shot Semantic Segmentation

CVPR 2022
0
citations

PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer

CVPR 2022
0
citations

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

CVPR 2022
0
citations

Stratified Transformer for 3D Point Cloud Segmentation

CVPR 2022
0
citations

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

CVPR 2023
0
citations

Detecting Everything in the Open World: Towards Universal Object Detection

CVPR 2023
0
citations

Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation

ICCV 2019
0
citations

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation

ICCV 2021
0
citations

Point Transformer

ICCV 2021
0
citations

Open-vocabulary Panoptic Segmentation with Embedding Modulation

ICCV 2023
0
citations

Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning

ICCV 2023
0
citations

BT^2: Backward-compatible Training with Basis Transformation

ICCV 2023
0
citations

MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning

ECCV 2022
0
citations

SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness

ECCV 2022
0
citations

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

ECCV 2022
0
citations

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

CVPR 2023
0
citations

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

CVPR 2025
0
citations

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

CVPR 2025
0
citations

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

ICCV 2025
0
citations

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

ICCV 2025
0
citations

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

ICCV 2025
0
citations

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

ICCV 2025
0
citations

AnyDoor: Zero-shot Object-level Image Customization

CVPR 2024
0
citations

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

CVPR 2024
0
citations

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

CVPR 2024
0
citations

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

CVPR 2024
0
citations

Point Transformer V3: Simpler Faster Stronger

CVPR 2024
0
citations

UniMODE: Unified Monocular 3D Object Detection

CVPR 2024
0
citations

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

CVPR 2024
0
citations

Pyramid Scene Parsing Network

CVPR 2017
0
citations

PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing

CVPR 2019
0
citations

UPSNet: A Unified Panoptic Segmentation Network

CVPR 2019
0
citations

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

CVPR 2020
0
citations

Do Different Tracking Tasks Require Different Appearance Models?

NeurIPS 2021
0
citations

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

NeurIPS 2022
0
citations

FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

NeurIPS 2023
0
citations

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

NeurIPS 2023
0
citations

Uni3DETR: Unified 3D Detection Transformer

NeurIPS 2023
0
citations

CorresNeRF: Image Correspondence Priors for Neural Radiance Fields

NeurIPS 2023
0
citations