Hengshuang Zhao

65 Papers

527 Total Citations

Papers (65)

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sonata: Self-Supervised Learning of Reliable Point Representations

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

ViLLa: Video Reasoning Segmentation with Large Language Model

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding

ROSE: Remove Objects with Side Effects in Videos

LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans

Empowering Large Language Models with 3D Situation Awareness

PlayerOne: Egocentric World Simulator

BOOD: Boundary-based Out-Of-Distribution Data Generation

Exploring Self-Attention for Image Recognition

Distilling Knowledge via Knowledge Review

Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency

PAConv: Position Adaptive Convolution With Dynamic Kernel Assembling on Point Clouds

Fully Convolutional Networks for Panoptic Segmentation

Bidirectional Projection Network for Cross Dimension Scene Understanding

Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers

FocalClick: Towards Practical Interactive Image Segmentation

Generalized Few-Shot Semantic Segmentation

PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

Stratified Transformer for 3D Point Cloud Segmentation

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

Detecting Everything in the Open World: Towards Universal Object Detection

Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation

Point Transformer

Open-vocabulary Panoptic Segmentation with Embedding Modulation

Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning

BT^2: Backward-compatible Training with Basis Transformation

MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning

SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

AnyDoor: Zero-shot Object-level Image Customization

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

Point Transformer V3: Simpler Faster Stronger

UniMODE: Unified Monocular 3D Object Detection

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Pyramid Scene Parsing Network

PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing

UPSNet: A Unified Panoptic Segmentation Network

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

Do Different Tracking Tasks Require Different Appearance Models?

NeurIPS 2021

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

Uni3DETR: Unified 3D Detection Transformer

CorresNeRF: Image Correspondence Priors for Neural Radiance Fields