Zheng Zhu

40 Papers

229 Total Citations

Papers (40)

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

NeurIPS 2025, arXiv

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception

End-to-End Flow Correlation Tracking With Spatial-Temporal Attention

High Performance Visual Tracking With Siamese Region Proposal Network

Attention-Guided Unified Network for Panoptic Segmentation

The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation

Structure-Aware Face Clustering on a Large-Scale Graph With 10^7 Nodes

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing

CAFE: Learning To Condense Dataset by Aligning Features

Dimension Embeddings for Monocular 3D Object Detection

Crafting Better Contrastive Views for Siamese Representation Learning

DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting

An Efficient Training Approach for Very Large Scale Face Recognition

Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

CompletionFormer: Depth Completion With Convolutions and Vision Transformers

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

Gait Recognition in the Wild: A Benchmark

OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Token-Label Alignment for Vision Transformers

DREAM: Efficient Dataset Distillation by Representative Matching

DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving

Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds

DetRF: Detachable Novel Views Synthesis of Dynamic Scenes Using Backdrop-Driven Neural Radiance Fields

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

Global Filter Networks for Image Classification

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression