Hang Zhao

39
Papers
68
Total Citations

Papers (39)

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

ICLR 2024
45
citations

LONG3R: Long Sequence Streaming 3D Reconstruction

ICCV 2025
14
citations

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

NeurIPS 2025
6
citations

GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting

ICCV 2025
3
citations

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

CVPR 2020 (arXiv)
0
citations

Music Gesture for Visual Sound Separation

CVPR 2020 (arXiv)
0
citations

VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

CVPR 2020 (arXiv)
0
citations

HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps

CVPR 2021
0
citations

Embracing Single Stride 3D Object Detector With Sparse Transformer

CVPR 2022 (arXiv)
0
citations

Egocentric Prediction of Action Target in 3D

CVPR 2022 (arXiv)
0
citations

Co-Advise: Cross Inductive Bias Distillation

CVPR 2022
0
citations

M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction

CVPR 2022 (arXiv)
0
citations

GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training

CVPR 2023 (arXiv)
0
citations

Neural Map Prior for Autonomous Driving

CVPR 2023 (arXiv)
0
citations

What Happened 3 Seconds Ago? Inferring the Past With Thermal Imaging

CVPR 2023 (arXiv)
0
citations

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

CVPR 2023 (arXiv)
0
citations

ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries

CVPR 2023 (arXiv)
0
citations

Open Vocabulary Scene Parsing

ICCV 2017 (arXiv)
0
citations

The Sound of Motions

ICCV 2019
0
citations

Self-Supervised Moving Vehicle Tracking With Stereo Sound

ICCV 2019
0
citations

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

ICCV 2019
0
citations

Through-Wall Human Mesh Recovery Using Radio Signals

ICCV 2019
0
citations

On Feature Decorrelation in Self-Supervised Learning

ICCV 2021 (arXiv)
0
citations

DenseTNT: End-to-End Trajectory Prediction From Dense Goal Sets

ICCV 2021 (arXiv)
0
citations

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset

ICCV 2021 (arXiv)
0
citations

Multimodal Knowledge Expansion

ICCV 2021 (arXiv)
0
citations

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

ICCV 2023
0
citations

INT2: Interactive Trajectory Prediction at Intersections

ICCV 2023
0
citations

CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation

ECCV 2022
0
citations

Learning Visual Styles from Audio-Visual Associations

ECCV 2022
0
citations

Supervising Sound Localization by In-the-wild Egomotion

CVPR 2025
0
citations

Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration

ICML 2024
0
citations

Scene Parsing Through ADE20K Dataset

CVPR 2017
0
citations

Through-Wall Human Pose Estimation Using Radio Signals

CVPR 2018
0
citations

UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging

NeurIPS 2020
0
citations

What Makes Multi-Modal Learning Better than Single (Provably)

NeurIPS 2021
0
citations

Neural Dubber: Dubbing for Videos According to Scripts

NeurIPS 2021
0
citations

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

NeurIPS 2023
0
citations

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

NeurIPS 2023
0
citations