Hang Zhao
39 Papers
68 Total Citations
Papers (39)
SEPT: Towards Efficient Scene Representation Learning for Motion Prediction
ICLR 2024
45 citations
LONG3R: Long Sequence Streaming 3D Reconstruction
ICCV 2025
14 citations
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
NeurIPS 2025
6 citations
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting
ICCV 2025
3 citations
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
CVPR 2020
0 citations
Music Gesture for Visual Sound Separation
CVPR 2020
0 citations
VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation
CVPR 2020
0 citations
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
CVPR 2021
0 citations
Embracing Single Stride 3D Object Detector With Sparse Transformer
CVPR 2022
0 citations
Egocentric Prediction of Action Target in 3D
CVPR 2022
0 citations
Co-Advise: Cross Inductive Bias Distillation
CVPR 2022
0 citations
M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction
CVPR 2022
0 citations
GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training
CVPR 2023
0 citations
Neural Map Prior for Autonomous Driving
CVPR 2023
0 citations
What Happened 3 Seconds Ago? Inferring the Past With Thermal Imaging
CVPR 2023
0 citations
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
CVPR 2023
0 citations
ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
CVPR 2023
0 citations
Open Vocabulary Scene Parsing
ICCV 2017
0 citations
The Sound of Motions
ICCV 2019
0 citations
Self-Supervised Moving Vehicle Tracking With Stereo Sound
ICCV 2019
0 citations
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
ICCV 2019
0 citations
Through-Wall Human Mesh Recovery Using Radio Signals
ICCV 2019
0 citations
On Feature Decorrelation in Self-Supervised Learning
ICCV 2021
0 citations
DenseTNT: End-to-End Trajectory Prediction From Dense Goal Sets
ICCV 2021
0 citations
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
ICCV 2021
0 citations
Multimodal Knowledge Expansion
ICCV 2021
0 citations
PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework
ICCV 2023
0 citations
INT2: Interactive Trajectory Prediction at Intersections
ICCV 2023
0 citations
CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation
ECCV 2022
0 citations
Learning Visual Styles from Audio-Visual Associations
ECCV 2022
0 citations
Supervising Sound Localization by In-the-wild Egomotion
CVPR 2025
0 citations
Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration
ICML 2024
0 citations
Scene Parsing Through ADE20K Dataset
CVPR 2017
0 citations
Through-Wall Human Pose Estimation Using Radio Signals
CVPR 2018
0 citations
UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
NeurIPS 2020
0 citations
What Makes Multi-Modal Learning Better than Single (Provably)
NeurIPS 2021
0 citations
Neural Dubber: Dubbing for Videos According to Scripts
NeurIPS 2021
0 citations
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
NeurIPS 2023
0 citations
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
NeurIPS 2023
0 citations