Zhaoxiang Zhang
75
Papers
218
Total Citations
Papers (75)
OmniBench: Towards The Future of Universal Omni-Language Models
NeurIPS 2025
51
citations
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
ICCV 2025
44
citations
FreeVS: Generative View Synthesis on Free Driving Trajectory
ICLR 2025
34
citations
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
ICCV 2025
28
citations
DexVLG: Dexterous Vision-Language-Grasp Model at Scale
ICCV 2025
16
citations
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
CVPR 2024
11
citations
MemoNav: Working Memory Model for Visual Navigation
CVPR 2024
10
citations
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
NeurIPS 2025
6
citations
RCL: Reliable Continual Learning for Unified Failure Detection
CVPR 2024
6
citations
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
NeurIPS 2025
4
citations
FIRM: Flexible Interactive Reflection ReMoval
AAAI 2025
3
citations
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering
CVPR 2025
2
citations
Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance
ECCV 2024
2
citations
MCOP: Multi-UAV Collaborative Occupancy Prediction
ICCV 2025 (arXiv)
1
citations
Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation
CVPR 2020
0
citations
Context-Aware Attention Network for Image-Text Retrieval
CVPR 2020
0
citations
Instance Guided Proposal Network for Person Search
CVPR 2020
0
citations
Large-Scale Object Detection in the Wild From Imbalanced Multi-Labels
CVPR 2020 (arXiv)
0
citations
Bottom-Up Human Pose Estimation via Disentangled Keypoint Regression
CVPR 2021 (arXiv)
0
citations
Unsupervised Object Detection With LIDAR Clues
CVPR 2021 (arXiv)
0
citations
Look Closer To Segment Better: Boundary Patch Refinement for Instance Segmentation
CVPR 2021 (arXiv)
0
citations
RefineMask: Towards High-Quality Instance Segmentation With Fine-Grained Features
CVPR 2021 (arXiv)
0
citations
GAIA: A Transfer Learning System of Object Detection That Fits Your Needs
CVPR 2021 (arXiv)
0
citations
Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy
CVPR 2021 (arXiv)
0
citations
Learnable Graph Matching: Incorporating Graph Partitioning With Deep Feature Learning for Multiple Object Tracking
CVPR 2021 (arXiv)
0
citations
DATA: Domain-Aware and Task-Aware Self-Supervised Learning
CVPR 2022 (arXiv)
0
citations
Sparse Instance Activation for Real-Time Instance Segmentation
CVPR 2022 (arXiv)
0
citations
Embracing Single Stride 3D Object Detector With Sparse Transformer
CVPR 2022 (arXiv)
0
citations
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
CVPR 2022
0
citations
Implicit Sample Extension for Unsupervised Person Re-Identification
CVPR 2022 (arXiv)
0
citations
Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer
CVPR 2022
0
citations
Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture
CVPR 2022
0
citations
The Devil Is in the Details: Window-Based Attention for Image Compression
CVPR 2022 (arXiv)
0
citations
Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation
CVPR 2022
0
citations
Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images
CVPR 2023 (arXiv)
0
citations
Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models
CVPR 2023 (arXiv)
0
citations
FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection
CVPR 2023 (arXiv)
0
citations
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
CVPR 2023
0
citations
Hard Patches Mining for Masked Image Modeling
CVPR 2023 (arXiv)
0
citations
Sharpness-Aware Gradient Matching for Domain Generalization
CVPR 2023 (arXiv)
0
citations
3D Video Object Detection With Learnable Object-Centric Global Optimization
CVPR 2023 (arXiv)
0
citations
BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation
CVPR 2023
0
citations
Blind Video Deflickering by Neural Filtering With a Flawed Atlas
CVPR 2023 (arXiv)
0
citations
Spectral Feature Transformation for Person Re-Identification
ICCV 2019
0
citations
Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization
ICCV 2019
0
citations
Scale-Aware Trident Networks for Object Detection
ICCV 2019
0
citations
Sequence Level Semantics Aggregation for Video Object Detection
ICCV 2019
0
citations
POD: Practical Object Detection With Scale-Sensitive Network
ICCV 2019
0
citations
Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection
ICCV 2023 (arXiv)
0
citations
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization
ICCV 2023
0
citations
FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation
ICCV 2023
0
citations
LMR: A Large-Scale Multi-Reference Dataset for Reference-Based Super-Resolution
ICCV 2023 (arXiv)
0
citations
Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation
ICCV 2023 (arXiv)
0
citations
SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow
ICCV 2023
0
citations
Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup
ECCV 2020
0
citations
Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip
ECCV 2020
0
citations
Employing Multi-Estimations for Weakly-Supervised Semantic Segmentation
ECCV 2020
0
citations
Densely Constrained Depth Estimator for Monocular 3D Object Detection
ECCV 2022
0
citations
RRSR: Reciprocal Reference-Based Image Super-Resolution with Progressive Feature Alignment and Selection
ECCV 2022
0
citations
Stereo Depth Estimation with Echoes
ECCV 2022
0
citations
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
CVPR 2025
0
citations
Pointly-Supervised Panoptic Segmentation
ECCV 2022
0
citations
End-to-End Driving with Online Trajectory Evaluation via BEV World Model
ICCV 2025
0
citations
UIPro: Unleashing Superior Interaction Capability For GUI Agents
ICCV 2025
0
citations
Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation
ICCV 2025
0
citations
LayerAnimate: Layer-level Control for Animation
ICCV 2025
0
citations
SceneX: Procedural Controllable Large-Scale Scene Generation
AAAI 2025
0
citations
Fully Data-Driven Pseudo Label Estimation for Pointly-Supervised Panoptic Segmentation
AAAI 2024
0
citations
HardMo: A Large-Scale Hardcase Dataset for Motion Capture
CVPR 2024
0
citations
Continual Forgetting for Pre-trained Vision Models
CVPR 2024
0
citations
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
CVPR 2024
0
citations
Enhancing Visual Continual Learning with Language-Guided Supervision
CVPR 2024
0
citations
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
CVPR 2024
0
citations
GIFT: A Real-Time and Scalable 3D Shape Search Engine
CVPR 2016
0
citations
Bi-Directional Interaction Network for Person Search
CVPR 2020
0
citations