Zhaoxiang Zhang

75
Papers
218
Total Citations

Papers (75)

OmniBench: Towards The Future of Universal Omni-Language Models

NeurIPS 2025
51
citations

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers

ICCV 2025
44
citations

FreeVS: Generative View Synthesis on Free Driving Trajectory

ICLR 2025
34
citations

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

ICCV 2025
28
citations

DexVLG: Dexterous Vision-Language-Grasp Model at Scale

ICCV 2025
16
citations

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

CVPR 2024
11
citations

MemoNav: Working Memory Model for Visual Navigation

CVPR 2024
10
citations

DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving

NeurIPS 2025
6
citations

RCL: Reliable Continual Learning for Unified Failure Detection

CVPR 2024
6
citations

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

NeurIPS 2025
4
citations

FIRM: Flexible Interactive Reflection ReMoval

AAAI 2025
3
citations

FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering

CVPR 2025
2
citations

Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance

ECCV 2024
2
citations

MCOP: Multi-UAV Collaborative Occupancy Prediction

ICCV 2025 (arXiv)
1
citations

Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation

CVPR 2020
0
citations

Context-Aware Attention Network for Image-Text Retrieval

CVPR 2020
0
citations

Instance Guided Proposal Network for Person Search

CVPR 2020
0
citations

Large-Scale Object Detection in the Wild From Imbalanced Multi-Labels

CVPR 2020 (arXiv)
0
citations

Bottom-Up Human Pose Estimation via Disentangled Keypoint Regression

CVPR 2021 (arXiv)
0
citations

Unsupervised Object Detection With LIDAR Clues

CVPR 2021 (arXiv)
0
citations

Look Closer To Segment Better: Boundary Patch Refinement for Instance Segmentation

CVPR 2021 (arXiv)
0
citations

RefineMask: Towards High-Quality Instance Segmentation With Fine-Grained Features

CVPR 2021 (arXiv)
0
citations

GAIA: A Transfer Learning System of Object Detection That Fits Your Needs

CVPR 2021 (arXiv)
0
citations

Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy

CVPR 2021 (arXiv)
0
citations

Learnable Graph Matching: Incorporating Graph Partitioning With Deep Feature Learning for Multiple Object Tracking

CVPR 2021 (arXiv)
0
citations

DATA: Domain-Aware and Task-Aware Self-Supervised Learning

CVPR 2022 (arXiv)
0
citations

Sparse Instance Activation for Real-Time Instance Segmentation

CVPR 2022 (arXiv)
0
citations

Embracing Single Stride 3D Object Detector With Sparse Transformer

CVPR 2022 (arXiv)
0
citations

HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network

CVPR 2022
0
citations

Implicit Sample Extension for Unsupervised Person Re-Identification

CVPR 2022 (arXiv)
0
citations

Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

CVPR 2022
0
citations

Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture

CVPR 2022
0
citations

The Devil Is in the Details: Window-Based Attention for Image Compression

CVPR 2022 (arXiv)
0
citations

Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

CVPR 2022
0
citations

Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images

CVPR 2023 (arXiv)
0
citations

Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models

CVPR 2023 (arXiv)
0
citations

FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection

CVPR 2023 (arXiv)
0
citations

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

CVPR 2023
0
citations

Hard Patches Mining for Masked Image Modeling

CVPR 2023 (arXiv)
0
citations

Sharpness-Aware Gradient Matching for Domain Generalization

CVPR 2023 (arXiv)
0
citations

3D Video Object Detection With Learnable Object-Centric Global Optimization

CVPR 2023 (arXiv)
0
citations

BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation

CVPR 2023
0
citations

Blind Video Deflickering by Neural Filtering With a Flawed Atlas

CVPR 2023 (arXiv)
0
citations

Spectral Feature Transformation for Person Re-Identification

ICCV 2019
0
citations

Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization

ICCV 2019
0
citations

Scale-Aware Trident Networks for Object Detection

ICCV 2019
0
citations

Sequence Level Semantics Aggregation for Video Object Detection

ICCV 2019
0
citations

POD: Practical Object Detection With Scale-Sensitive Network

ICCV 2019
0
citations

Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection

ICCV 2023 (arXiv)
0
citations

DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization

ICCV 2023
0
citations

FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation

ICCV 2023
0
citations

LMR: A Large-Scale Multi-Reference Dataset for Reference-Based Super-Resolution

ICCV 2023 (arXiv)
0
citations

Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation

ICCV 2023 (arXiv)
0
citations

SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow

ICCV 2023
0
citations

Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup

ECCV 2020
0
citations

Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip

ECCV 2020
0
citations

Employing Multi-Estimations for Weakly-Supervised Semantic Segmentation

ECCV 2020
0
citations

Densely Constrained Depth Estimator for Monocular 3D Object Detection

ECCV 2022
0
citations

RRSR: Reciprocal Reference-Based Image Super-Resolution with Progressive Feature Alignment and Selection

ECCV 2022
0
citations

Stereo Depth Estimation with Echoes

ECCV 2022
0
citations

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

CVPR 2025
0
citations

Pointly-Supervised Panoptic Segmentation

ECCV 2022
0
citations

End-to-End Driving with Online Trajectory Evaluation via BEV World Model

ICCV 2025
0
citations

UIPro: Unleashing Superior Interaction Capability For GUI Agents

ICCV 2025
0
citations

Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation

ICCV 2025
0
citations

LayerAnimate: Layer-level Control for Animation

ICCV 2025
0
citations

SceneX: Procedural Controllable Large-Scale Scene Generation

AAAI 2025
0
citations

Fully Data-Driven Pseudo Label Estimation for Pointly-Supervised Panoptic Segmentation

AAAI 2024
0
citations

HardMo: A Large-Scale Hardcase Dataset for Motion Capture

CVPR 2024
0
citations

Continual Forgetting for Pre-trained Vision Models

CVPR 2024
0
citations

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

CVPR 2024
0
citations

Enhancing Visual Continual Learning with Language-Guided Supervision

CVPR 2024
0
citations

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

CVPR 2024
0
citations

GIFT: A Real-Time and Scalable 3D Shape Search Engine

CVPR 2016
0
citations

Bi-Directional Interaction Network for Person Search

CVPR 2020
0
citations