Wei Zhang

130
Papers
1,750
Total Citations

Papers (130)

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

NeurIPS 2017 (arXiv)
1,364
citations

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

AAAI 2024 (arXiv)
58
citations

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

CVPR 2024
45
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025
44
citations

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

ICCV 2025
43
citations

Latent Space Editing in Transformer-Based Flow Matching

AAAI 2024 (arXiv)
38
citations

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

CVPR 2024
32
citations

Language-Driven Anchors for Zero-Shot Adversarial Robustness

CVPR 2024
21
citations

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

CVPR 2024
19
citations

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

CVPR 2025
18
citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

CVPR 2025
13
citations

Gaussian Process Neural Additive Models

AAAI 2024 (arXiv)
11
citations

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

AAAI 2024 (arXiv)
10
citations

GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

CVPR 2024
8
citations

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

ICCV 2025
7
citations

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

NeurIPS 2025
4
citations

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

ICLR 2025
3
citations

Less Attention is More: Prompt Transformer for Generalized Category Discovery

CVPR 2025
3
citations

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

ICCV 2025
3
citations

EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting

CVPR 2025
2
citations

Context Guided Transformer Entropy Modeling for Video Compression

ICCV 2025
1
citation

Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On

ICCV 2025
1
citation

SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination

ICLR 2025
1
citation

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

NeurIPS 2025
1
citation

Binarized Mode Seeking for Scalable Visual Pattern Discovery

CVPR 2017
0
citations

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

CVPR 2018 (arXiv)
0
citations

Reconstruction Network for Video Captioning

CVPR 2018 (arXiv)
0
citations

Unsupervised Person Image Generation With Semantic Parsing Transformation

CVPR 2019
0
citations

Destruction and Construction Learning for Fine-Grained Image Recognition

CVPR 2019
0
citations

Embedding Complementary Deep Networks for Image Classification

CVPR 2019
0
citations

Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition

CVPR 2020
0
citations

SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

CVPR 2020
0
citations

Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment

CVPR 2020 (arXiv)
0
citations

Points As Queries: Weakly Semi-Supervised Object Detection by Points

CVPR 2021 (arXiv)
0
citations

Source-Free Domain Adaptation for Semantic Segmentation

CVPR 2021 (arXiv)
0
citations

Mesh Saliency: An Independent Perceptual Measure or a Derivative of Image Saliency?

CVPR 2021
0
citations

Focus on Local: Detecting Lane Marker From Bottom Up via Key Point

CVPR 2021 (arXiv)
0
citations

Zero-Shot Adversarial Quantization

CVPR 2021 (arXiv)
0
citations

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

CVPR 2021 (arXiv)
0
citations

UAV-Human: A Large Benchmark for Human Behavior Understanding With Unmanned Aerial Vehicles

CVPR 2021
0
citations

Discrimination-Aware Mechanism for Fine-Grained Representation Learning

CVPR 2021
0
citations

LPSNet: A Lightweight Solution for Fast Panoptic Segmentation

CVPR 2021
0
citations

Learning a Facial Expression Embedding Disentangled From Identity

CVPR 2021
0
citations

Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation

CVPR 2021
0
citations

Point2Seq: Detecting 3D Objects As Sequences

CVPR 2022 (arXiv)
0
citations

ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation

CVPR 2022
0
citations

Directional Self-Supervised Learning for Heavy Image Augmentations

CVPR 2022 (arXiv)
0
citations

A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection

CVPR 2022 (arXiv)
0
citations

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

CVPR 2022 (arXiv)
0
citations

Class-Aware Contrastive Semi-Supervised Learning

CVPR 2022 (arXiv)
0
citations

PointCLIP: Point Cloud Understanding by CLIP

CVPR 2022 (arXiv)
0
citations

Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection

CVPR 2023 (arXiv)
0
citations

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment

CVPR 2023 (arXiv)
0
citations

Semi-DETR: Semi-Supervised Object Detection With Detection Transformers

CVPR 2023
0
citations

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

CVPR 2023 (arXiv)
0
citations

HS-Pose: Hybrid Scope Feature Extraction for Category-Level Object Pose Estimation

CVPR 2023
0
citations

Multiple Granularity Descriptors for Fine-Grained Categorization

ICCV 2015
0
citations

A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification

ICCV 2015
0
citations

Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network

ICCV 2019
0
citations

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification

ICCV 2019
0
citations

Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization

ICCV 2019
0
citations

VrR-VG: Refocusing Visually-Relevant Relationships

ICCV 2019
0
citations

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation

ICCV 2021
0
citations

Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds

ICCV 2021 (arXiv)
0
citations

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

ICCV 2021
0
citations

C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing

ICCV 2021
0
citations

E2E-LOAD: End-to-End Long-form Online Action Detection

ICCV 2023
0
citations

WaterMask: Instance Segmentation for Underwater Imagery

ICCV 2023
0
citations

Data-free Knowledge Distillation for Fine-grained Visual Categorization

ICCV 2023
0
citations

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

ICCV 2023
0
citations

Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

ICCV 2023
0
citations

LVOS: A Benchmark for Long-term Video Object Segmentation

ICCV 2023 (arXiv)
0
citations

CFCG: Semi-Supervised Semantic Segmentation via Cross-Fusion and Contour Guidance Supervision

ICCV 2023
0
citations

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

ICCV 2023 (arXiv)
0
citations

GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training

ICCV 2023 (arXiv)
0
citations

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

ICCV 2023 (arXiv)
0
citations

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

ICCV 2023 (arXiv)
0
citations

Segment as Points for Efficient Online Multi-Object Tracking and Segmentation

ECCV 2020
0
citations

HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction

ECCV 2020
0
citations

Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation

ECCV 2020
0
citations

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

ECCV 2020
0
citations

GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes

ECCV 2020
0
citations

Unsupervised Multi-View CNN for Salient View Selection of 3D Objects and Scenes

ECCV 2020
0
citations

Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning

ECCV 2022
0
citations

Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection

ECCV 2022
0
citations

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification

ECCV 2022
0
citations

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

ECCV 2022
0
citations

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

ECCV 2022
0
citations

SFOD: Spiking Fusion Object Detector

CVPR 2024
0
citations

Decoupled Motion Expression Video Segmentation

CVPR 2025
0
citations

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill

CVPR 2025
0
citations

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

ICCV 2025
0
citations

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving

ICCV 2025
0
citations

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

ICCV 2025
0
citations

LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation

ICCV 2025
0
citations

General Compression Framework for Efficient Transformer Object Tracking

ICCV 2025
0
citations

Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion

ICCV 2025
0
citations

PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation

AAAI 2025
0
citations

In2NeCT: Inter-class and Intra-class Neural Collapse Tuning for Semantic Segmentation of Imbalanced Remote Sensing Images

AAAI 2025
0
citations

Coherency Improved Explainable Recommendation via Large Language Model

AAAI 2025
0
citations

STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation

AAAI 2025
0
citations

Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems

AAAI 2024
0
citations

CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

AAAI 2024
0
citations

EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling

CVPR 2024
0
citations

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

CVPR 2024
0
citations

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

CVPR 2024
0
citations

Event-based Visible and Infrared Fusion via Multi-task Collaboration

CVPR 2024
0
citations

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

CVPR 2024
0
citations

HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion

ICML 2025
0
citations

ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection

ICML 2024
0
citations

Interpreting and Improving Large Language Models in Arithmetic Calculation

ICML 2024
0
citations

Weakly Supervised Semantic Segmentation for Social Images

CVPR 2015
0
citations

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

NeurIPS 2018
0
citations

Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks

NeurIPS 2019
0
citations

Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts

NeurIPS 2020
0
citations

A Decentralized Parallel Algorithm for Training Generative Adversarial Nets

NeurIPS 2020
0
citations

Online Decision Based Visual Tracking via Reinforcement Learning

NeurIPS 2020
0
citations

Kernel Based Progressive Distillation for Adder Neural Networks

NeurIPS 2020
0
citations

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

NeurIPS 2020
0
citations

Finite-Time Analysis for Double Q-learning

NeurIPS 2020
0
citations

Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets

NeurIPS 2020
0
citations

Post-Training Quantization for Vision Transformer

NeurIPS 2021
0
citations

Scalable Rule-Based Representation Learning for Interpretable Classification

NeurIPS 2021
0
citations

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

NeurIPS 2022
0
citations

Robustness to Unbounded Smoothness of Generalized SignSGD

NeurIPS 2022
0
citations

Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions

NeurIPS 2022
0
citations

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

NeurIPS 2022
0
citations

Reading Relevant Feature from Global Representation Memory for Visual Object Tracking

NeurIPS 2023
0
citations

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

NeurIPS 2023
0
citations

Asynchronous Decentralized Parallel Stochastic Gradient Descent

ICML 2018
0
citations