Wei Zhang
130
Papers
1,750
Total Citations
Papers (130)
Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
NeurIPS 2017 (arXiv)
1,364
citations
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024 (arXiv)
58
citations
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
CVPR 2024
45
citations
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
44
citations
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
ICCV 2025
43
citations
Latent Space Editing in Transformer-Based Flow Matching
AAAI 2024 (arXiv)
38
citations
Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
CVPR 2024
32
citations
Language-Driven Anchors for Zero-Shot Adversarial Robustness
CVPR 2024
21
citations
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
CVPR 2024
19
citations
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
CVPR 2025
18
citations
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
CVPR 2025
13
citations
Gaussian Process Neural Additive Models
AAAI 2024 (arXiv)
11
citations
LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement
AAAI 2024 (arXiv)
10
citations
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
CVPR 2024
8
citations
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
ICCV 2025
7
citations
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
NeurIPS 2025
4
citations
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
ICLR 2025
3
citations
Less Attention is More: Prompt Transformer for Generalized Category Discovery
CVPR 2025
3
citations
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
ICCV 2025
3
citations
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
CVPR 2025
2
citations
Context Guided Transformer Entropy Modeling for Video Compression
ICCV 2025
1
citation
Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On
ICCV 2025
1
citation
SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination
ICLR 2025
1
citation
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
NeurIPS 2025
1
citation
Binarized Mode Seeking for Scalable Visual Pattern Discovery
CVPR 2017
0
citations
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
CVPR 2018 (arXiv)
0
citations
Reconstruction Network for Video Captioning
CVPR 2018 (arXiv)
0
citations
Unsupervised Person Image Generation With Semantic Parsing Transformation
CVPR 2019
0
citations
Destruction and Construction Learning for Fine-Grained Image Recognition
CVPR 2019
0
citations
Embedding Complementary Deep Networks for Image Classification
CVPR 2019
0
citations
Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition
CVPR 2020
0
citations
SP-NAS: Serial-to-Parallel Backbone Search for Object Detection
CVPR 2020
0
citations
Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment
CVPR 2020 (arXiv)
0
citations
Points As Queries: Weakly Semi-Supervised Object Detection by Points
CVPR 2021 (arXiv)
0
citations
Source-Free Domain Adaptation for Semantic Segmentation
CVPR 2021 (arXiv)
0
citations
Mesh Saliency: An Independent Perceptual Measure or a Derivative of Image Saliency?
CVPR 2021
0
citations
Focus on Local: Detecting Lane Marker From Bottom Up via Key Point
CVPR 2021 (arXiv)
0
citations
Zero-Shot Adversarial Quantization
CVPR 2021 (arXiv)
0
citations
HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
CVPR 2021 (arXiv)
0
citations
UAV-Human: A Large Benchmark for Human Behavior Understanding With Unmanned Aerial Vehicles
CVPR 2021
0
citations
Discrimination-Aware Mechanism for Fine-Grained Representation Learning
CVPR 2021
0
citations
LPSNet: A Lightweight Solution for Fast Panoptic Segmentation
CVPR 2021
0
citations
Learning a Facial Expression Embedding Disentangled From Identity
CVPR 2021
0
citations
Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation
CVPR 2021
0
citations
Point2Seq: Detecting 3D Objects As Sequences
CVPR 2022 (arXiv)
0
citations
ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation
CVPR 2022
0
citations
Directional Self-Supervised Learning for Heavy Image Augmentations
CVPR 2022 (arXiv)
0
citations
A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection
CVPR 2022 (arXiv)
0
citations
FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
CVPR 2022 (arXiv)
0
citations
Class-Aware Contrastive Semi-Supervised Learning
CVPR 2022 (arXiv)
0
citations
PointCLIP: Point Cloud Understanding by CLIP
CVPR 2022 (arXiv)
0
citations
Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
CVPR 2023 (arXiv)
0
citations
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
CVPR 2023 (arXiv)
0
citations
Semi-DETR: Semi-Supervised Object Detection With Detection Transformers
CVPR 2023
0
citations
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
CVPR 2023 (arXiv)
0
citations
HS-Pose: Hybrid Scope Feature Extraction for Category-Level Object Pose Estimation
CVPR 2023
0
citations
Multiple Granularity Descriptors for Fine-Grained Categorization
ICCV 2015
0
citations
A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification
ICCV 2015
0
citations
Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network
ICCV 2019
0
citations
Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification
ICCV 2019
0
citations
Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization
ICCV 2019
0
citations
VrR-VG: Refocusing Visually-Relevant Relationships
ICCV 2019
0
citations
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation
ICCV 2021
0
citations
Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
ICCV 2021 (arXiv)
0
citations
Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
ICCV 2021
0
citations
C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing
ICCV 2021
0
citations
E2E-LOAD: End-to-End Long-form Online Action Detection
ICCV 2023
0
citations
WaterMask: Instance Segmentation for Underwater Imagery
ICCV 2023
0
citations
Data-free Knowledge Distillation for Fine-grained Visual Categorization
ICCV 2023
0
citations
Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection
ICCV 2023
0
citations
Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach
ICCV 2023
0
citations
LVOS: A Benchmark for Long-term Video Object Segmentation
ICCV 2023 (arXiv)
0
citations
CFCG: Semi-Supervised Semantic Segmentation via Cross-Fusion and Contour Guidance Supervision
ICCV 2023
0
citations
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
ICCV 2023 (arXiv)
0
citations
GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training
ICCV 2023 (arXiv)
0
citations
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
ICCV 2023 (arXiv)
0
citations
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
ICCV 2023 (arXiv)
0
citations
Segment as Points for Efficient Online Multi-Object Tracking and Segmentation
ECCV 2020
0
citations
HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction
ECCV 2020
0
citations
Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation
ECCV 2020
0
citations
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending
ECCV 2020
0
citations
GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes
ECCV 2020
0
citations
Unsupervised Multi-View CNN for Salient View Selection of 3D Objects and Scenes
ECCV 2020
0
citations
Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning
ECCV 2022
0
citations
Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection
ECCV 2022
0
citations
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification
ECCV 2022
0
citations
Responsive Listening Head Generation: A Benchmark Dataset and Baseline
ECCV 2022
0
citations
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
ECCV 2022
0
citations
SFOD: Spiking Fusion Object Detector
CVPR 2024
0
citations
Decoupled Motion Expression Video Segmentation
CVPR 2025
0
citations
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
CVPR 2025
0
citations
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
ICCV 2025
0
citations
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
ICCV 2025
0
citations
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
ICCV 2025
0
citations
LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation
ICCV 2025
0
citations
General Compression Framework for Efficient Transformer Object Tracking
ICCV 2025
0
citations
Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion
ICCV 2025
0
citations
PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation
AAAI 2025
0
citations
In2NeCT: Inter-class and Intra-class Neural Collapse Tuning for Semantic Segmentation of Imbalanced Remote Sensing Images
AAAI 2025
0
citations
Coherency Improved Explainable Recommendation via Large Language Model
AAAI 2025
0
citations
STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation
AAAI 2025
0
citations
Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems
AAAI 2024
0
citations
CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation
AAAI 2024
0
citations
EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
CVPR 2024
0
citations
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
CVPR 2024
0
citations
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
CVPR 2024
0
citations
Event-based Visible and Infrared Fusion via Multi-task Collaboration
CVPR 2024
0
citations
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
CVPR 2024
0
citations
HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion
ICML 2025
0
citations
ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection
ICML 2024
0
citations
Interpreting and Improving Large Language Models in Arithmetic Calculation
ICML 2024
0
citations
Weakly Supervised Semantic Segmentation for Social Images
CVPR 2015
0
citations
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
NeurIPS 2018
0
citations
Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks
NeurIPS 2019
0
citations
Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts
NeurIPS 2020
0
citations
A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
NeurIPS 2020
0
citations
Online Decision Based Visual Tracking via Reinforcement Learning
NeurIPS 2020
0
citations
Kernel Based Progressive Distillation for Adder Neural Networks
NeurIPS 2020
0
citations
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
NeurIPS 2020
0
citations
Finite-Time Analysis for Double Q-learning
NeurIPS 2020
0
citations
Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets
NeurIPS 2020
0
citations
Post-Training Quantization for Vision Transformer
NeurIPS 2021
0
citations
Scalable Rule-Based Representation Learning for Interpretable Classification
NeurIPS 2021
0
citations
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
NeurIPS 2022
0
citations
Robustness to Unbounded Smoothness of Generalized SignSGD
NeurIPS 2022
0
citations
Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions
NeurIPS 2022
0
citations
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
NeurIPS 2022
0
citations
Reading Relevant Feature from Global Representation Memory for Visual Object Tracking
NeurIPS 2023
0
citations
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
NeurIPS 2023
0
citations
Asynchronous Decentralized Parallel Stochastic Gradient Descent
ICML 2018
0
citations