Wei Zhang

130 Papers

1,750 Total Citations

Papers (130)

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

NeurIPS 2017

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Latent Space Editing in Transformer-Based Flow Matching

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Language-Driven Anchors for Zero-Shot Adversarial Robustness

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Gaussian Process Neural Additive Models

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

Less Attention is More: Prompt Transformer for Generalized Category Discovery

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting

Context Guided Transformer Entropy Modeling for Video Compression

Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On

SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Binarized Mode Seeking for Scalable Visual Pattern Discovery

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

Reconstruction Network for Video Captioning

Unsupervised Person Image Generation With Semantic Parsing Transformation

Destruction and Construction Learning for Fine-Grained Image Recognition

Embedding Complementary Deep Networks for Image Classification

Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition

SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment

Points As Queries: Weakly Semi-Supervised Object Detection by Points

Source-Free Domain Adaptation for Semantic Segmentation

Mesh Saliency: An Independent Perceptual Measure or a Derivative of Image Saliency?

Focus on Local: Detecting Lane Marker From Bottom Up via Key Point

Zero-Shot Adversarial Quantization

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

UAV-Human: A Large Benchmark for Human Behavior Understanding With Unmanned Aerial Vehicles

Discrimination-Aware Mechanism for Fine-Grained Representation Learning

LPSNet: A Lightweight Solution for Fast Panoptic Segmentation

Learning a Facial Expression Embedding Disentangled From Identity

Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation

Point2Seq: Detecting 3D Objects As Sequences

ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation

Directional Self-Supervised Learning for Heavy Image Augmentations

A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Class-Aware Contrastive Semi-Supervised Learning

PointCLIP: Point Cloud Understanding by CLIP

Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment

Semi-DETR: Semi-Supervised Object Detection With Detection Transformers

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

HS-Pose: Hybrid Scope Feature Extraction for Category-Level Object Pose Estimation

Multiple Granularity Descriptors for Fine-Grained Categorization

A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification

Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification

Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization

VrR-VG: Refocusing Visually-Relevant Relationships

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation

Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing

E2E-LOAD: End-to-End Long-form Online Action Detection

WaterMask: Instance Segmentation for Underwater Imagery

Data-free Knowledge Distillation for Fine-grained Visual Categorization

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

LVOS: A Benchmark for Long-term Video Object Segmentation

CFCG: Semi-Supervised Semantic Segmentation via Cross-Fusion and Contour Guidance Supervision

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

Segment as Points for Efficient Online Multi-Object Tracking and Segmentation

HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction

Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes

Unsupervised Multi-View CNN for Salient View Selection of 3D Objects and Scenes

Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning

Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

SFOD: Spiking Fusion Object Detector

Decoupled Motion Expression Video Segmentation

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation

General Compression Framework for Efficient Transformer Object Tracking

Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion

PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation

In2NeCT: Inter-class and Intra-class Neural Collapse Tuning for Semantic Segmentation of Imbalanced Remote Sensing Images

Coherency Improved Explainable Recommendation via Large Language Model

STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation

Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems

CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

Event-based Visible and Infrared Fusion via Multi-task Collaboration

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion

ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection

Interpreting and Improving Large Language Models in Arithmetic Calculation

Weakly Supervised Semantic Segmentation for Social Images

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks

Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts

A Decentralized Parallel Algorithm for Training Generative Adversarial Nets

Online Decision Based Visual Tracking via Reinforcement Learning

Kernel Based Progressive Distillation for Adder Neural Networks

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Finite-Time Analysis for Double Q-learning

Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets

Post-Training Quantization for Vision Transformer

Scalable Rule-Based Representation Learning for Interpretable Classification

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Robustness to Unbounded Smoothness of Generalized SignSGD

Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

Reading Relevant Feature from Global Representation Memory for Visual Object Tracking

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

Asynchronous Decentralized Parallel Stochastic Gradient Descent