Lei Zhang

Papers: 233

Total Citations: 1,499

Papers (233)

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

Osprey: Pixel Understanding with Visual Instruction Tuning

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Visual In-Context Prompting

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Open-World Human-Object Interaction Detection via Multi-modal Prompts

ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Self-Supervised Video Desmoking for Laparoscopic Surgery

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Referring to Any Person

Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Neural Super-Resolution for Real-time Rendering with Radiance Demodulation

Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution

Symbol as Points: Panoptic Symbol Spotting via Point-based Representation

Generalizable Sensor-Based Activity Recognition via Categorical Concept Invariant Learning

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

HandOS: 3D Hand Reconstruction in One Stage

Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving

HumanMM: Global Human Motion Recovery from Multi-shot Videos

SyncNoise: Geometrically Consistent Noise Prediction for Instruction-based 3D Editing

Reverse Convolution and Its Applications to Image Restoration

PASS: Path-selective State Space Model for Event-based Recognition

The Underappreciated Power of Vision Models for Graph Structural Understanding

Multi-Edge Reinforced Collaborative Data Acquisition for Continuous Video Analytics by Prioritizing Quality over Quantity

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

Efficient Scene Recovery Using Luminous Flux Prior

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

State-Constrained Zero-Sum Differential Games with One-Sided Information

DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation

HumanTOMATO: Text-aligned Whole-body Motion Generation

Reweighted Laplace Prior Based Hyperspectral Compressive Sensing for Unknown Sparsity

Discriminative Learning of Iteration-Wise Priors for Blind Deconvolution

Joint Learning of Single-Image and Cross-Image Representations for Person Re-Identification

Group MAD Competition - A New Methodology to Compare Objective Image Quality Models

Multispectral Images Denoising by Intrinsic Tensor Sparsity Regularization

Dictionary Pair Classifier Driven Convolutional Neural Networks for Object Detection

A Probabilistic Collaborative Representation Based Approach for Pattern Classification

Object Tracking via Dual Linear Structured SVM and Explicit Feature Map

RAID-G: Robust Estimation of Approximate Infinite Dimensional Gaussian With Application to Material Recognition

G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition

Learning Dynamic Guidance for Depth Image Enhancement

Learning Deep CNN Denoiser Prior for Image Restoration

Fine-Tuning Convolutional Neural Networks for Biomedical Image Analysis: Actively and Incrementally

Towards Human-Machine Cooperation: Self-Supervised Sample Mining for Object Detection

Learning a Single Convolutional Super-Resolution Network for Multiple Degradations

A Hybrid l1-l0 Layer Decomposition Model for Tone Mapping

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

CleanNet: Transfer Learning for Scalable Image Classifier Training With Label Noise

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

A PID Controller Approach for Stochastic Optimization of Deep Networks

Deep Plug-And-Play Super-Resolution for Arbitrary Blur Kernels

Toward Convolutional Blind Denoising of Real Photographs

Reliable and Efficient Image Cropping: A Grid Anchor Based Approach

FOCNet: A Fractional Optimal Control Network for Image Denoising

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Variational Bayesian Dropout With a Hierarchical Prior

Second-Order Attention Network for Single Image Super-Resolution

Object-Driven Text-To-Image Synthesis via Adversarial Training

Multi-Domain Learning for Accurate and Few-Shot Color Constancy

Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution

CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

Probability Weighted Compact Feature for Domain Adaptive Retrieval

Structure Aware Single-Stage 3D Object Detection From Point Cloud

VirFace: Enhancing Face Recognition via Unlabeled Shallow Data

Contrastive Learning Based Hybrid Networks for Long-Tailed Image Classification

VinVL: Revisiting Visual Representations in Vision-Language Models

Spatial Feature Calibration and Temporal Fusion for Effective One-Stage Video Instance Segmentation

PPR10K: A Large-Scale Portrait Photo Retouching Dataset With Human-Region Mask and Group-Level Consistency

Unsupervised Part Segmentation Through Disentangling Appearance and Shape

Progressive Semantic-Aware Style Transformation for Blind Face Restoration

Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection

Virtual Fully-Connected Layer: Training a Large-Scale Face Recognition Dataset With Limited Computational Resources

Unsupervised Pre-Training for Person Re-Identification

Learning Parallel Dense Correspondence From Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction

GAN Prior Embedded Network for Blind Face Restoration in the Wild

Dynamic Weighted Learning for Unsupervised Domain Adaptation

Dynamic Head: Unifying Object Detection Heads With Attentions

Learning Tensor Low-Rank Prior for Hyperspectral Image Reconstruction

Deep Convolutional Dictionary Learning for Image Denoising

TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Lite-HRNet: A Lightweight High-Resolution Network

DAP: Detection-Aware Pre-Training With Weak Supervision

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds

Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization

Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Dense Learning Based Semi-Supervised Object Detection

Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging

Grounded Language-Image Pre-Training

Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel

Large-Scale Pre-Training for Person Re-Identification With Noisy Labels

Towards Efficient Data Free Black-Box Adversarial Attack

A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift

Neural Architecture Search With Representation Mutual Information

A Dual Weighting Label Assignment Scheme for Object Detection

A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution

Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation

DynaMask: Dynamic Mask Selection for Instance Segmentation

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

A General Regret Bound of Preconditioned Gradient Method for DNN Training

OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering

Glocal Energy-Based Learning for Few-Shot Open-Set Recognition

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

SIM: Semantic-Aware Instance Mask Generation for Box-Supervised Instance Segmentation

Accelerating Dataset Distillation via Model Augmentation

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

MSF: Motion-Guided Sequential Fusion for Efficient 3D Object Detection From Point Cloud Sequences

MDQE: Mining Discriminative Query Embeddings To Segment Occluded Instances on Challenging Videos

Sharpness-Aware Gradient Matching for Domain Generalization

One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transformer

Human Guided Ground-Truth Generation for Realistic Image Super-Resolution

Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation

Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis

Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset

MP-Former: Mask-Piloted Transformer for Image Segmentation

One-to-Few Label Assignment for End-to-End Dense Detection

Multi-View Adversarial Discriminator: Mine the Non-Causal Factors for Object Detection in Unseen Domains

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

Patch Group Based Nonlocal Self-Similarity Prior Learning for Image Denoising

External Patch Prior Guided Internal Clustering for Image Denoising

Convolutional Sparse Coding for Image Super-Resolution

Hyperspectral Compressive Sensing Using Manifold-Structured Sparsity Prior

Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization

When Unsupervised Domain Adaptation Meets Tensor Representations

Multi-Channel Weighted Nuclear Norm Minimization for Real Color Image Denoising

Joint Convolutional Analysis and Synthesis Sparse Representation for Single Image Layer Separation

3D Surface Detail Enhancement From a Single Normal Map

Toward Real-World Single Image Super-Resolution: A New Benchmark and a New Model

Dynamic Anchor Feature Selection for Single-Shot Object Detection

Multi-Adversarial Faster-RCNN for Unrestricted Object Detection

WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting

SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Dynamic DETR: End-to-End Object Detection With Dynamic Attention

CvT: Introducing Convolutions to Vision Transformers

Real-World Video Super-Resolution: A Benchmark Dataset and a Decomposition Based Learning Scheme

Reconcile Prediction Consistency for Balanced Object Detection

HDR Video Reconstruction: A Coarse-To-Fine Network and a Real-World Benchmark Dataset

MicroNet: Improving Image Recognition With Extremely Low FLOPs

Improve Unsupervised Pretraining for Few-Label Transfer

A Benchmark for Chinese-English Scene Text Image Super-Resolution

CORE: Cooperative Reconstruction for Multi-Agent Perception

Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport

Towards Fairness-aware Adversarial Network Pruning

A Simple Framework for Open-Vocabulary Segmentation and Detection

FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning

Generative Action Description Prompts for Skeleton-based Action Recognition

Detection Transformer with Stable Matching

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Neural Interactive Keypoint Detection

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

Gradient Centralization: A New Optimization Technique for Deep Neural Networks

Suppress and Balance: A Simple Gated Network for Salient Object Detection

Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation

Blind Face Restoration via Deep Multi-scale Component Dictionaries

LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform

Momentum Batch Normalization for Deep Learning with Small Batch Size

A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection

Domain Adaptive Object Detection via Asymmetric Tri-way Faster-RCNN

A Decoupled Learning Scheme for Real-world Burst Denoising from Raw Images

Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

Spatiotemporal Self-Attention Modeling with Temporal Patch Shift for Action Recognition

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

Efficient Long-Range Attention Network for Image Super-Resolution

From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution

Unfolded Deep Kernel Estimation for Blind Image Super-Resolution

Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution

An Embedded Feature Whitening Approach to Deep Neural Network Optimization

Box-Supervised Instance Segmentation with Level Set Evolution

Attention Diversification for Domain Generalization

View Confusion Feature Learning for Person Re-Identification

Low-Biased General Annotated Dataset Generation

RORem: Training a Robust Object Remover with Human-in-the-Loop

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

MaSS13K: A Matting-level Semantic Segmentation Benchmark

Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians

OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction

FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation

Prior-aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation

FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models

Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection

UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection

Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training

Towards Effective Foundation Model Adaptation for Extreme Cross-Domain Few-Shot Learning

Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval

Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation

InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models

Polyline Path Masked Attention for Vision Transformer

SLRL: Semi-Supervised Local Community Detection Based on Reinforcement Learning

CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

GapMatch: Bridging Instance and Model Perturbations for Enhanced Semi-Supervised Medical Image Segmentation

Adversarial Contrastive Graph Augmentation with Counterfactual Regularization

Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection

Fine-Tuning Language Models with Collaborative and Semantic Experts

Dynamic Weighted Combiner for Mixed-Modal Image Retrieval

Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Turbo Learning for CaptionBot and DrawingBot

Variational Denoising Network: Toward Blind Noise Modeling and Removal

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset

Semi-Supervised Domain Generalization with Known and Unknown Classes

Label-efficient Segmentation via Affinity Propagation

A Comprehensive Benchmark for Neural Human Radiance Fields

MomentDiff: Generative Video Moment Retrieval from Random to Real