Lei Zhang

Papers: 73

Total Citations: 1,256

Papers (73)

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

Osprey: Pixel Understanding with Visual Instruction Tuning

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Visual In-Context Prompting

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Open-World Human-Object Interaction Detection via Multi-modal Prompts

ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

Self-Supervised Video Desmoking for Laparoscopic Surgery

Referring to Any Person

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Neural Super-Resolution for Real-time Rendering with Radiance Demodulation

Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution

Symbol as Points: Panoptic Symbol Spotting via Point-based Representation

Generalizable Sensor-Based Activity Recognition via Categorical Concept Invariant Learning

D²iT: Dynamic Diffusion Transformer for Accurate Image Generation

HandOS: 3D Hand Reconstruction in One Stage

HumanMM: Global Human Motion Recovery from Multi-shot Videos

Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving

SyncNoise: Geometrically Consistent Noise Prediction for Instruction-based 3D Editing

PASS: Path-selective State Space Model for Event-based Recognition

Reverse Convolution and Its Applications to Image Restoration

Multi-Edge Reinforced Collaborative Data Acquisition for Continuous Video Analytics by Prioritizing Quality over Quantity

The Underappreciated Power of Vision Models for Graph Structural Understanding

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

Efficient Scene Recovery Using Luminous Flux Prior

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

State-Constrained Zero-Sum Differential Games with One-Sided Information

DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation

HumanTOMATO: Text-aligned Whole-body Motion Generation

Low-Biased General Annotated Dataset Generation

RORem: Training a Robust Object Remover with Human-in-the-Loop

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

MaSS13K: A Matting-level Semantic Segmentation Benchmark

Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians

OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction

FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation

Prior-aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation

FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models

Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection

UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection

Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training

Towards Effective Foundation Model Adaptation for Extreme Cross-Domain Few-Shot Learning

Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval

Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation

InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models

Polyline Path Masked Attention for Vision Transformer

SLRL: Semi-Supervised Local Community Detection Based on Reinforcement Learning

CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

GapMatch: Bridging Instance and Model Perturbations for Enhanced Semi-Supervised Medical Image Segmentation

Adversarial Contrastive Graph Augmentation with Counterfactual Regularization

Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection

Fine-Tuning Language Models with Collaborative and Semantic Experts

Dynamic Weighted Combiner for Mixed-Modal Image Retrieval

Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing