Ran He

Papers: 56

Total Citations: 1,331

Papers (56)

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Deep Supervised Discrete Hashing

NeurIPS 2017

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

Breaking the Low-Rank Dilemma of Linear Attention

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

RMT: Retentive Networks Meet Vision Transformers

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Backdoor Defense via Test-Time Detecting and Repairing

Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

Pose-Guided Photorealistic Face Rotation

Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation

Cross-Spectral Face Hallucination via Disentangling Independent Factors

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

GP-NAS: Gaussian Process Based Neural Architecture Search

Information Bottleneck Disentanglement for Identity Swapping

ReMix: Towards Image-to-Image Translation With Limited Data

Pareidolia Face Reenactment

Memory Oriented Transfer Learning for Semi-Supervised Image Deraining

FaceInpainter: High Fidelity Face Adaptation to Heterogeneous Domains

DINE: Domain Adaptation From Single and Multiple Black-Box Predictors

Improving Subgraph Recognition With Variational Graph Information Bottleneck

Rethinking Image Cropping: Exploring Diverse Compositions From Global Views

Few-Shot Backdoor Defense Using Shapley Estimation

Mind the Label Shift of Augmentation-Based Graph OOD Generalization

Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis

Make a Face: Towards Arbitrary High Fidelity Face Manipulation

M2FPA: A Multi-Yaw Multi-Pitch High-Quality Dataset and Benchmark for Facial Pose Analysis

Invisible Backdoor Attack With Sample-Specific Triggers

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification

TALL: Thumbnail Layout for Deepfake Video Detection

Pluralistic Aging Diffusion Autoencoder

Hierarchical Face Aging through Disentangled Latent Characteristics

A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation

TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search

Informative Sample Mining Network for Multi-Domain Image-to-Image Translation

Wavelet-SRNet: A Wavelet-Based CNN for Multi-Scale Face Super Resolution

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?

Cooperative Pseudo Labeling for Unsupervised Federated Classification

Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification

Rectifying Magnitude Neglect in Linear Attention

Exploring Vacant Classes in Label-Skewed Federated Learning

Protecting Model Adaptation from Trojans in the Unlabeled Data

Heterogeneous Test-Time Training for Multi-Modal Person Re-identification

IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

Learning a High Fidelity Pose Invariant Model for High-resolution Face Frontalization

Dual Variational Generation for Low Shot Heterogeneous Face Recognition

AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection

Orthogonal Transformer: An Efficient Vision Transformer Backbone with Token Orthogonalization

Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks

Lightweight Vision Transformer with Bidirectional Interaction

Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification