Rui Zhao

71 Papers

227 Total Citations

Papers (71)

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

Sparse Global Matching for Video Frame Interpolation with Large Motion

Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Re-Aligning Language to Visual Objects with an Agentic Workflow

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach

Gradient-based Visual Explanation for Transformer-based CLIP

Saliency Detection by Multi-Context Deep Learning

Facial Expression Intensity Estimation Using Ordinal Information

A Hierarchical Generative Model for Eye Image Synthesis and Eye Gaze Estimation

Attention-Aware Compositional Network for Person Re-Identification

Bilateral Ordinal Relevance Multi-Instance Regression for Facial Action Unit Intensity Estimation

Bayesian Hierarchical Dynamic Model for Human Action Recognition

P2SGrad: Refined Gradients for Optimizing Deep Face Models

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations

Generalizing Eye Tracking With Bayesian Adversarial Learning

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification

Density-Aware Feature Embedding for Face Clustering

Bayesian Adversarial Human Motion Synthesis

Learning to Cluster Faces via Confidence and Connectivity Estimation

Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation

Optical Flow Estimation for Spiking Camera

Feature Erasing and Diffusion Network for Occluded Person Re-Identification

Align Representations With Base: A New Approach to Self-Supervised Learning

Revisiting the Transferability of Supervised Pretraining: An MLP Perspective

UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching

Balancing Logit Variation for Long-Tailed Semantic Segmentation

Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation

HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining

Memory-Based Neighbourhood Embedding for Visual Recognition

Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition

Progressive Correspondence Pruning by Consensus Learning

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

Advancing Referring Expression Segmentation Beyond Single Image

SparseMAE: Sparse Training Meets Masked Autoencoders

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax

Scale-Aware Spatio-Temporal Relation Learning for Video Anomaly Detection

Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective

Relative Contrastive Loss for Unsupervised Representation Learning

Domain Invariant Masked Autoencoders for Self-Supervised Learning from Multi-Domains

UniHCP: A Unified Model for Human-Centric Perceptions

ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning

SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition

CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model

RemDet: Rethinking Efficient Model Design for UAV Object Detection

TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment

Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Self-Supervised Representation Learning from Arbitrary Scenarios

Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID

MST: Masked Self-Supervised Transformer for Visual Representation

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

Learning Optical Flow from Continuous Spike Streams

Unsupervised Object Detection Pretraining with Joint Object Priors Generation and Detector Learning

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Unsupervised Optical Flow Estimation with Dynamic Timing Representation for Spike Camera

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy

Described Object Detection: Liberating Object Detection with Flexible Expressions

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning