Rui Zhao

21
Papers
227
Total Citations

Papers (21)

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

CVPR 2024
63
citations

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

NeurIPS 2025
60
citations

Sparse Global Matching for Video Frame Interpolation with Large Motion

CVPR 2024
27
citations

Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations

CVPR 2024
26
citations

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

ICCV 2025
17
citations

Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

CVPR 2024
14
citations

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

AAAI 2025
13
citations

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

CVPR 2025
4
citations

Re-Aligning Language to Visual Objects with an Agentic Workflow

ICLR 2025
3
citations

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

CVPR 2024
0
citations

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

CVPR 2024
0
citations

Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach

ICML 2024
0
citations

Gradient-based Visual Explanation for Transformer-based CLIP

ICML 2024
0
citations

ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning

ICCV 2025
0
citations

SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition

ICCV 2025
0
citations

CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model

NeurIPS 2025
0
citations

RemDet: Rethinking Efficient Model Design for UAV Object Detection

AAAI 2025
0
citations

TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment

AAAI 2025
0
citations

Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment

AAAI 2024
0
citations

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

CVPR 2024
0
citations

Self-Supervised Representation Learning from Arbitrary Scenarios

CVPR 2024
0
citations