Rui Zhao
21
Papers
227
Total Citations
Papers (21)
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
CVPR 2024
63
citations
GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing
NeurIPS 2025
60
citations
Sparse Global Matching for Video Frame Interpolation with Large Motion
CVPR 2024
27
citations
Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
CVPR 2024
26
citations
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
17
citations
Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
CVPR 2024
14
citations
KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy
AAAI 2025
13
citations
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
CVPR 2025
4
citations
Re-Aligning Language to Visual Objects with an Agentic Workflow
ICLR 2025
3
citations
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
CVPR 2024
0
citations
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
CVPR 2024
0
citations
Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach
ICML 2024
0
citations
Gradient-based Visual Explanation for Transformer-based CLIP
ICML 2024
0
citations
ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning
ICCV 2025
0
citations
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
ICCV 2025
0
citations
CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
NeurIPS 2025
0
citations
RemDet: Rethinking Efficient Model Design for UAV Object Detection
AAAI 2025
0
citations
TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment
AAAI 2025
0
citations
Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment
AAAI 2024
0
citations
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
CVPR 2024
0
citations
Self-Supervised Representation Learning from Arbitrary Scenarios
CVPR 2024
0
citations