Mike Zheng Shou

29 papers, 852 total citations

Papers (29)

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

CVPR 2024, 318 citations

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

CVPR 2025, 123 citations

VideoLLM-online: Online Video Large Language Model for Streaming Video

CVPR 2024, 109 citations

Show-o2: Improved Native Unified Multimodal Models

NeurIPS 2025, 90 citations

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

CVPR 2024, 63 citations

DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

CVPR 2025, 59 citations

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

ICCV 2025, 26 citations

AssistGUI: Task-Oriented PC Graphical User Interface Automation

CVPR 2024, 18 citations

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

CVPR 2025, 14 citations

Impossible Videos

ICML 2025, 7 citations

ROICtrl: Boosting Instance Control for Visual Generation

CVPR 2025, 7 citations

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

ICCV 2025, 7 citations

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

CVPR 2025, 4 citations

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

CVPR 2025, 4 citations

SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost

CVPR 2025, 3 citations

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

CVPR 2025, 0 citations

L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream

CVPR 2024, 0 citations

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

CVPR 2024, 0 citations

Tune-An-Ellipse: CLIP Has Potential to Find What You Want

CVPR 2024, 0 citations

Balanced Image Stylization with Style Matching Score

ICCV 2025, 0 citations

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

CVPR 2024, 0 citations

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

CVPR 2024, 0 citations

Bootstrapping SparseFormers from Vision Foundation Models

CVPR 2024, 0 citations

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

CVPR 2025, 0 citations

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

CVPR 2024, 0 citations

ViT-Lens: Towards Omni-modal Representations

CVPR 2024, 0 citations

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

CVPR 2025, 0 citations

Factorized Learning for Temporally Grounded Video-Language Models

ICCV 2025, 0 citations

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting

AAAI 2025, 0 citations