Shiwei Zhang

32
Papers
247
Total Citations

Papers (32)

InstructVideo: Instructing Video Diffusion Models with Human Feedback

CVPR 2024
80
citations

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

ICLR 2025
59
citations

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

CVPR 2024
55
citations

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

CVPR 2024
53
citations

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

ICCV 2025
0
citations

Enhancing Zero-shot Object Counting via Text-guided Local Ranking and Number-evoked Global Attention

ICCV 2025
0
citations

CountSE: Soft Exemplar Open-set Object Counting

ICCV 2025
0
citations

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing

AAAI 2025
0
citations

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

CVPR 2024
0
citations

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

CVPR 2024
0
citations

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

CVPR 2019
0
citations

Self-Supervised Learning for Semi-Supervised Temporal Action Proposal

CVPR 2021 (arXiv)
0
citations

Self-Supervised Motion Learning From Static Images

CVPR 2021 (arXiv)
0
citations

TCTrack: Temporal Contexts for Aerial Tracking

CVPR 2022 (arXiv)
0
citations

Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency

CVPR 2022 (arXiv)
0
citations

Hybrid Relation Guided Set Matching for Few-Shot Action Recognition

CVPR 2022 (arXiv)
0
citations

MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition

CVPR 2023 (arXiv)
0
citations

Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition

CVPR 2023 (arXiv)
0
citations

LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook

CVPR 2023
0
citations

Support-Set Based Cross-Supervision for Video Grounding

ICCV 2021 (arXiv)
0
citations

OadTR: Online Action Detection With Transformers

ICCV 2021 (arXiv)
0
citations

Space-time Prompting for Video Class-incremental Learning

ICCV 2023
0
citations

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

ICCV 2023 (arXiv)
0
citations

RLIPv2: Fast Scaling of Relational Language-Image Pre-Training

ICCV 2023 (arXiv)
0
citations

Open-World Semantic Segmentation for LIDAR Point Clouds

ECCV 2022
0
citations

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

CVPR 2025
0
citations

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation

ICCV 2025
0
citations

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

ICCV 2025
0
citations

DreamRelation: Relation-Centric Video Customization

ICCV 2025
0
citations

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

NeurIPS 2022
0
citations

VideoComposer: Compositional Video Synthesis with Motion Controllability

NeurIPS 2023
0
citations

FaceComposer: A Unified Model for Versatile Facial Content Creation

NeurIPS 2023
0
citations