Shiwei Zhang
32
Papers
247
Total Citations
Papers (32)
InstructVideo: Instructing Video Diffusion Models with Human Feedback
CVPR 2024
80
citations
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
ICLR 2025
59
citations
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
CVPR 2024
55
citations
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
CVPR 2024
53
citations
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
ICCV 2025
0
citations
Enhancing Zero-shot Object Counting via Text-guided Local Ranking and Number-evoked Global Attention
ICCV 2025
0
citations
CountSE: Soft Exemplar Open-set Object Counting
ICCV 2025
0
citations
FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
AAAI 2025
0
citations
Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
CVPR 2024
0
citations
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
CVPR 2024
0
citations
TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection
CVPR 2019
0
citations
Self-Supervised Learning for Semi-Supervised Temporal Action Proposal
CVPR 2021 (arXiv)
0
citations
Self-Supervised Motion Learning From Static Images
CVPR 2021 (arXiv)
0
citations
TCTrack: Temporal Contexts for Aerial Tracking
CVPR 2022 (arXiv)
0
citations
Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency
CVPR 2022 (arXiv)
0
citations
Hybrid Relation Guided Set Matching for Few-Shot Action Recognition
CVPR 2022 (arXiv)
0
citations
MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition
CVPR 2023 (arXiv)
0
citations
Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition
CVPR 2023 (arXiv)
0
citations
LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook
CVPR 2023
0
citations
Support-Set Based Cross-Supervision for Video Grounding
ICCV 2021 (arXiv)
0
citations
OadTR: Online Action Detection With Transformers
ICCV 2021 (arXiv)
0
citations
Space-time Prompting for Video Class-incremental Learning
ICCV 2023
0
citations
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
ICCV 2023 (arXiv)
0
citations
RLIPv2: Fast Scaling of Relational Language-Image Pre-Training
ICCV 2023 (arXiv)
0
citations
Open-World Semantic Segmentation for LIDAR Point Clouds
ECCV 2022
0
citations
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
CVPR 2025
0
citations
PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation
ICCV 2025
0
citations
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
ICCV 2025
0
citations
DreamRelation: Relation-Centric Video Customization
ICCV 2025
0
citations
Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning
NeurIPS 2022
0
citations
VideoComposer: Compositional Video Synthesis with Motion Controllability
NeurIPS 2023
0
citations
FaceComposer: A Unified Model for Versatile Facial Content Creation
NeurIPS 2023
0
citations