🧬 Generative Models

Video Generation

Generating video content from various inputs

100 papers · 7,841 total citations
[Publication trend: 590 papers, Mar '24 – Feb '26]
Also includes: video generation, video synthesis, text-to-video, video diffusion

Top Papers

#1

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025 · arXiv:2408.06072
text-to-video generation, diffusion transformer, 3d variational autoencoder, expert transformer (+4 more)
1,318
citations
#2

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024
996
citations
#3

ControlVideo: Training-free Controllable Text-to-Video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng Jiang et al.

ICLR 2024
331
citations
#4

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024 · arXiv:2403.12008
315
citations
#5

Video-P2P: Video Editing with Cross-attention Control

Shaoteng Liu, Yuechen Zhang, Wenbo Li et al.

CVPR 2024
309
citations
#6

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024 · arXiv:2312.06662
diffusion models, video generation, transformer architecture, latent space compression (+3 more)
264
citations
#7

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Yaofang Liu, Xiaodong Cun, Xuebo Liu et al.

CVPR 2024
237
citations
#8

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang et al.

ECCV 2024
218
citations
#9

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Xinyuan Chen, Yaohui Wang, Lingjun Zhang et al.

ICLR 2024
209
citations
#10

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.

CVPR 2025 · arXiv:2403.14773
154
citations
#11

VideoBooth: Diffusion-based Video Generation with Image Prompts

Yuming Jiang, Tianxing Wu, Shuai Yang et al.

CVPR 2024
118
citations
#12

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NeurIPS 2025 · arXiv:2501.13918
106
citations
#13

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

Wenqiang Sun, Shuo Chen, Fangfu Liu et al.

ICCV 2025
103
citations
#14

VideoPhy: Evaluating Physical Commonsense for Video Generation

Hritik Bansal, Zongyu Lin, Tianyi Xie et al.

ICLR 2025 · arXiv:2406.03520
99
citations
#15

VidToMe: Video Token Merging for Zero-Shot Video Editing

Xirui Li, Chao Ma, Xiaokang Yang et al.

CVPR 2024
89
citations
#16

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Jianwen Jiang, Chao Liang, Jiaqi Yang et al.

ICLR 2025
89
citations
#17

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024
82
citations
#18

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu, Yi Jiang, Qihao Liu et al.

CVPR 2024
79
citations
#19

Real-Time Video Generation with Pyramid Attention Broadcast

Xuanlei Zhao, Xiaolong Jin, Kai Wang et al.

ICLR 2025
79
citations
#20

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024
78
citations
#21

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024 · arXiv:2309.16496
77
citations
#22

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Fanqing Meng, Jiaqi Liao, Xinyu Tan et al.

ICML 2025
72
citations
#23

MV-Adapter: Multi-View Consistent Image Generation Made Easy

Zehuan Huang, Yuan-Chen Guo, Haoran Wang et al.

ICCV 2025 · arXiv:2412.03632
multi-view image generation, adapter-based methods, text-to-image models, 3d geometric knowledge (+3 more)
69
citations
#24

History-Guided Video Diffusion

Kiwhan Song, Boyuan Chen, Max Simchowitz et al.

ICML 2025
66
citations
#25

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025 · arXiv:2504.05298
video generation, test-time training, long-context modeling, transformer architecture (+3 more)
65
citations
#26

GameFactory: Creating New Games with Generative Interactive Videos

Jiwen Yu, Yiran Qin, Xintao Wang et al.

ICCV 2025 · arXiv:2501.08325
63
citations
#27

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024 · arXiv:2403.17920
text-to-4d generation, trajectory-conditioned generation, dynamic 3d scenes, neural representations (+4 more)
63
citations
#28

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024
62
citations
#29

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Yash Jain, Anshul Nasery, Vibhav Vineet et al.

CVPR 2024
61
citations
#30

Long Context Tuning for Video Generation

Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.

ICCV 2025 · arXiv:2503.10589
56
citations
#31

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.

CVPR 2024
55
citations
#32

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Jianhong Bai, Menghan Xia, Xintao Wang et al.

ICLR 2025
55
citations
#33

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

Lvmin Zhang, Shengqu Cai, Muyang Li et al.

NeurIPS 2025 · arXiv:2504.12626
video diffusion models, next-frame prediction, frame context packing, drift prevention (+4 more)
55
citations
#34

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024
55
citations
#35

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024
53
citations
#36

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024 · arXiv:2312.08985
47
citations
#37

Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.

AAAI 2025 · arXiv:2406.15339
46
citations
#38

GAIA: Zero-shot Talking Avatar Generation

Tianyu He, Junliang Guo, Runyi Yu et al.

ICLR 2024
46
citations
#39

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024 · arXiv:2403.12409
3d asset generation, single-image 3d generation, spatially-aware diffusion guidance, score distillation sampling (+4 more)
44
citations
#40

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Shuai Tan, Bin Ji, Ye Pan

AAAI 2024 · arXiv:2403.06365
talking head generation, emotion style transfer, art style transfer, audio-driven animation (+4 more)
43
citations
#41

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025
40
citations
#42

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025 · arXiv:2408.17065
39
citations
#43

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024
39
citations
#44

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.

ICLR 2025 · arXiv:2411.04989
39
citations
#45

Trajectory Attention for Fine-grained Video Motion Control

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.

ICLR 2025 · arXiv:2411.19324
video diffusion models, camera motion control, trajectory attention, temporal attention (+3 more)
38
citations
#46

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025 · arXiv:2403.12035
38
citations
#47

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NeurIPS 2025
37
citations
#48

DragVideo: Interactive Drag-style Video Editing

Yufan Deng, Ruida Wang, Yuhao Zhang et al.

ECCV 2024 · arXiv:2312.02216
36
citations
#49

FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis

Ke Fan, Junshu Tang, Weijian Cao et al.

ECCV 2024 · arXiv:2405.15763
text-to-motion synthesis, multi-person motion generation, conditional motion distribution, motion spatial control (+1 more)
34
citations
#50

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NeurIPS 2025 · arXiv:2501.01895
embodied space generation, video diffusion framework, multi-view video representation, 4d gaussian splatting (+4 more)
34
citations
#51

FreeVS: Generative View Synthesis on Free Driving Trajectory

Qitai Wang, Lue Fan, Yuqi Wang et al.

ICLR 2025 · arXiv:2410.18079
34
citations
#52

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025 · arXiv:2406.04321
31
citations
#53

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

Yifei Xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025 · arXiv:2502.21079
30
citations
#54

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas et al.

ECCV 2024 · arXiv:2407.10387
video-to-audio generation, audio-visual synchronization, generative audio codec, masked generative model (+2 more)
30
citations
#55

PREGO: Online Mistake Detection in PRocedural EGOcentric Videos

Alessandro Flaborea, Guido Maria D'Amely di Melendugno et al.

CVPR 2024 · arXiv:2404.01933
30
citations
#56

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Xiaojuan Wang, Boyang Zhou, Brian Curless et al.

ICLR 2025 · arXiv:2408.15239
29
citations
#57

OmniViD: A Generative Framework for Universal Video Understanding

Junke Wang, Dongdong Chen, Chong Luo et al.

CVPR 2024
29
citations
#58

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Zongyi Li, Shujie Hu, Shujie Liu et al.

ICLR 2025 · arXiv:2410.20502
diffusion transformers, long video generation, autoregressive models, latent vq-vae (+4 more)
27
citations
#59

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming Liu, Yuan Liu, Jiepeng Wang et al.

ICLR 2025 · arXiv:2406.00434
26
citations
#60

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.

CVPR 2024
26
citations
#61

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng et al.

NeurIPS 2025 · arXiv:2505.20292
25
citations
#62

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Jiarui Wang et al.

CVPR 2025 · arXiv:2412.19238
25
citations
#63

AnimateAnything: Consistent and Controllable Animation for Video Generation

Guojun Lei, Chi Wang, Rong Zhang et al.

CVPR 2025 · arXiv:2411.10836
controllable video generation, optical flow guidance, motion representation, temporal coherence (+2 more)
24
citations
#64

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Jiarui Wang, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025
24
citations
#65

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Xiang Fan, Anand Bhattad, Ranjay Krishna

ECCV 2024 · arXiv:2403.14617
23
citations
#66

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

Kun Su, Judith Li, Qingqing Huang et al.

AAAI 2024 · arXiv:2305.06594
video-to-music generation, autoregressive model, visual-audio correspondence, audio codecs (+4 more)
23
citations
#67

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025 · arXiv:2409.11367
22
citations
#68

ElasticTok: Adaptive Tokenization for Image and Video

Wilson Yan, Volodymyr Mnih, Aleksandra Faust et al.

ICLR 2025 · arXiv:2410.08368
21
citations
#69

MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers

Yuechen Zhang, YaoYang Liu, Bin Xia et al.

ICCV 2025 · arXiv:2501.03931
21
citations
#70

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024
20
citations
#71

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025
20
citations
#72

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions

Mengyi Shan, Lu Dong, Yutao Han et al.

ECCV 2024
20
citations
#73

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025 · arXiv:2506.18903
20
citations
#74

Grid Diffusion Models for Text-to-Video Generation

Taegyeong Lee, Soyeong Kwon, Taehwan Kim

CVPR 2024
20
citations
#75

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025
20
citations
#76

STIV: Scalable Text and Image Conditioned Video Generation

Zongyu Lin, Wei Liu, Chen Chen et al.

ICCV 2025
20
citations
#77

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.

CVPR 2024
19
citations
#78

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025 · arXiv:2501.12389
19
citations
#79

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Yuchi Wang, Junliang Guo, Jianhong Bai et al.

AAAI 2025
19
citations
#80

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 · arXiv:2503.11423
19
citations
#81

AMEGO: Active Memory from long EGOcentric videos

Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta et al.

ECCV 2024 · arXiv:2409.10917
19
citations
#82

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025
18
citations
#83

MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing

Haoyu Zhao, Tianyi Lu, Jiaxi Gu et al.

ECCV 2024
18
citations
#84

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025
18
citations
#85

MoST: Motion Style Transformer Between Diverse Action Contents

Boeun Kim, Jungho Kim, Hyung Jin Chang et al.

CVPR 2024
18
citations
#86

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Rongyao Fang, Chengqi Duan, Kun Wang et al.

ICCV 2025
17
citations
#87

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

Zhuoman Liu, Weicai Ye, Yan Luximon et al.

CVPR 2025
17
citations
#88

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang et al.

ICCV 2025 · arXiv:2503.16421
17
citations
#89

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Yining Hong, Beide Liu, Maxine Wu et al.

ICLR 2025 · arXiv:2410.23277
video generation, diffusion models, long video generation, action-driven generation (+4 more)
17
citations
#90

SuperGaussian: Repurposing Video Models for 3D Super Resolution

Yuan Shen, Duygu Ceylan, Paul Guerrero et al.

ECCV 2024 · arXiv:2406.00609
16
citations
#91

Programmable Motion Generation for Open-Set Motion Control Tasks

Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.

CVPR 2024 · arXiv:2405.19283
16
citations
#92

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava, Abhinav Shrivastava

CVPR 2024
16
citations
#93

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025 · arXiv:2503.21219
16
citations
#94

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025 · arXiv:2503.18942
15
citations
#95

VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling, Chen Zhu, Meiqi Wu et al.

ICCV 2025
15
citations
#96

Training-Free Efficient Video Generation via Dynamic Token Carving

Yuechen Zhang, Jinbo Xing, Bin Xia et al.

NeurIPS 2025
15
citations
#97

OmniMotionGPT: Animal Motion Generation with Limited Data

Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan et al.

CVPR 2024
15
citations
#98

Video Diffusion Models Are Strong Video Inpainter

Minhyeok Lee, Suhwan Cho, Chajin Shin et al.

AAAI 2025
14
citations
#99

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Han Wang, Yuxiang Nie, Yongjie Ye et al.

ICCV 2025 · arXiv:2412.09530
14
citations
#100

MoVideo: Motion-Aware Video Generation with Diffusion Models

Jingyun Liang, Yuchen Fan, Kai Zhang et al.

ECCV 2024 · arXiv:2311.11325
diffusion models, video generation, optical flow guidance, video depth estimation (+4 more)
14
citations