🧬 Generative Models

Video Generation

Generating video content from various inputs

100 papers · 7,841 total citations
[Publication trend: 590 papers, Mar '24 – Feb '26]
Also includes: video generation, video synthesis, text-to-video, video diffusion

Top Papers

#1

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025 · arXiv:2408.06072
text-to-video generation, diffusion transformer, 3d variational autoencoder, expert transformer (+4 more)
1,318
citations
#2

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024
996
citations
#3

ControlVideo: Training-free Controllable Text-to-Video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng Jiang et al.

ICLR 2024
331
citations
#4

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024 · arXiv:2403.12008
315
citations
#5

Video-P2P: Video Editing with Cross-attention Control

Shaoteng Liu, Yuechen Zhang, Wenbo Li et al.

CVPR 2024
309
citations
#6

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024 · arXiv:2312.06662
diffusion models, video generation, transformer architecture, latent space compression (+3 more)
264
citations
#7

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Yaofang Liu, Xiaodong Cun, Xuebo Liu et al.

CVPR 2024
237
citations
#8

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang et al.

ECCV 2024
218
citations
#9

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Xinyuan Chen, Yaohui Wang, Lingjun Zhang et al.

ICLR 2024
209
citations
#10

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.

CVPR 2025 · arXiv:2403.14773
154
citations
#11

VideoBooth: Diffusion-based Video Generation with Image Prompts

Yuming Jiang, Tianxing Wu, Shuai Yang et al.

CVPR 2024
118
citations
#12

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NeurIPS 2025 · arXiv:2501.13918
106
citations
#13

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

Wenqiang Sun, Shuo Chen, Fangfu Liu et al.

ICCV 2025
103
citations
#14

VideoPhy: Evaluating Physical Commonsense for Video Generation

Hritik Bansal, Zongyu Lin, Tianyi Xie et al.

ICLR 2025 · arXiv:2406.03520
99
citations
#15

VidToMe: Video Token Merging for Zero-Shot Video Editing

Xirui Li, Chao Ma, Xiaokang Yang et al.

CVPR 2024
89
citations
#16

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Jianwen Jiang, Chao Liang, Jiaqi Yang et al.

ICLR 2025
89
citations
#17

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024
82
citations
#18

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu, Yi Jiang, Qihao Liu et al.

CVPR 2024
79
citations
#19

Real-Time Video Generation with Pyramid Attention Broadcast

Xuanlei Zhao, Xiaolong Jin, Kai Wang et al.

ICLR 2025
79
citations
#20

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024
78
citations
#21

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024 · arXiv:2309.16496
77
citations
#22

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Fanqing Meng, Jiaqi Liao, Xinyu Tan et al.

ICML 2025
72
citations
#23

MV-Adapter: Multi-View Consistent Image Generation Made Easy

Zehuan Huang, Yuan-Chen Guo, Haoran Wang et al.

ICCV 2025 · arXiv:2412.03632
multi-view image generation, adapter-based methods, text-to-image models, 3d geometric knowledge (+3 more)
69
citations
#24

History-Guided Video Diffusion

Kiwhan Song, Boyuan Chen, Max Simchowitz et al.

ICML 2025
66
citations
#25

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025 · arXiv:2504.05298
video generation, test-time training, long-context modeling, transformer architecture (+3 more)
65
citations
#26

GameFactory: Creating New Games with Generative Interactive Videos

Jiwen Yu, Yiran Qin, Xintao Wang et al.

ICCV 2025 · arXiv:2501.08325
63
citations
#27

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024 · arXiv:2403.17920
text-to-4d generation, trajectory-conditioned generation, dynamic 3d scenes, neural representations (+4 more)
63
citations
#28

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024
62
citations
#29

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Yash Jain, Anshul Nasery, Vibhav Vineet et al.

CVPR 2024
61
citations
#30

Long Context Tuning for Video Generation

Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.

ICCV 2025 · arXiv:2503.10589
56
citations
#31

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.

CVPR 2024
55
citations
#32

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Jianhong Bai, Menghan Xia, Xintao Wang et al.

ICLR 2025
55
citations
#33

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

Lvmin Zhang, Shengqu Cai, Muyang Li et al.

NeurIPS 2025 · arXiv:2504.12626
video diffusion models, next-frame prediction, frame context packing, drift prevention (+4 more)
55
citations
#34

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024
55
citations
#35

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024
53
citations
#36

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024 · arXiv:2312.08985
47
citations
#37

Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.

AAAI 2025 · arXiv:2406.15339
46
citations
#38

GAIA: Zero-shot Talking Avatar Generation

Tianyu He, Junliang Guo, Runyi Yu et al.

ICLR 2024
46
citations
#39

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024 · arXiv:2403.12409
3d asset generation, single-image 3d generation, spatially-aware diffusion guidance, score distillation sampling (+4 more)
44
citations
#40

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Shuai Tan, Bin Ji, Ye Pan

AAAI 2024 · arXiv:2403.06365
talking head generation, emotion style transfer, art style transfer, audio-driven animation (+4 more)
43
citations
#41

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025
40
citations
#42

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025 · arXiv:2408.17065
39
citations
#43

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024
39
citations
#44

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.

ICLR 2025 · arXiv:2411.04989
39
citations
#45

Trajectory Attention for Fine-grained Video Motion Control

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.

ICLR 2025 · arXiv:2411.19324
video diffusion models, camera motion control, trajectory attention, temporal attention (+3 more)
38
citations
#46

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025 · arXiv:2403.12035
38
citations
#47

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NeurIPS 2025
37
citations
#48

DragVideo: Interactive Drag-style Video Editing

Yufan Deng, Ruida Wang, Yuhao Zhang et al.

ECCV 2024 · arXiv:2312.02216
36
citations
#49

FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis

Ke Fan, Junshu Tang, Weijian Cao et al.

ECCV 2024 · arXiv:2405.15763
text-to-motion synthesis, multi-person motion generation, conditional motion distribution, motion spatial control (+1 more)
34
citations
#50

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NeurIPS 2025 · arXiv:2501.01895
embodied space generation, video diffusion framework, multi-view video representation, 4d gaussian splatting (+4 more)
34
citations
#51

FreeVS: Generative View Synthesis on Free Driving Trajectory

Qitai Wang, Lue Fan, Yuqi Wang et al.

ICLR 2025 · arXiv:2410.18079
34
citations
#52

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025 · arXiv:2406.04321
31
citations
#53

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

Yifei Xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025 · arXiv:2502.21079
30
citations
#54

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas et al.

ECCV 2024 · arXiv:2407.10387
video-to-audio generation, audio-visual synchronization, generative audio codec, masked generative model (+2 more)
30
citations
#55

PREGO: Online Mistake Detection in PRocedural EGOcentric Videos

Alessandro Flaborea, Guido Maria D'Amely di Melendugno et al.

CVPR 2024 · arXiv:2404.01933
30
citations
#56

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Xiaojuan Wang, Boyang Zhou, Brian Curless et al.

ICLR 2025 · arXiv:2408.15239
29
citations
#57

OmniViD: A Generative Framework for Universal Video Understanding

Junke Wang, Dongdong Chen, Chong Luo et al.

CVPR 2024
29
citations
#58

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Zongyi Li, Shujie Hu, Shujie Liu et al.

ICLR 2025 · arXiv:2410.20502
diffusion transformers, long video generation, autoregressive models, latent vq-vae (+4 more)
27
citations
#59

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming Liu, Yuan Liu, Jiepeng Wang et al.

ICLR 2025 · arXiv:2406.00434
26
citations
#60

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.

CVPR 2024
26
citations
#61

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng et al.

NeurIPS 2025 · arXiv:2505.20292
25
citations
#62

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Jiarui Wang et al.

CVPR 2025 · arXiv:2412.19238
25
citations
#63

AnimateAnything: Consistent and Controllable Animation for Video Generation

Guojun Lei, Chi Wang, Rong Zhang et al.

CVPR 2025 · arXiv:2411.10836
controllable video generation, optical flow guidance, motion representation, temporal coherence (+2 more)
24
citations
#64

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Jiarui Wang, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025
24
citations
#65

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Xiang Fan, Anand Bhattad, Ranjay Krishna

ECCV 2024 · arXiv:2403.14617
23
citations
#66

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

Kun Su, Judith Li, Qingqing Huang et al.

AAAI 2024 · arXiv:2305.06594
video-to-music generation, autoregressive model, visual-audio correspondence, audio codecs (+4 more)
23
citations
#67

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025 · arXiv:2409.11367
22
citations
#68

ElasticTok: Adaptive Tokenization for Image and Video

Wilson Yan, Volodymyr Mnih, Aleksandra Faust et al.

ICLR 2025 · arXiv:2410.08368
21
citations
#69

MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers

Yuechen Zhang, YaoYang Liu, Bin Xia et al.

ICCV 2025 · arXiv:2501.03931
21
citations
#70

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024
20
citations
#71

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025
20
citations
#72

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions

Mengyi Shan, Lu Dong, Yutao Han et al.

ECCV 2024
20
citations
#73

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025 · arXiv:2506.18903
20
citations
#74

Grid Diffusion Models for Text-to-Video Generation

Taegyeong Lee, Soyeong Kwon, Taehwan Kim

CVPR 2024
20
citations
#75

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025
20
citations
#76

STIV: Scalable Text and Image Conditioned Video Generation

Zongyu Lin, Wei Liu, Chen Chen et al.

ICCV 2025
20
citations
#77

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.

CVPR 2024
19
citations
#78

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025 · arXiv:2501.12389
19
citations
#79

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Yuchi Wang, Junliang Guo, Jianhong Bai et al.

AAAI 2025
19
citations
#80

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 · arXiv:2503.11423
19
citations
#81

AMEGO: Active Memory from long EGOcentric videos

Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta et al.

ECCV 2024 · arXiv:2409.10917
19
citations
#82

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025
18
citations
#83

MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing

Haoyu Zhao, Tianyi Lu, Jiaxi Gu et al.

ECCV 2024
18
citations
#84

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025
18
citations
#85

MoST: Motion Style Transformer Between Diverse Action Contents

Boeun Kim, Jungho Kim, Hyung Jin Chang et al.

CVPR 2024
18
citations
#86

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Rongyao Fang, Chengqi Duan, Kun Wang et al.

ICCV 2025
17
citations
#87

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

Zhuoman Liu, Weicai Ye, Yan Luximon et al.

CVPR 2025
17
citations
#88

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang et al.

ICCV 2025 · arXiv:2503.16421
17
citations
#89

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Yining Hong, Beide Liu, Maxine Wu et al.

ICLR 2025 · arXiv:2410.23277
video generation, diffusion models, long video generation, action-driven generation (+4 more)
17
citations
#90

SuperGaussian: Repurposing Video Models for 3D Super Resolution

Yuan Shen, Duygu Ceylan, Paul Guerrero et al.

ECCV 2024 · arXiv:2406.00609
16
citations
#91

Programmable Motion Generation for Open-Set Motion Control Tasks

Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.

CVPR 2024 · arXiv:2405.19283
16
citations
#92

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava, Abhinav Shrivastava

CVPR 2024
16
citations
#93

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025 · arXiv:2503.21219
16
citations
#94

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025 · arXiv:2503.18942
15
citations
#95

VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling, Chen Zhu, Meiqi Wu et al.

ICCV 2025
15
citations
#96

Training-Free Efficient Video Generation via Dynamic Token Carving

Yuechen Zhang, Jinbo Xing, Bin Xia et al.

NeurIPS 2025
15
citations
#97

OmniMotionGPT: Animal Motion Generation with Limited Data

Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan et al.

CVPR 2024
15
citations
#98

Video Diffusion Models Are Strong Video Inpainter

Minhyeok Lee, Suhwan Cho, Chajin Shin et al.

AAAI 2025
14
citations
#99

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Han Wang, Yuxiang Nie, Yongjie Ye et al.

ICCV 2025 · arXiv:2412.09530
14
citations
#100

MoVideo: Motion-Aware Video Generation with Diffusion Models

Jingyun Liang, Yuchen Fan, Kai Zhang et al.

ECCV 2024 · arXiv:2311.11325
diffusion models, video generation, optical flow guidance, video depth estimation (+4 more)
14
citations