Video Editing via Factorized Diffusion Distillation

28 citations

arXiv:2403.09334

Citation rank: #176 of 2387 papers in ECCV 2024

Authors

Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman

Topics

video editing, diffusion distillation, unsupervised distillation, temporal consistency, text-to-image model, factorized distillation, adapter alignment

Abstract

We introduce Emu Video Edit (EVE), a model that establishes a new state-of-the-art in video editing without relying on any supervised video editing data. To develop EVE, we separately train an image editing adapter and a video generation adapter, and attach both to the same text-to-image model. Then, to align the adapters towards video editing, we introduce a new unsupervised distillation procedure, Factorized Diffusion Distillation. This procedure distills knowledge from one or more teachers simultaneously, without any supervised data. We utilize this procedure to teach EVE to edit videos by jointly distilling knowledge to (i) precisely edit each individual frame from the image editing adapter, and (ii) ensure temporal consistency among the edited frames using the video generation adapter. Finally, to demonstrate the potential of our approach in unlocking other capabilities, we align additional combinations of adapters.
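The two-teacher idea in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the abstract does not state the actual loss functions, so a plain mean-squared error against each teacher's output is swapped in here as an assumption. All names (`factorized_distillation_loss`, `edit_teacher`, `video_teacher`, `w_edit`, `w_video`) are hypothetical, and frames are simplified to flat lists of floats.

```python
from typing import Callable, List, Sequence

Frame = List[float]   # toy "frame": a flat list of pixel values
Video = List[Frame]   # toy "video": a list of frames


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def factorized_distillation_loss(
    student_video: Video,
    edit_teacher: Callable[[Frame], Frame],    # hypothetical per-frame teacher
    video_teacher: Callable[[Video], Video],   # hypothetical whole-clip teacher
    w_edit: float = 1.0,                       # assumed weighting, not from the paper
    w_video: float = 1.0,
) -> float:
    """Sum of two teacher-matching terms (MSE is an assumption of this sketch)."""
    n = len(student_video)
    # (i) Per-frame term: the image editing teacher judges each frame on its own.
    edit_term = sum(mse(f, edit_teacher(f)) for f in student_video) / n
    # (ii) Temporal term: the video teacher sees the whole clip at once.
    target = video_teacher(student_video)
    video_term = sum(mse(f, t) for f, t in zip(student_video, target)) / n
    return w_edit * edit_term + w_video * video_term


def toy_edit_teacher(frame: Frame) -> Frame:
    """Stand-in teacher: the 'edited' frame has every pixel set to 1.0."""
    return [1.0 for _ in frame]


def toy_video_teacher(video: Video) -> Video:
    """Stand-in teacher: every frame is pulled to the temporal mean frame."""
    n = len(video)
    mean_frame = [sum(f[i] for f in video) / n for i in range(len(video[0]))]
    return [list(mean_frame) for _ in video]


consistent = [[1.0, 1.0], [1.0, 1.0]]   # edited and temporally stable
flickering = [[0.0, 0.0], [2.0, 2.0]]   # wrong per frame and unstable over time
loss_consistent = factorized_distillation_loss(consistent, toy_edit_teacher, toy_video_teacher)
loss_flickering = factorized_distillation_loss(flickering, toy_edit_teacher, toy_video_teacher)
```

In this toy setup the clip that matches both teachers scores zero, while the flickering clip is penalized by both terms. The point is only that each teacher contributes a separate signal to one student, and no paired video editing data is needed.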

Citation History

[Chart: citation count over time, Jan 25 to Jan 31, 2026]