PanoWan: Lifting Diffusion Video Generation Models to 360$^\circ$ with Latitude/Longitude-aware Mechanisms

5citations

Project

Citations

#652

in NeurIPS 2025

of 5858 papers

Authors

Data Points

Authors

Yifei Xia Shuchen Weng Siqi Yang Jingqi Liu Chengxuan Zhu Minggui Teng Zijian Jia Han Jiang Boxin Shi

Abstract

Panoramic video generation enables immersive 360$^\circ$ content creation, valuable in applications that demand scene-consistent world exploration. However, existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality and diverse panoramic videos generation, due to limited dataset scale and the gap in spatial feature representations. In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules. PanoWan employs latitude-aware sampling to avoid latitudinal distortion, while its rotated semantic denoising and padded pixel-wise decoding ensure seamless transitions at longitude boundaries. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios. Consequently, PanoWan achieves state-of-the-art performance in panoramic video generation and demonstrates robustness for zero-shot downstream tasks.

Citation History

Jan 26, 2026

Jan 27, 2026

5+5