Zhenheng Yang
8
Papers
838
Total Citations
Papers (8)
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
ICLR 2025, arXiv
455
citations
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
ICLR 2025, arXiv
200
citations
Show-o2: Improved Native Unified Multimodal Models
NeurIPS 2025, arXiv
90
citations
Long Context Tuning for Video Generation
ICCV 2025, arXiv
56
citations
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
ICCV 2025, arXiv
22
citations
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
CVPR 2025, arXiv
14
citations
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
NeurIPS 2025, arXiv
1
citation
Parallelized Autoregressive Visual Generation
CVPR 2025
0
citations