"diffusion transformer" Papers

20 papers found

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Zhen Han, Zeyinzi Jiang, Yulin Pan et al.

ICLR 2025posterarXiv:2410.00086
43
citations

Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration

Junyuan Deng, Xinyi Wu, Yongxing Yang et al.

CVPR 2025posterarXiv:2504.15159
3
citations

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025oral
1355
citations

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation

Youwei Zheng, Yuxi Ren, Xin Xia et al.

ICCV 2025posterarXiv:2510.09094
4
citations

Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data

Hengyu Fu, Zehao Dou, Jiawei Guo et al.

ICLR 2025oralarXiv:2407.16134
3
citations

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu, Youtian Lin, Feihu Zhang et al.

NeurIPS 2025posterarXiv:2505.17412
34
citations

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Maksim Siniukov, Di Chang, Minh Tran et al.

ICCV 2025posterarXiv:2504.04010
3
citations

IRASim: A Fine-Grained World Model for Robot Manipulation

Fangqi Zhu, Hongtao Wu, Song Guo et al.

ICCV 2025posterarXiv:2406.14540
21
citations

Language-Guided Image Tokenization for Generation

Kaiwen Zha, Lijun Yu, Alireza Fathi et al.

CVPR 2025posterarXiv:2412.05796
23
citations

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

Gao Peng, Le Zhuo, Dongyang Liu et al.

ICLR 2025oral
9
citations

Multi-subject Open-set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025posterarXiv:2501.06187
40
citations

Neural-Driven Image Editing

Pengfei Zhou, Jie Xia, Xiaopeng Peng et al.

NeurIPS 2025posterarXiv:2507.05397
2
citations

OminiControl: Minimal and Universal Control for Diffusion Transformer

Zhenxiong Tan, Songhua Liu, Xingyi Yang et al.

ICCV 2025highlightarXiv:2411.15098
214
citations

Pippo: High-Resolution Multi-View Humans from a Single Image

Yash Kant, Ethan Weber, Jin Kyu Kim et al.

CVPR 2025highlightarXiv:2502.07785
14
citations

TokMan:Tokenize Manhattan Mask Optimization for Inverse Lithography

Yiwen Wu, Yuyang Chen, Ye Xia et al.

NeurIPS 2025poster

VACE: All-in-One Video Creation and Editing

Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.

ICCV 2025posterarXiv:2503.07598
169
citations

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

Yichao Shen, Fangyun Wei, Zhiying Du et al.

NeurIPS 2025posterarXiv:2512.06963
3
citations

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

JUNCHAO GONG, LEI BAI, Peng Ye et al.

ICML 2024poster

Lazy Diffusion Transformer for Interactive Image Editing

Yotam Nitzan, Zongze Wu, Richard Zhang et al.

ECCV 2024posterarXiv:2404.12382
16
citations

SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic

Kashyap Chitta, Daniel Dauner, Andreas Geiger

ECCV 2024posterarXiv:2403.17933
21
citations