"text-to-image generation" Papers
54 papers found • Page 1 of 2
3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation
Dewei Zhou, Ji Xie, Zongxin Yang et al.
Ambient Diffusion Omni: Training Good Models with Bad Data
Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
Autoregressive Video Generation without Vector Quantization
Haoge Deng, Ting Pan, Haiwen Diao et al.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng, Shishi Xiao, Keming Wu et al.
CPO: Condition Preference Optimization for Controllable Image Generation
Zonglin Lyu, Ming Li, Xinxin Liu et al.
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Han Cai, Junyu Chen et al.
Deeply Supervised Flow-Based Generative Models
Inkyu Shin, Chenglin Yang, Liang-Chieh Chen
DISCO: DISCrete nOise for Conditional Control in Text-to-Image Diffusion Models
Longquan Dai, Wu Ming, Dejiao Xue et al.
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant Honavar
FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang, Zhe Wang, Qin Zhou et al.
Goku: Flow Based Video Generative Foundation Models
Shoufa Chen, Chongjian Ge, Yuqi Zhang et al.
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi et al.
Measuring And Improving Engagement of Text-to-Image Generation Models
Varun Khurana, Yaman Singla, Jayakumar Subramanian et al.
Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen, Zichen Wen, Yichao Du et al.
NL-Eye: Abductive NLI For Images
Mor Ventura, Michael Toker, Nitay Calderon et al.
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang, Wonmin Byeon, Jiarui Xu et al.
Scaling can lead to compositional generalization
Florian Redhardt, Yassir Akram, Simon Schug
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang, Cheng Chen, Xulei Yang et al.
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Nacim Belkhir, Dat Nguyen et al.
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
Lichen Bai, Shitong Shao, Zikai Zhou et al.
Accelerating Parallel Sampling of Diffusion Models
Zhiwei Tang, Jiasheng Tang, Hao Luo et al.
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye, Guang Liu, Xinya Wu et al.
An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning
Chen Jin, Ryutaro Tanno, Amrutha Saseendran et al.
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models
Neta Shaul, Uriel Singer, Ricky T. Q. Chen et al.
Chains of Diffusion Models
Yanheng Wei, Lianghua Huang, Zhi-Fan Wu et al.
Compositional Text-to-Image Generation with Dense Blob Representations
Weili Nie, Sifei Liu, Morteza Mardani et al.
Data-free Distillation of Diffusion Models with Bootstrapping
Jiatao Gu, Chen Wang, Shuangfei Zhai et al.
Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi
Diffusion Rejection Sampling
Byeonghu Na, Yeongmin Kim, Minsang Park et al.
E$^2$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
Yifan Gong, Zheng Zhan, Qing Jin et al.
Easing Concept Bleeding in Diffusion via Entity Localization and Anchoring
Jiewei Zhang, Song Guo, Peiran Dong et al.
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang et al.
Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
Pengyu Li, Biao Wang, Tianchu Guo et al.
Latent Guard: a Safety Framework for Text-to-image Generation
Runtao Liu, Ashkan Khakzar, Jindong Gu et al.
Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion
Kehan Li, Yanbo Fan, Yang Wu et al.
Learning Subject-Aware Cropping by Outpainting Professional Photos
James Hong, Lu Yuan, Michaël Gharbi et al.
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang, Zhaochen Yu, Chenlin Meng et al.
Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
Jianhao Yuan, Francesco Pinto, Adam Davies et al.
On Discrete Prompt Optimization for Diffusion Models
Ruochen Wang, Ting Liu, Cho-Jui Hsieh et al.
Progressive Text-to-Image Diffusion with Soft Latent Direction
YuTeng Ye, Jiale Cai, Hang Zhou et al.
Prompt-tuning Latent Diffusion Models for Inverse Problems
Hyungjin Chung, Jong Chul Ye, Peyman Milanfar et al.
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
Li Ding, Jenny Zhang, Jeff Clune et al.
Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
Xuantong Liu, Tianyang Hu, Wenjia Wang et al.
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann et al.
Semantic-Aware Human Object Interaction Image Generation
Zhu Xu, Qingchao Chen, Yuxin Peng et al.
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang, Yasi Zhang, Zhiyuan Fang et al.