2025 "text-to-image generation" Papers
62 papers found • Page 1 of 2
3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation
Dewei Zhou, Ji Xie, Zongxin Yang et al.
Ambient Diffusion Omni: Training Good Models with Bad Data
Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
Autoregressive Video Generation without Vector Quantization
Haoge Deng, Ting Pan, Haiwen Diao et al.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng, Shishi Xiao, Keming Wu et al.
CAP: Evaluation of Persuasive and Creative Image Generation
Aysan Aghazadeh, Adriana Kovashka
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Hyungjin Chung, Jeongsol Kim, Geon Yeong Park et al.
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
Chengyou Jia, Changliang Xia, Zhuohang Dang et al.
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
Samuel Lavoie, Michael Noukhovitch, Aaron Courville
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve et al.
CPO: Condition Preference Optimization for Controllable Image Generation
Zonglin Lyu, Ming Li, Xinxin Liu et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Han Cai, Junyu Chen et al.
Deeply Supervised Flow-Based Generative Models
Inkyu Shin, Chenglin Yang, Liang-Chieh Chen
Denoising Autoregressive Transformers for Scalable Text-to-Image Generation
Jiatao Gu, Yuyang Wang, Yizhe Zhang et al.
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng, Yuxi Ren, Xin Xia et al.
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
Hyogon Ryu, NaHyeon Park, Hyunjung Shim
DISCO: DISCrete nOise for Conditional Control in Text-to-Image Diffusion Models
Longquan Dai, Wu Ming, Dejiao Xue et al.
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant Honavar
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Xirui Hu, Jiahao Wang, Hao Chen et al.
Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.
Feedback Guidance of Diffusion Models
Felix Koulischer, Florian Handke, Johannes Deleu et al.
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan, Kebin Wu, Fatima Albreiki
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang, Zhe Wang, Qin Zhou et al.
Goku: Flow Based Video Generative Foundation Models
Shoufa Chen, Chongjian GE, Yuqi Zhang et al.
Halton Scheduler for Masked Generative Image Transformer
Victor Besnier, Mickael Chen, David Hurych et al.
ImgEdit: A Unified Image Editing Dataset and Benchmark
Yang Ye, Xianyi He, Zongjian Li et al.
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
Kaihang Pan, Yang Wu, Wendong Bu et al.
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi et al.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang, Jinghao Li, Yu-Wing Tai
Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models
Lin Zhu, Xinbing Wang, Chenghu Zhou et al.
LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation
Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse et al.
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Lital Binyamin, Yoad Tewel, Hilit Segev et al.
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.
Measuring And Improving Engagement of Text-to-Image Generation Models
Varun Khurana, Yaman Singla, Jayakumar Subramanian et al.
Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
Kunjun Li, Zigeng Chen, Cheng-Yen Yang et al.
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen, Zichen Wen, Yichao Du et al.
NL-Eye: Abductive NLI For Images
Mor Ventura, Michael Toker, Nitay Calderon et al.
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang, Wonmin Byeon, Jiarui Xu et al.
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim, Byeongsu Sim
Precise Parameter Localization for Textual Generation in Diffusion Models
Łukasz Staniszewski, Bartosz Cywiński, Franziska Boenisch et al.
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng et al.
RB-Modulation: Training-Free Stylization using Reference-Based Modulation
Litu Rout, Yujia Chen, Nataniel Ruiz et al.
RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen, Shuchen Xue, Yuyang Zhao et al.
Scaling can lead to compositional generalization
Florian Redhardt, Yassir Akram, Simon Schug