🧬Generative Models

Text-to-Image Generation

Generating images from text descriptions

100 papers10,175 total citations
Compare with other topics
Feb '24 Jan '26844 papers
Also includes: text-to-image generation, text-to-image, text to image, image generation from text

Top Papers

#1

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Chong Mou, Xintao Wang, Liangbin Xie et al.

AAAI 2024arXiv:2302.08453
text-to-image diffusionadapter learningcontrollable generationstructure control+4
1,423
citations
#2

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Xin Li, Jing Yu Koh, Alexander Ku et al.

ICLR 2024
1,366
citations
#3

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025
1,318
citations
#4

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Jing Shi, Wei Xiong, Zhe Lin et al.

CVPR 2024
369
citations
#5

ControlVideo: Training-free Controllable Text-to-video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng jiang et al.

ICLR 2024
331
citations
#6

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang et al.

CVPR 2024
330
citations
#7

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Chunting Zhou, Lili Yu, Arun Babu et al.

ICLR 2025
294
citations
#8

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Yue Ma, Yingqing HE, Xiaodong Cun et al.

AAAI 2024arXiv:2304.01186
pose-guided generationtext-to-video generationcharacter video synthesispose-controllable generation+4
276
citations
#9

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Shelly Sheynin, Adam Polyak, Uriel Singer et al.

CVPR 2024
238
citations
#10

Grounded Text-to-Image Synthesis with Attention Refocusing

Quynh Phung, Songwei Ge, Jia-Bin Huang

CVPR 2024
157
citations
#11

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.

CVPR 2025
154
citations
#12

AnyText: Multilingual Visual Text Generation and Editing

Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He et al.

ICLR 2024
121
citations
#13

WonderWorld: Interactive 3D Scene Generation from a Single Image

Hong-Xing Yu, Haoyi Duan, Charles Herrmann et al.

CVPR 2025
120
citations
#14

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Dewei Zhou, You Li, Fan Ma et al.

CVPR 2024
109
citations
#15

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Xierui Wang, Siming Fu, Qihan Huang et al.

ICLR 2025
104
citations
#16

An Empirical Study of CLIP for Text-Based Person Search

Cao Min, Yang Bai, ziyin Zeng et al.

AAAI 2024arXiv:2308.10045
text-based person searchcontrastive language image pretrainingcross-modal retrievalvision-language pre-training+3
94
citations
#17

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

Ruichen Wang, Zekang Chen, Chen Chen et al.

AAAI 2024arXiv:2305.13921
text-to-image synthesisdiffusion modelsattention map controlcompositional capabilities+3
92
citations
#18

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Yuang Peng, Yuxin Cui, Haomiao Tang et al.

ICLR 2025
91
citations
#19

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri et al.

CVPR 2024
84
citations
#20

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Yutong Feng, Biao Gong, Di Chen et al.

CVPR 2024
78
citations
#21

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

Sixian Zhang, Bohan Wang, Junqiang Wu et al.

CVPR 2024
76
citations
#22

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024
75
citations
#23

MaskBit: Embedding-free Image Generation via Bit Tokens

Mark Weber, Lijun Yu, Qihang Yu et al.

ICLR 2025
72
citations
#24

PromptTTS 2: Describing and Generating Voices with Text Prompt

Yichong Leng, ZHifang Guo, Kai Shen et al.

ICLR 2024
70
citations
#25

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024arXiv:2403.17920
text-to-4d generationtrajectory-conditioned generationdynamic 3d scenesneural representations+4
64
citations
#26

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024
63
citations
#27

ImageFolder: Autoregressive Image Generation with Folded Tokens

Xiang Li, Kai Qiu, Hao Chen et al.

ICLR 2025
63
citations
#28

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Jisu Nam, Heesu Kim, DongJae Lee et al.

CVPR 2024
62
citations
#29

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.

CVPR 2024
58
citations
#30

Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Wentao Tan, Changxing Ding, Jiayu Jiang et al.

CVPR 2024
55
citations
#31

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024
55
citations
#32

Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

Yu Fu, Deyi Xiong, Yue Dong

AAAI 2024arXiv:2307.13808
watermarking text generationai detectionconditional text generationsemantic-aware watermarking+4
54
citations
#33

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024
53
citations
#34

Latent Guard: a Safety Framework for Text-to-image Generation

Runtao Liu, Ashkan Khakzar, Jindong Gu et al.

ECCV 2024arXiv:2404.08031
text-to-image generationsafety frameworkslatent space learningharmful content detection+3
53
citations
#35

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

George Cazenavette, Avneesh Sud, Thomas Leung et al.

CVPR 2024
53
citations
#36

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.

ECCV 2024
52
citations
#37

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Zike Wu, Pan Zhou, YI Xuanyu et al.

CVPR 2024
51
citations
#38

GVGEN: Text-to-3D Generation with Volumetric Representation

Xianglong He, Junyi Chen, Sida Peng et al.

ECCV 2024arXiv:2403.12957
3d gaussian splattingvolumetric representationtext-to-3d generationdiffusion-based framework+3
51
citations
#39

AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ

Jonas Belouadi, Anne Lauscher, Steffen Eger

ICLR 2024
49
citations
#40

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

Yu Zeng, Vishal M. Patel, Haochen Wang et al.

CVPR 2024
47
citations
#41

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Yiming Zhao, Zhouhui Lian

ECCV 2024
47
citations
#42

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024
46
citations
#43

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Chenyang Zhu, Kai Li, Yue Ma et al.

AAAI 2025
46
citations
#44

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

Jie Ren, Yaxin Li, Shenglai Zeng et al.

ECCV 2024
46
citations
#45

Improving Image Restoration through Removing Degradations in Textual Representations

Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.

CVPR 2024
45
citations
#46

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.

AAAI 2024arXiv:2309.06933
style inversiontext-to-image synthesisartistic image synthesisstyle transfer+3
44
citations
#47

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Jinbin Bai, Tian Ye, Wei Chow et al.

ICLR 2025
43
citations
#48

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Roman Bachmann, Jesse Allardice, David Mizrahi et al.

ICML 2025
43
citations
#49

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Shuai Tan, Bin Ji, Ye Pan

AAAI 2024arXiv:2403.06365
talking head generationemotion style transferart style transferaudio-driven animation+4
43
citations
#50

BAMM: Bidirectional Autoregressive Motion Model

Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang et al.

ECCV 2024
41
citations
#51

Few-Shot Detection of Machine-Generated Text using Style Representations

Rafael Rivera Soto, Kailin Koch, Aleem Khan et al.

ICLR 2024
41
citations
#52

SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers

Enze Xie, Junsong Chen, Junyu Chen et al.

ICLR 2025
40
citations
#53

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Mengqi Huang, Zhendong Mao, Mingcong Liu et al.

CVPR 2024
38
citations
#54

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

Feifei Wang, Zhentao Tan, Tianyi Wei et al.

CVPR 2024
37
citations
#55

ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez

CVPR 2024
37
citations
#56

DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Kaifeng Zhao, Gen Li, Siyu Tang

ICLR 2025
36
citations
#57

Control4D: Efficient 4D Portrait Editing with Text

Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.

CVPR 2024
36
citations
#58

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.

CVPR 2024
35
citations
#59

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Yoad Tewel, Rinon Gal, Dvir Samuel et al.

ICLR 2025arXiv:2411.07232
attention mechanismdiffusion modelssemantic image editingobject insertion+3
34
citations
#60

MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices

Yang Zhao, Zhisheng Xiao, Yanwu Xu et al.

ECCV 2024
34
citations
#61

Disentangled Clothed Avatar Generation from Text Descriptions

Jionghao Wang, Yuan Liu, Zhiyang Dou et al.

ECCV 2024
34
citations
#62

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm

Yi Wu, Ziqiang Li, Heliang Zheng et al.

ECCV 2024
34
citations
#63

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Hui Zhang, Dexiang Hong, Yitong Wang et al.

ICCV 2025arXiv:2412.03859
layout-to-image generationmultimodal diffusion transformerssiamese network architecturecreative layout planning+1
33
citations
#64

Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval

Young Kyun Jang, Dat B Huynh, Ashish Shah et al.

ECCV 2024
32
citations
#65

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025
32
citations
#66

Text4Seg: Reimagining Image Segmentation as Text Generation

Mengcheng Lan, Chaofeng Chen, Yue Zhou et al.

ICLR 2025
32
citations
#67

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025
31
citations
#68

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

ICLR 2025
30
citations
#69

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025arXiv:2312.11556
svg generationmultimodal language modelsimage vectorizationvector graphics code+4
30
citations
#70

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

Yang Chen, Yingwei Pan, haibo yang et al.

CVPR 2024
30
citations
#71

PosterLlama: Bridging Design Ability of Langauge Model to Content-Aware Layout Generation

Jaejung Seol, Seojun Kim, Jaejun Yoo

ECCV 2024
30
citations
#72

Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

Ziyao Huang, Fan Tang, Yong Zhang et al.

CVPR 2024
29
citations
#73

DreamFlow: High-quality text-to-3D generation by Approximating Probability Flow

Kyungmin Lee, Kihyuk Sohn, Jinwoo Shin

ICLR 2024
28
citations
#74

R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation

Jiayu Xiao, Henglei Lv, Henglei Lv et al.

ICLR 2024
27
citations
#75

Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.

CVPR 2024
27
citations
#76

2382 SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation

Chengyou Jia, Minnan Luo, Zhuohang Dang et al.

AAAI 2024
26
citations
#77

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Weifeng Lin, Xinyu Wei, Renrui Zhang et al.

ICLR 2025
26
citations
#78

Perception-Guided Jailbreak Against Text-to-Image Models

Yihao Huang, Le Liang, Tianlin Li et al.

AAAI 2025
26
citations
#79

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Chaofeng Chen, Annan Wang, Haoning Wu et al.

ECCV 2024
26
citations
#80

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Bowen Chen, Brynn zhao, Haomiao Sun et al.

NeurIPS 2025
25
citations
#81

Doubly Abductive Counterfactual Inference for Text-based Image Editing

Xue Song, Jiequan Cui, Hanwang Zhang et al.

CVPR 2024
25
citations
#82

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025
25
citations
#83

Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation

Xianghui Xie, Bharat Lal Bhatnagar, Jan Lenssen et al.

CVPR 2024
24
citations
#84

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

Zijie Chen, Lichao Zhang, Fangsheng Weng et al.

CVPR 2024
24
citations
#85

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025
24
citations
#86

JetFormer: An autoregressive generative model of raw images and text

Michael Tschannen, André Susano Pinto, Alexander Kolesnikov

ICLR 2025
23
citations
#87

Language-Guided Image Tokenization for Generation

Kaiwen Zha, Lijun Yu, Alireza Fathi et al.

CVPR 2025arXiv:2412.05796
image tokenizationlanguage-guided generationdiffusion transformertext-conditioned tokenization+4
23
citations
#88

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024
23
citations
#89

StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

Sidi Wu, Yizi Chen, Loic Landrieu et al.

CVPR 2024
22
citations
#90

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.

NeurIPS 2025
22
citations
#91

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Jian Ma, Yonglin Deng, Chen Chen et al.

AAAI 2025
22
citations
#92

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

Wen Li, Muyuan Fang, Cheng Zou et al.

ECCV 2024
22
citations
#93

ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems

Denis Zavadski, Johann-Friedrich Feiden, Carsten Rother

ECCV 2024
22
citations
#94

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.

ICCV 2025
22
citations
#95

Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer

Yang Wu, Kaihua Zhang, Jianjun Qian et al.

ECCV 2024
22
citations
#96

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li et al.

CVPR 2025
21
citations
#97

Scalable Ranked Preference Optimization for Text-to-Image Generation

Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.

ICCV 2025
21
citations
#98

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Zihan Liu, Shuangrui Ding, Zhixiong Zhang et al.

ICML 2025
21
citations
#99

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Jingjing Chang, Yixiao Fang, Peng Xing et al.

NeurIPS 2025
21
citations
#100

Text-to-Image Generation for Abstract Concepts

Jiayi Liao, Xu Chen, Qiang Fu et al.

AAAI 2024arXiv:2309.14623
text-to-image generationabstract concept visualizationlarge language modelsprompt engineering+3
21
citations