🧬Language Models

Text Generation

Generating coherent text outputs

100 papers5,956 total citations
Compare with other topics
Feb '24 Jan '26339 papers
Also includes: text generation, natural language generation, nlg, sequence generation

Top Papers

#1

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Xin Li, Jing Yu Koh, Alexander Ku et al.

ICLR 2024
1,366
citations
#2

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025
1,318
citations
#3

ControlVideo: Training-free Controllable Text-to-video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng jiang et al.

ICLR 2024
331
citations
#4

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.

CVPR 2025
154
citations
#5

AnyText: Multilingual Visual Text Generation and Editing

Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He et al.

ICLR 2024
121
citations
#6

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Dewei Zhou, You Li, Fan Ma et al.

CVPR 2024
109
citations
#7

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Xiong Wang, Yangze Li, Chaoyou Fu et al.

ICML 2025
103
citations
#8

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

Sixian Zhang, Bohan Wang, Junqiang Wu et al.

CVPR 2024
76
citations
#9

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024
75
citations
#10

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Ziyue Jiang, Jinglin Liu, Yi Ren et al.

ICLR 2024
74
citations
#11

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Fanqing Meng, Jiaqi Liao, Xinyu Tan et al.

ICML 2025
72
citations
#12

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data

Chengsen Wang, Qi Qi, Jingyu Wang et al.

AAAI 2025
72
citations
#13

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

CVPR 2024
71
citations
#14

PromptTTS 2: Describing and Generating Voices with Text Prompt

Yichong Leng, ZHifang Guo, Kai Shen et al.

ICLR 2024
70
citations
#15

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

CVPR 2024
65
citations
#16

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024arXiv:2403.17920
text-to-4d generationtrajectory-conditioned generationdynamic 3d scenesneural representations+4
64
citations
#17

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024
63
citations
#18

DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

Junming Chen, Yunfei Liu, Jianan Wang et al.

CVPR 2024
62
citations
#19

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.

CVPR 2024
58
citations
#20

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024
55
citations
#21

Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

Yu Fu, Deyi Xiong, Yue Dong

AAAI 2024arXiv:2307.13808
watermarking text generationai detectionconditional text generationsemantic-aware watermarking+4
54
citations
#22

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024
53
citations
#23

GVGEN: Text-to-3D Generation with Volumetric Representation

Xianglong He, Junyi Chen, Sida Peng et al.

ECCV 2024arXiv:2403.12957
3d gaussian splattingvolumetric representationtext-to-3d generationdiffusion-based framework+3
51
citations
#24

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Zike Wu, Pan Zhou, YI Xuanyu et al.

CVPR 2024
51
citations
#25

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Yiming Zhao, Zhouhui Lian

ECCV 2024
47
citations
#26

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Shuai Tan, Bin Ji, Ye Pan

AAAI 2024arXiv:2403.06365
talking head generationemotion style transferart style transferaudio-driven animation+4
43
citations
#27

ParCo: Part-Coordinating Text-to-Motion Synthesis

Qiran Zou, Shangyuan Yuan, Shian Du et al.

ECCV 2024
43
citations
#28

Few-Shot Detection of Machine-Generated Text using Style Representations

Rafael Rivera Soto, Kailin Koch, Aleem Khan et al.

ICLR 2024
41
citations
#29

BAMM: Bidirectional Autoregressive Motion Model

Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang et al.

ECCV 2024
41
citations
#30

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Kush Jain, Gabriel Synnaeve, Baptiste Roziere

ICLR 2025
39
citations
#31

Agents' Room: Narrative Generation through Multi-step Collaboration

Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.

ICLR 2025
38
citations
#32

DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Kaifeng Zhao, Gen Li, Siyu Tang

ICLR 2025
36
citations
#33

Control4D: Efficient 4D Portrait Editing with Text

Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.

CVPR 2024
36
citations
#34

FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis

Ke Fan, Junshu Tang, Weijian Cao et al.

ECCV 2024
34
citations
#35

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Arian Askari, Christian Poelitz, Xinye Tang

AAAI 2025
33
citations
#36

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025
32
citations
#37

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Ximing Lu, Melanie Sclar, Skyler Hallinan et al.

ICLR 2025arXiv:2410.04265
linguistic creativity quantificationmachine text detectionlarge language modelsweb text attribution+3
32
citations
#38

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Qi Wu, Yubo Zhao, Yifan Wang et al.

ICLR 2025arXiv:2405.17013
human motion generationconversational frameworkmotion editingmotion understanding+4
30
citations
#39

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

ICLR 2025
30
citations
#40

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025arXiv:2312.11556
svg generationmultimodal language modelsimage vectorizationvector graphics code+4
30
citations
#41

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NeurIPS 2025arXiv:2505.22647
audio-driven human animationtalking head generationtalking body generationmulti-person video generation+3
30
citations
#42

Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.

CVPR 2024
27
citations
#43

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Bowen Chen, Brynn zhao, Haomiao Sun et al.

NeurIPS 2025
25
citations
#44

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang et al.

CVPR 2024
24
citations
#45

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

Zijie Chen, Lichao Zhang, Fangsheng Weng et al.

CVPR 2024
24
citations
#46

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024
23
citations
#47

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Jian Ma, Yonglin Deng, Chen Chen et al.

AAAI 2025
22
citations
#48

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

Wen Li, Muyuan Fang, Cheng Zou et al.

ECCV 2024
22
citations
#49

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Songchun Zhang, Yibo Zhang, Quan Zheng et al.

CVPR 2024
21
citations
#50

FonTS: Text Rendering With Typography and Style Controls

Wenda SHI, Yiren Song, Dengming Zhang et al.

ICCV 2025
21
citations
#51

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025
21
citations
#52

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Zihan Liu, Shuangrui Ding, Zhixiong Zhang et al.

ICML 2025
21
citations
#53

Grid Diffusion Models for Text-to-Video Generation

Taegyeong Lee, Soyeong Kwon, Taehwan Kim

CVPR 2024
20
citations
#54

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions

Shan Mengyi, Lu Dong, Yutao Han et al.

ECCV 2024
20
citations
#55

STIV: Scalable Text and Image Conditioned Video Generation

Zongyu Lin, Wei Liu, Chen Chen et al.

ICCV 2025
20
citations
#56

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Yuchi Wang, Junliang Guo, Jianhong Bai et al.

AAAI 2025
19
citations
#57

MoonCast: High-Quality Zero-Shot Podcast Generation

Zeqian Ju, Dongchao Yang, Shen Kai et al.

NeurIPS 2025arXiv:2503.14345
zero-shot generationtext-to-speech synthesislong-context audio modelingpodcast generation+3
18
citations
#58

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

Chao Xu, Yang Liu, Jiazheng Xing et al.

CVPR 2024
18
citations
#59

Diverse Person: Customize Your Own Dataset for Text-Based Person Search

Zifan Song, Guosheng Hu, Cairong Zhao

AAAI 2024
18
citations
#60

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025
18
citations
#61

De-Diffusion Makes Text a Strong Cross-Modal Interface

Chen Wei, Chenxi Liu, Siyuan Qiao et al.

CVPR 2024
17
citations
#62

Customization Assistant for Text-to-Image Generation

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu et al.

CVPR 2024
14
citations
#63

MERGE: Fast Private Text Generation

Zi Liang, Pinghui Wang, Ruofei Zhang et al.

AAAI 2024arXiv:2305.15769
private inferencetransformer-based modelsnatural language generationcloud model deployment+4
14
citations
#64

Generating Multi-Image Synthetic Data for Text-to-Image Customization

Nupur Kumari, Xi Yin, Jun-Yan Zhu et al.

ICCV 2025
14
citations
#65

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

Vaishnavh Nagarajan, Chen Wu, Charles Ding et al.

ICML 2025
13
citations
#66

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Yu Yuan, Xijun Wang, Yichen Sheng et al.

CVPR 2025arXiv:2412.02168
text-to-image synthesiscamera controlscene consistencydimensionality lifting+3
13
citations
#67

UniMuMo: Unified Text, Music, and Motion Generation

Han Yang, Kun Su, Yutong Zhang et al.

AAAI 2025
12
citations
#68

VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation

Saksham Singh Kushwaha, Yapeng Tian

CVPR 2025
12
citations
#69

How to Synthesize Text Data without Model Collapse?

Xuekai Zhu, Daixuan Cheng, Hengli Li et al.

ICML 2025
12
citations
#70

DiffusionPen: Towards Controlling the Style of Handwritten Text Generation

KONSTANTINA NIKOLAIDOU, George Retsinas, Giorgos Sfikas et al.

ECCV 2024
11
citations
#71

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Yuyang Peng, Shishi Xiao, Keming Wu et al.

CVPR 2025arXiv:2503.20672
visual text renderinginfographics generationlayout-guided attentionbusiness content generation+4
10
citations
#72

Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation

Ling-An Zeng, Guohong Huang, Gaojie Wu et al.

AAAI 2025
10
citations
#73

CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

Yandong Guan, Xilin Wang, XiMing Xing et al.

NeurIPS 2025
9
citations
#74

DeepCalliFont: Few-Shot Chinese Calligraphy Font Synthesis by Integrating Dual-Modality Generative Models

Yitian Liu, Zhouhui Lian

AAAI 2024arXiv:2312.10314
few-shot font generationchinese calligraphy synthesisdual-modality generative modelsglyph image synthesis+4
9
citations
#75

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge

Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.

NeurIPS 2025
9
citations
#76

DreamText: High Fidelity Scene Text Synthesis

Yibin Wang, Weizhong Zhang, honghui xu et al.

CVPR 2025
9
citations
#77

A Unified and Interpretable Emotion Representation and Expression Generation

Reni Paskaleva, Mykyta Holubakha, Andela Ilic et al.

CVPR 2024
9
citations
#78

C3Net: Compound Conditioned ControlNet for Multimodal Content Generation

Juntao Zhang, Yuehuai LIU, Yu-Wing Tai et al.

CVPR 2024
9
citations
#79

FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases

Shuai Tan, Bill Gong, Bin Ji et al.

ICCV 2025
9
citations
#80

CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation

Han He, Qianchu Liu, Lei Xu et al.

AAAI 2025
9
citations
#81

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

Tianyu Huang, Yihan Zeng, Bowen Dong et al.

ICLR 2024
9
citations
#82

ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL

Yang Qin, Chao Chen, Zhihang Fu et al.

ICLR 2025
8
citations
#83

Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation

Jinpeng Liu, Wenxun Dai, Chunyu Wang et al.

ECCV 2024
8
citations
#84

Generating Physically Stable and Buildable Brick Structures from Text

Ava Pun, Kangle Deng, Ruixuan Liu et al.

ICCV 2025arXiv:2505.05469
text-to-3d generationbrick assembly modelingphysical stability validationautoregressive language models+3
8
citations
#85

SIG: Speaker Identification in Literature via Prompt-Based Generation

Zhenlin Su, Liyan Xu, Jin Xu et al.

AAAI 2024arXiv:2312.14590
speaker identificationprompt-based generationliterary analysisout-of-domain inference+3
8
citations
#86

AMO Sampler: Enhancing Text Rendering with Overshooting

Xixi Hu, Keyang Xu, Bo Liu et al.

CVPR 2025arXiv:2411.19415
text-to-image generationtext renderingrectified flow modelsovershooting sampler+3
8
citations
#87

GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation

Jiawei Lu, YingPeng Zhang, Zengjun Zhao et al.

AAAI 2025
7
citations
#88

Generating Illustrated Instructions

Sachit Menon, Ishan Misra, Rohit Girdhar

CVPR 2024
7
citations
#89

Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

Haozhuo Zhang, Bin Zhu, Yu Cao et al.

AAAI 2025
7
citations
#90

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng, Xiang Zhang, Zirui Wang et al.

ICCV 2025
6
citations
#91

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

Tianjian Li, Haoran Xu, Philipp Koehn et al.

ICLR 2024
6
citations
#92

BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

Zhecheng Sheng, Tianhao Zhang, Chen Jiang et al.

AAAI 2024arXiv:2312.16893
text coherence assessmentbrownian bridge theoryneural coherence modelingreference-free metric+4
6
citations
#93

Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Minsi Ren, Yan-Ming Zhang, yi chen

ICLR 2025
5
citations
#94

Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation

Ao Ma, Jiasong Feng, Ke Cao et al.

ICCV 2025
5
citations
#95

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection

Yuhang Ma, Wenting Xu, Chaoyi Zhao et al.

AAAI 2025
5
citations
#96

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

Jun-Yan He, Zhi-Qi Cheng, Chenyang Li et al.

ICLR 2025
5
citations
#97

AniMo: Species-Aware Model for Text-Driven Animal Motion Generation

Xuan Wang, Kai Ruan, Xing Zhang et al.

CVPR 2025
5
citations
#98

LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending

Jian Jin, Zhenbo Yu, Yang Shen et al.

CVPR 2025
5
citations
#99

Bayesian WeakS-to-Strong from Text Classification to Generation

Ziyun Cui, Ziyang Zhang, Guangzhi Sun et al.

ICLR 2025
5
citations
#100

Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment

Xiao Fei, Michail Chatzianastasis, Sarah Carneiro et al.

NeurIPS 2025
4
citations