2025 "language modeling" Papers
13 papers found
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking
Heli Ben-Hamu, Itai Gat, Daniel Severo et al.
NeurIPS 2025, poster, arXiv:2505.24857
40 citations
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien GOMES, Yanlei Zhang, Eugene Belilovsky et al.
ICLR 2025, poster, arXiv:2405.16397
5 citations
Chunk-Distilled Language Modeling
Yanhong Li, Karen Livescu, Jiawei Zhou
ICLR 2025, poster, arXiv:2501.00343
3 citations
Continuous Diffusion Model for Language Modeling
Jaehyeong Jo, Sung Ju Hwang
NeurIPS 2025, poster, arXiv:2502.11564
4 citations
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
ICLR 2025, poster, arXiv:2410.05258
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
Mathurin VIDEAU, Badr Youbi Idrissi, Alessandro Leite et al.
NeurIPS 2025, poster, arXiv:2506.14761
5 citations
Nested Learning: The Illusion of Deep Learning Architectures
Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al.
NeurIPS 2025, poster, arXiv:2512.24695
12 citations
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models
Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.
NeurIPS 2025, poster, arXiv:2510.08632
3 citations
Scaling up Masked Diffusion Models on Text
Shen Nie, Fengqi Zhu, Chao Du et al.
ICLR 2025, oral, arXiv:2410.18514
110 citations
Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
ICLR 2025, poster, arXiv:2410.02703
20 citations
ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation
Yuxuan Song, Zhe Zhang, Yu Pei et al.
NeurIPS 2025, poster
1 citation
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
ICLR 2025, poster, arXiv:2409.03137
23 citations
Tight Clusters Make Specialized Experts
Stefan Nielsen, Rachel Teo, Laziz Abdullaev et al.
ICLR 2025, poster, arXiv:2502.15315
6 citations