2025 "language modeling" Papers

13 papers found


Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NeurIPS 2025, poster, arXiv:2505.24857

AdaFisher: Adaptive Second Order Optimization via Fisher Information

Damien GOMES, Yanlei Zhang, Eugene Belilovsky et al.

ICLR 2025, poster, arXiv:2405.16397

Chunk-Distilled Language Modeling

Yanhong Li, Karen Livescu, Jiawei Zhou

ICLR 2025, poster, arXiv:2501.00343

Continuous Diffusion Model for Language Modeling

Jaehyeong Jo, Sung Ju Hwang

NeurIPS 2025, poster, arXiv:2502.11564

Differential Transformer

Tianzhu Ye, Li Dong, Yuqing Xia et al.

ICLR 2025, poster, arXiv:2410.05258

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Mathurin VIDEAU, Badr Youbi Idrissi, Alessandro Leite et al.

NeurIPS 2025, poster, arXiv:2506.14761

Nested Learning: The Illusion of Deep Learning Architectures

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al.

NeurIPS 2025, poster, arXiv:2512.24695

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.

NeurIPS 2025, poster, arXiv:2510.08632

Scaling up Masked Diffusion Models on Text

Shen Nie, Fengqi Zhu, Chao Du et al.

ICLR 2025, oral, arXiv:2410.18514

110 citations

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025, poster, arXiv:2410.02703

ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

Yuxuan Song, Zhe Zhang, Yu Pei et al.

NeurIPS 2025, poster

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025, poster, arXiv:2409.03137

Tight Clusters Make Specialized Experts

Stefan Nielsen, Rachel Teo, Laziz Abdullaev et al.

ICLR 2025, poster, arXiv:2502.15315