2025 Poster "language modeling" Papers

13 papers found

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NeurIPS 2025posterarXiv:2505.24857
40
citations

AdaFisher: Adaptive Second Order Optimization via Fisher Information

Damien GOMES, Yanlei Zhang, Eugene Belilovsky et al.

ICLR 2025posterarXiv:2405.16397
5
citations

Chunk-Distilled Language Modeling

Yanhong Li, Karen Livescu, Jiawei Zhou

ICLR 2025posterarXiv:2501.00343
3
citations

Continuous Diffusion Model for Language Modeling

Jaehyeong Jo, Sung Ju Hwang

NeurIPS 2025posterarXiv:2502.11564
4
citations

Differential Transformer

Tianzhu Ye, Li Dong, Yuqing Xia et al.

ICLR 2025posterarXiv:2410.05258

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Mathurin VIDEAU, Badr Youbi Idrissi, Alessandro Leite et al.

NeurIPS 2025posterarXiv:2506.14761
5
citations

MIND over Body: Adaptive Thinking using Dynamic Computation

Mrinal Mathur, Barak Pearlmutter, Sergey Plis

ICLR 2025poster
2
citations

Nested Learning: The Illusion of Deep Learning Architectures

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al.

NeurIPS 2025posterarXiv:2512.24695
12
citations

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.

NeurIPS 2025posterarXiv:2510.08632
3
citations

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025posterarXiv:2410.02703
20
citations

ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

Yuxuan Song, Zhe Zhang, Yu Pei et al.

NeurIPS 2025poster
1
citations

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025posterarXiv:2409.03137
23
citations

Tight Clusters Make Specialized Experts

Stefan Nielsen, Rachel Teo, Laziz Abdullaev et al.

ICLR 2025posterarXiv:2502.15315
6
citations