ICLR "language modeling" Papers
10 papers found
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien GOMES, Yanlei Zhang, Eugene Belilovsky et al.
ICLR 2025, poster, arXiv:2405.16397
5 citations
Chunk-Distilled Language Modeling
Yanhong Li, Karen Livescu, Jiawei Zhou
ICLR 2025, poster, arXiv:2501.00343
3 citations
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
ICLR 2025, poster, arXiv:2410.05258
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
Harshit Varma, Dheeraj Nagaraj, Karthikeyan Shanmugam
ICLR 2025, poster, arXiv:2405.17035
7 citations
Language Models Are Implicitly Continuous
Samuele Marro, Davide Evangelista, X. Huang et al.
ICLR 2025, poster, arXiv:2504.03933
3 citations
MIND over Body: Adaptive Thinking using Dynamic Computation
Mrinal Mathur, Barak Pearlmutter, Sergey Plis
ICLR 2025, poster
2 citations
Scaling up Masked Diffusion Models on Text
Shen Nie, Fengqi Zhu, Chao Du et al.
ICLR 2025, oral, arXiv:2410.18514
110 citations
Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
ICLR 2025, poster, arXiv:2410.02703
20 citations
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
ICLR 2025, poster, arXiv:2409.03137
23 citations
Tight Clusters Make Specialized Experts
Stefan Nielsen, Rachel Teo, Laziz Abdullaev et al.
ICLR 2025, poster, arXiv:2502.15315
6 citations