2025 "language model training" Papers
14 papers found
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Mayee Chen, Michael Hu, Nicholas Lourie et al.
ICLR 2025 (poster) · arXiv:2411.05735 · 16 citations
ASGO: Adaptive Structured Gradient Optimization
Kang An, Yuxing Liu, Rui Pan et al.
NeurIPS 2025 (poster) · arXiv:2503.20762 · 28 citations
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery et al.
NeurIPS 2025 (spotlight) · arXiv:2503.09799 · 12 citations
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
NeurIPS 2025 (spotlight) · arXiv:2505.23971 · 5 citations
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener et al.
ICLR 2025 (poster)
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi, Fan Nie, Alexandre Alahi et al.
NeurIPS 2025 (oral) · arXiv:2506.16029 · 3 citations
FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models
Pukang Ye, Luo Junwei, Jiachen Shen et al.
NeurIPS 2025 (poster) · arXiv:2511.07505
Generative Representational Instruction Tuning
Niklas Muennighoff, Hongjin SU, Liang Wang et al.
ICLR 2025 (poster) · arXiv:2402.09906 · 214 citations
Gradient descent with generalized Newton’s method
Zhiqi Bu, Shiyun Xu
ICLR 2025 (poster) · arXiv:2407.02772 · 6 citations
Inverse Scaling: When Bigger Isn't Better
Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.
ICLR 2025 (poster) · arXiv:2306.09479 · 183 citations
Learning from negative feedback, or positive feedback or both
Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.
ICLR 2025 (poster) · arXiv:2410.04166 · 7 citations
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.
NeurIPS 2025 (poster) · arXiv:2507.07101 · 13 citations
Teaching Language Models to Reason with Tools
Chengpeng Li, Zhengyang Tang, Ziniu Li et al.
NeurIPS 2025 (poster) · arXiv:2510.20342 · 2 citations
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Minhak Song, Beomhan Baek, Kwangjun Ahn et al.
NeurIPS 2025 (poster) · arXiv:2507.09846 · 2 citations