2025 "language model training" Papers

14 papers found

Aioli: A Unified Optimization Framework for Language Model Data Mixing

Mayee Chen, Michael Hu, Nicholas Lourie et al.

ICLR 2025 (poster) · arXiv:2411.05735 · 16 citations

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NeurIPS 2025 (poster) · arXiv:2503.20762 · 28 citations

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

Zachary Charles, Gabriel Teston, Lucio Dery et al.

NeurIPS 2025 (spotlight) · arXiv:2503.09799 · 12 citations

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

Will Merrill, Shane Arora, Dirk Groeneveld et al.

NeurIPS 2025 (spotlight) · arXiv:2505.23971 · 5 citations

Deconstructing What Makes a Good Optimizer for Autoregressive Language Models

Rosie Zhao, Depen Morwani, David Brandfonbrener et al.

ICLR 2025 (poster)

EvoLM: In Search of Lost Language Model Training Dynamics

Zhenting Qi, Fan Nie, Alexandre Alahi et al.

NeurIPS 2025 (oral) · arXiv:2506.16029 · 3 citations

FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models

Pukang Ye, Luo Junwei, Jiachen Shen et al.

NeurIPS 2025 (poster) · arXiv:2511.07505

Generative Representational Instruction Tuning

Niklas Muennighoff, Hongjin SU, Liang Wang et al.

ICLR 2025 (poster) · arXiv:2402.09906 · 214 citations

Gradient descent with generalized Newton’s method

Zhiqi Bu, Shiyun Xu

ICLR 2025 (poster) · arXiv:2407.02772 · 6 citations

Inverse Scaling: When Bigger Isn't Better

Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.

ICLR 2025 (poster) · arXiv:2306.09479 · 183 citations

Learning from negative feedback, or positive feedback or both

Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.

ICLR 2025 (poster) · arXiv:2410.04166 · 7 citations

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful

Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.

NeurIPS 2025 (poster) · arXiv:2507.07101 · 13 citations

Teaching Language Models to Reason with Tools

Chengpeng Li, Zhengyang Tang, Ziniu Li et al.

NeurIPS 2025 (poster) · arXiv:2510.20342 · 2 citations

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training

Minhak Song, Beomhan Baek, Kwangjun Ahn et al.

NeurIPS 2025 (poster) · arXiv:2507.09846 · 2 citations