2025 "large batch training" Papers
3 papers found
AdaGrad under Anisotropic Smoothness
Yuxing Liu, Rui Pan, Tong Zhang
ICLR 2025 poster, arXiv:2406.15244
14 citations
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Minhak Song, Beomhan Baek, Kwangjun Ahn et al.
NeurIPS 2025 poster, arXiv:2507.09846
2 citations
Understanding outer learning rates in Local SGD
Ahmed Khaled, Satyen Kale, Arthur Douillard et al.
NeurIPS 2025 poster