ICLR 2025 "training stability" Papers
3 papers found
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Jialiang Cheng, Ning Gao, Yun Yue et al.
ICLR 2025, poster, arXiv:2412.07210
1 citation
Improving Neural Optimal Transport via Displacement Interpolation
Jaemoo Choi, Yongxin Chen, Jaewoong Choi
ICLR 2025, poster, arXiv:2410.03783
3 citations
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.
ICLR 2025, poster, arXiv:2503.09543
14 citations