2025 "llm pre-training" Papers
3 papers found
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Yuda Song, Hanlin Zhang, Carson Eisenach et al.
ICLR 2025 poster, arXiv:2412.02674
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NeurIPS 2025 poster, arXiv:2505.13738
15 citations
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin et al.
ICLR 2025 poster, arXiv:2501.06842
15 citations