"llm pre-training" Papers
2 papers found
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Yuda Song, Hanlin Zhang, Carson Eisenach et al.
ICLR 2025 · poster · arXiv:2412.02674
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NeurIPS 2025 · poster · arXiv:2505.13738
15 citations