Poster papers matching "compute-optimal training"
2 papers found
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.
ICLR 2025 (poster) · arXiv:2403.08540 · 77 citations

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
ICLR 2025 (poster) · arXiv:2502.15938 · 22 citations