Most Cited 2025 Poster Papers by Nolan Dey
3 papers found
#1 Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
ICLR 2025 (poster) · arXiv:2502.15938
22 citations
#2 Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NeurIPS 2025 (poster) · arXiv:2505.13738
15 citations
#3 Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Zhang, Lorenzo Noci et al.
NeurIPS 2025 (poster) · arXiv:2505.01618