Papers by Nolan Dey
3 papers found
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Zhang, Lorenzo Noci et al.
NeurIPS 2025 (poster)
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NeurIPS 2025 (poster) · arXiv:2505.13738 · 15 citations
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
ICLR 2025 (poster)