Poster "learning rate warmup" Papers
3 papers found
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
ICLR 2025 poster, arXiv:2410.10986
10 citations
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
ICML 2024 poster
When Will Gradient Regularization Be Harmful?
Yang Zhao, Hao Zhang, Xiuyuan Hu
ICML 2024 poster