Asymptotic theory of SGD with a general learning-rate

Citations: 0
Rank: #1830 of 5858 papers in NeurIPS 2025
Authors: 5

Abstract

Stochastic gradient descent (SGD) with polynomially decaying step sizes has long underpinned theoretical analyses, yielding a broad spectrum of statistically attractive guarantees. Yet such schedules are rarely used in practice because of their prohibitively slow convergence, revealing a persistent gap between theory and empirical performance. In this paper, we introduce a unified framework that quantifies the uncertainty of online SGD under arbitrary learning-rate choices. In particular, we provide the first comprehensive convergence characterizations for two widely used but theoretically under-examined schemes: cyclical learning rates and linear decay to zero. Our results not only explain the observed behavior of these schedules but also yield principled tools for statistical inference and algorithm design. All theoretical findings are corroborated by extensive simulations across diverse settings.
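
To make the two schedules concrete, the sketch below runs online SGD on a simple streaming least-squares problem with a triangular cyclical learning rate and with linear decay to zero. This is an illustrative assumption on my part, not the paper's code; all function names, schedule parameters, and the toy model are hypothetical.

```python
import numpy as np

# Illustrative sketch only (not the paper's method or code).
# Online SGD for streaming least squares under two learning-rate schedules
# mentioned in the abstract: cyclical and linear decay to zero.

def cyclical_lr(t, base_lr=0.01, max_lr=0.1, cycle_len=200):
    """Triangular cyclical schedule: oscillates between base_lr and max_lr."""
    pos = (t % cycle_len) / cycle_len             # position within the current cycle
    return base_lr + (max_lr - base_lr) * (1 - abs(2 * pos - 1))

def linear_decay_lr(t, init_lr=0.1, total_steps=10_000):
    """Linear decay from init_lr to zero over the full horizon."""
    return init_lr * max(0.0, 1.0 - t / total_steps)

def online_sgd(schedule, total_steps=10_000, dim=5, seed=0):
    """Run online SGD on 0.5 * (x'theta - y)^2 with one fresh sample per step."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)             # ground-truth parameter
    theta = np.zeros(dim)                         # SGD iterate
    for t in range(total_steps):
        x = rng.normal(size=dim)                  # fresh covariate (online setting)
        y = x @ theta_star + rng.normal()         # noisy response
        grad = (x @ theta - y) * x                # stochastic gradient
        theta -= schedule(t) * grad               # learning rate set by the schedule
    return np.linalg.norm(theta - theta_star)

if __name__ == "__main__":
    print("cyclical LR, final error:", online_sgd(cyclical_lr))
    print("linear decay LR, final error:", online_sgd(linear_decay_lr))
```

The paper's contribution is an asymptotic characterization of the uncertainty of iterates like `theta` under such arbitrary schedules; the sketch only shows what the schedules themselves look like in an online loop.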
