Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Abstract
The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single-epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating more cost-effective gradient steps. This suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless gradient descent algorithm. We study this hypothesis and provide supporting evidence in loss and function space. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.
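To make the online-versus-offline distinction concrete, the following is a minimal sketch (not from the paper) of single-epoch SGD on a streaming linear-regression task. In the online regime each example is seen exactly once, so a smaller batch size simply buys more (noisier) gradient steps for the same data budget; the learning-rate, dimension, and batch-size choices here are illustrative assumptions.

```python
import numpy as np

# Synthetic linear-regression stream (illustrative setup, not the paper's).
rng = np.random.default_rng(0)
d, n = 5, 4096
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)


def online_sgd(batch_size, lr=0.05):
    """One pass over the stream: n / batch_size gradient steps total."""
    w = np.zeros(d)
    for start in range(0, n, batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        grad = xb.T @ (xb @ w - yb) / len(yb)  # mean-squared-error gradient
        w -= lr * grad
    return w


w_small = online_sgd(batch_size=8)    # many noisy steps per epoch
w_large = online_sgd(batch_size=512)  # few near-full-batch steps per epoch
print(np.linalg.norm(w_small - w_true), np.linalg.norm(w_large - w_true))
```

At a fixed data budget the small-batch run takes 64x more steps and ends closer to `w_true`, consistent with the abstract's point that small batches help online learning computationally (more steps per sample) rather than through any implicit bias.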