2025 "batch size effects" Papers
2 papers found
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi, Florent Krzakala, Bruno Loureiro et al.
ICLR 2025, poster, arXiv:2305.18270
47 citations
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Xuan Tang, Han Zhang, Yuan Cao et al.
NeurIPS 2025, poster, arXiv:2510.11354