Convergence of Distributed Adaptive Optimization with Local Updates

Citations: 3
Ranked #1638 of 3827 papers in ICLR 2025
Authors: 2
Data Points: 4

Abstract

We study distributed adaptive algorithms with local updates (intermittent communication). Despite the great empirical success of adaptive methods in distributed training of modern machine learning models, the theoretical benefits of local updates within adaptive methods, particularly in terms of reducing communication complexity, have not yet been fully understood. In this paper, we prove for the first time that Local SGD with momentum (Local SGDM) and Local Adam can outperform their minibatch counterparts in convex and weakly convex settings, respectively, in certain regimes. Our analysis relies on a novel technique for proving contraction during local iterations, a crucial yet challenging step in showing the advantages of local updates, under a generalized smoothness assumption and a gradient clipping strategy.
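For intuition, the sketch below illustrates the general Local Adam pattern described in the abstract: each worker runs several communication-free Adam steps on clipped stochastic gradients, and workers synchronize by averaging only once per round. This is a minimal illustrative sketch, not the paper's exact algorithm; the toy least-squares objective, the worker count, the number of local steps, the clipping threshold, and the choice to also average the Adam moment estimates at synchronization are all assumptions made here for the example.

```python
# Illustrative sketch of a Local Adam-style loop with intermittent
# communication and gradient clipping (NOT the paper's exact method).
import numpy as np

def clip(g, rho):
    """Clip a gradient to Euclidean norm at most rho."""
    norm = np.linalg.norm(g)
    return g if norm <= rho else g * (rho / norm)

def local_adam(A_list, b_list, x0, rounds=20, H=10, lr=0.1,
               beta1=0.9, beta2=0.999, eps=1e-8, rho=1.0, seed=0):
    """Each of M workers takes H local Adam steps on its own quadratic
    loss 0.5*||A_m x - b_m||^2, then all workers average their iterates
    (and, as an assumption here, their moment estimates) once per round."""
    rng = np.random.default_rng(seed)
    M, d = len(A_list), x0.size
    x = np.tile(x0, (M, 1))            # per-worker iterates
    m = np.zeros((M, d))               # first-moment estimates
    v = np.zeros((M, d))               # second-moment estimates
    t = 0
    for _ in range(rounds):
        for _ in range(H):             # local (communication-free) steps
            t += 1
            for w in range(M):
                # stochastic gradient from one random local sample
                i = rng.integers(A_list[w].shape[0])
                a, b = A_list[w][i], b_list[w][i]
                g = clip((a @ x[w] - b) * a, rho)
                m[w] = beta1 * m[w] + (1 - beta1) * g
                v[w] = beta2 * v[w] + (1 - beta2) * g**2
                mhat = m[w] / (1 - beta1**t)
                vhat = v[w] / (1 - beta2**t)
                x[w] -= lr * mhat / (np.sqrt(vhat) + eps)
        # intermittent communication: average across workers
        x[:] = x.mean(axis=0)
        m[:] = m.mean(axis=0)
        v[:] = v.mean(axis=0)
    return x[0]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A_list = [rng.normal(size=(50, 5)) for _ in range(4)]
    x_star = rng.normal(size=5)
    b_list = [A @ x_star + 0.01 * rng.normal(size=50) for A in A_list]
    x_hat = local_adam(A_list, b_list, x0=np.zeros(5))
    print("distance to x*:", np.linalg.norm(x_hat - x_star))
```

Minibatch Adam corresponds to the special case H = 1, where every gradient step is followed by communication; the communication savings analyzed in the paper come from taking H > 1 local steps per round.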

Citation History

Jan 25, 2026: 3
Jan 27, 2026: 3
Jan 27, 2026: 3
Jan 31, 2026: 3