High-Probability Bound for Non-Smooth Non-Convex Stochastic Optimization with Heavy Tails

#10 of 2635 papers in ICML 2024 · 3 authors

Abstract

Recently, Cutkosky et al. introduced the online-to-non-convex framework, which utilizes online learning methods to solve non-smooth non-convex optimization problems and achieves an $\mathcal{O}(\epsilon^{-3}\delta^{-1})$ gradient complexity for finding $(\delta,\epsilon)$-stationary points. However, their results rely on the bounded-variance assumption for stochastic gradients and only hold in expectation. To address these limitations, we investigate the case in which stochastic gradients obey heavy-tailed distributions with finite $\mathfrak{p}$-th moments for some $\mathfrak{p}\in(1,2]$, and propose a novel algorithm that identifies a $(\delta,\epsilon)$-stationary point with high probability after consuming $\tilde{\mathcal{O}}(\epsilon^{-\frac{2\mathfrak{p}-1}{\mathfrak{p}-1}}\delta^{-1})$ stochastic gradients. The key idea is to first incorporate the gradient clipping technique into the online-to-non-convex framework to produce a sequence of points whose averaged gradient norm is no greater than $\epsilon$. Then, we propose a validation method to select one $(\delta,\epsilon)$-stationary point from among the candidates. When gradient distributions have bounded variance, i.e., $\mathfrak{p}=2$, our result becomes $\tilde{\mathcal{O}}(\epsilon^{-3}\delta^{-1})$, which improves the existing $\tilde{\mathcal{O}}(\epsilon^{-4}\delta^{-1})$ high-probability bound. When the objective is smooth, our algorithm can also find an $\epsilon$-stationary point with $\tilde{\mathcal{O}}(\epsilon^{-\frac{3\mathfrak{p}-2}{\mathfrak{p}-1}})$ gradient queries.
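The abstract's key idea is to pass clipped stochastic gradients to an online learner so that heavy-tailed noise cannot derail the iterate sequence. The sketch below is only a minimal illustration of where clipping enters such a pipeline, not the authors' algorithm: the SGD-style inner update, the clipping threshold `tau`, the step size `eta`, and the Pareto-tailed noise model are all assumptions made for the example.

```python
import numpy as np

def clip_gradient(g, tau):
    """Rescale a stochastic gradient so its norm is at most tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)

def run_clipped_updates(grad_oracle, x0, T, eta=1e-2, tau=1.0):
    """Produce a sequence of iterates from clipped stochastic gradients.

    grad_oracle(x) returns a (possibly heavy-tailed) stochastic gradient at x.
    The plain SGD step below is a stand-in for the online learner used in the
    online-to-non-convex conversion; it only shows where clipping is applied.
    """
    x = np.asarray(x0, dtype=float)
    iterates, clipped_grads = [x.copy()], []
    for _ in range(T):
        g = clip_gradient(grad_oracle(x), tau)  # heavy-tail control via clipping
        x = x - eta * g                         # placeholder online update
        iterates.append(x.copy())
        clipped_grads.append(g)
    return iterates, clipped_grads

if __name__ == "__main__":
    # Toy example: quadratic objective with Pareto-tailed gradient noise,
    # i.e., noise with finite p-th moment for some p < 2.
    rng = np.random.default_rng(0)

    def grad_oracle(x):
        noise = rng.pareto(1.5, size=x.shape) * rng.choice([-1.0, 1.0], size=x.shape)
        return 2.0 * x + noise

    iterates, grads = run_clipped_updates(grad_oracle, x0=np.ones(5), T=1000)
    print("avg clipped-gradient norm:", np.mean([np.linalg.norm(g) for g in grads]))
```

In the paper's setting, the candidates produced by such a sequence would then be screened by the proposed validation step to pick one $(\delta,\epsilon)$-stationary point with high probability; that selection step is not reproduced here.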
