On the Convergence of Single-Timescale Actor-Critic


Abstract

We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an analytical framework for handling the complex, coupled recursions inherent in the algorithm. Leveraging this framework, we establish that the algorithm converges to an $ε$-close \textbf{globally optimal} policy with a sample complexity of $O(ε^{-3})$. This significantly improves upon the existing complexity of $O(ε^{-2})$ for reaching an $ε$-close \textbf{stationary} policy, which, via the gradient domination lemma, corresponds to a complexity of $O(ε^{-4})$ for reaching an $ε$-close \textbf{globally optimal} policy. Furthermore, we demonstrate that to achieve this improvement, the step sizes for both the actor and critic must decay as $O(k^{-\frac{2}{3}})$ with the iteration index $k$, diverging from the conventional $O(k^{-\frac{1}{2}})$ rates commonly used in (non)convex optimization.
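
Below is a minimal sketch of a single-timescale actor-critic loop on a randomly generated tabular MDP, included only to illustrate the setting described in the abstract: both the actor and the critic are updated at every iteration with step sizes decaying as $k^{-2/3}$. The environment, the step-size constants, and the use of a TD(0) critic with a softmax policy are illustrative assumptions, not the paper's exact algorithm or analysis.

```python
import numpy as np

# Hypothetical tabular MDP (not from the paper): random transitions and rewards.
rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # reward table

theta = np.zeros((nS, nA))  # actor: softmax policy parameters
V = np.zeros(nS)            # critic: tabular value estimates

def policy(s):
    """Softmax policy over actions in state s."""
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

s = rng.integers(nS)
K = 50_000
for k in range(1, K + 1):
    # Single timescale: actor and critic step sizes share the k^(-2/3) decay
    # highlighted in the abstract (constants 0.5 are illustrative).
    alpha = 0.5 * k ** (-2.0 / 3.0)  # actor step size
    beta = 0.5 * k ** (-2.0 / 3.0)   # critic step size

    pi_s = policy(s)
    a = rng.choice(nA, p=pi_s)
    s_next = rng.choice(nS, p=P[s, a])
    r = R[s, a]

    # Critic: one-step TD(0) update.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += beta * delta

    # Actor: policy-gradient step using the TD error as the advantage estimate.
    grad_log = -pi_s
    grad_log[a] += 1.0
    theta[s] += alpha * delta * grad_log

    s = s_next
```

The key point of the sketch is the shared decay rate: both updates happen in the same loop with step sizes of order $k^{-2/3}$, rather than the critic running on a faster timescale or both rates decaying as $k^{-1/2}$.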
