Greedy Algorithms for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure

2 citations · #2324 of 5858 papers in NeurIPS 2025

Abstract

We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. We allow arbitrary finite reward structures, whereas prior work focused on a few specific ones. We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret as a function of time. Our characterization identifies a partial identifiability property of the problem instance as the necessary and sufficient condition for asymptotic success. Notably, once this property holds, the problem becomes easy: any algorithm that satisfies a mild non-degeneracy condition succeeds in the same sense. Our characterization extends to contextual bandits and to interactive decision-making with arbitrary feedback. We provide examples demonstrating broad applicability, as well as extensions to infinite reward structures.
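
To make the greedy (exploitation-only) protocol concrete, here is a minimal Python sketch assuming the known finite reward structure is given as a list of candidate mean-reward vectors. The function name, the count-weighted least-squares fit used as the estimator, and the Bernoulli simulation are illustrative assumptions, not constructions taken from the paper.

```python
import numpy as np

def greedy_structured_bandit(structure, sample_reward, horizon):
    """Exploitation-only play against a known finite reward structure.

    structure: (m, K) array; each row is a candidate mean-reward vector,
               one of which is assumed to generate the rewards.
    sample_reward: callable arm -> observed reward.
    horizon: number of rounds.
    Returns the list of arms pulled (hypothetical helper, for illustration).
    """
    structure = np.asarray(structure, dtype=float)
    _, K = structure.shape
    counts = np.zeros(K)
    sums = np.zeros(K)
    pulls = []

    for _ in range(horizon):
        # Empirical means (0 for arms never pulled).
        means = np.divide(sums, counts, out=np.zeros(K), where=counts > 0)
        # Fit: the candidate instance closest to the data in a
        # count-weighted least-squares sense (unpulled arms contribute 0).
        errors = (counts * (structure - means) ** 2).sum(axis=1)
        fitted = structure[np.argmin(errors)]
        # Greedy step: pull the arm that is optimal under the fitted instance,
        # with no explicit exploration.
        arm = int(np.argmax(fitted))
        reward = sample_reward(arm)
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls


# Illustrative use: Bernoulli arms whose true mean vector lies in the structure.
rng = np.random.default_rng(0)
structure = [[0.9, 0.1, 0.5], [0.1, 0.9, 0.5], [0.5, 0.5, 0.2]]
true_means = structure[1]
pulls = greedy_structured_bandit(
    structure,
    sample_reward=lambda a: float(rng.random() < true_means[a]),
    horizon=1000,
)
```

Whether such a purely greedy rule incurs sublinear or linear regret is exactly what the paper's identifiability condition on the structure is meant to decide; the least-squares fit above is just one plausible way to exploit the known structure.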

Citation History

Jan 25, 2026: 0
Jan 26, 2026: 0
Jan 28, 2026: 0
Feb 13, 2026: 2 (+2)