Best of Both Worlds: Regret Minimization versus Minimax Play

0citations

Citations

#766

in ICML 2025

of 3340 papers

Authors

Data Points

Authors

Adrian Müller Jon Schneider EFSTRATIOS PANTELEIMON SKOULAKIS Luca Viano Volkan Cevher

Abstract

In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, both in normal- and extensive form, we show that our results allow us to guarantee to risk at most $O(1)$ loss while being able to gain $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.

Citation History

Jan 28, 2026