Abstract
Recent works have studied state entropy maximization in reinforcement learning, in which the agent's objective is to learn a policy inducing high entropy over state visitations (Hazan et al., 2019). They typically assume full observability of the state of the system, so that the entropy of observations is maximized. In practice, the agent may only get partial observations, e.g., a robot perceiving the state of a physical space through proximity sensors and cameras. A significant mismatch between the entropy over observations and the entropy over the true states of the system can arise in those settings. In this paper, we address the problem of entropy maximization over the true states with a decision policy conditioned on partial observations only. The latter is a generalization of POMDPs, which is intractable in general. We develop a memory- and computation-efficient policy gradient method to address a first-order relaxation of the objective defined on belief states, providing various formal characterizations of the approximation gaps, the optimization landscape, and the hallucination problem. This paper aims to generalize state entropy maximization to more realistic domains that meet the challenges of applications.