Hybrid Reinforcement Learning from Offline Observation Alone


arXiv:2406.07253

ICML 2024

Authors

Yuda Song, J. Bagnell, Aarti Singh

Topics

hybrid reinforcement learning, offline observation data, reset model, trace model, admissibility assumption, policy coverage, offline dataset, online interactive access

Abstract

We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access. While RL research typically assumes offline data contains complete action, reward and transition information, datasets with only state information (also known as observation-only datasets) are more general, abundant and practical. This motivates our study of the hybrid RL with observation-only offline dataset framework. While the task of competing with the best policy "covered" by the offline data can be solved if a reset model of the environment is provided (i.e., one that can be reset to any state), we show evidence of hardness of competing when only given the weaker trace model (i.e., one can only reset to the initial states and must produce full traces through the environment), without further assumption of admissibility of the offline data. Under the admissibility assumptions (that the offline data could actually be produced by the policy class we consider), we propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model. We also perform proof-of-concept experiments that suggest the effectiveness of our algorithm in practice.
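The distinction between the two access models can be illustrated with a minimal sketch. This is not the paper's algorithm or code: the toy chain environment, the class names `ChainEnv` and `ResetChainEnv`, the `rollout` helper, and the sample dataset are all hypothetical and serve only to show what "reset to any state" versus "reset to the initial state only" means in practice.

```python
class ChainEnv:
    """Toy deterministic chain with states 0..horizon (hypothetical example).

    Exposes trace-model access only: episodes always start at the initial state.
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.state = 0

    def reset(self):
        # Trace model: the only reset available returns to the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves one state to the right; any other action stays put.
        if action == 1:
            self.state = min(self.state + 1, self.horizon)
        # Reward 1 whenever the agent is at the end of the chain after the step.
        reward = 1.0 if self.state == self.horizon else 0.0
        return self.state, reward


class ResetChainEnv(ChainEnv):
    """Same dynamics, but with the stronger reset-model access."""

    def reset_to(self, state):
        # Reset model: jump directly to any state, e.g. one seen in offline data.
        self.state = state
        return self.state


def rollout(env, policy, steps, start_state=None):
    """Run `policy` for `steps` steps and return (visited states, total reward)."""
    if start_state is None:
        state = env.reset()
    else:
        state = env.reset_to(start_state)  # requires reset-model access
    trace, total = [state], 0.0
    for _ in range(steps):
        state, reward = env.step(policy(state))
        trace.append(state)
        total += reward
    return trace, total


# An observation-only offline dataset: state sequences with no actions or rewards.
offline_states = [[0, 1, 2, 3]]

always_right = lambda state: 1

# Trace model: a full trace must be generated from the initial state.
full_trace, full_return = rollout(ChainEnv(horizon=3), always_right, steps=3)

# Reset model: the learner can start from a state that appears in the offline data.
short_trace, short_return = rollout(
    ResetChainEnv(horizon=3), always_right, steps=1, start_state=offline_states[0][2]
)
```

Under the reset model, the learner can begin an episode at any state from the offline data; under the trace model, it has to reach such a state by acting from the start, which is the source of the hardness the abstract describes.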
