Improving Model-Based Reinforcement Learning by Converging to Flatter Minima

Citations: 0
Rank: #1981 of 5858 papers in NeurIPS 2025
Authors: 4
Data Points: 4

Abstract

Model-based reinforcement learning (MBRL) hinges on a learned dynamics model whose errors can compound along imagined rollouts. We study how encouraging flatness in the model’s training loss affects downstream control, and show that steering optimization toward flatter minima yields a better policy. Concretely, we integrate Sharpness-Aware Minimization (SAM) into world-model training as a drop-in objective, leaving the planner and policy components unchanged. On the theory side, we derive PAC-Bayesian bounds that link first-order sharpness to the value-estimation gap and to the performance gap between model-optimal and true-optimal policies, implying that flatter minima tighten both. Empirically, SAM reduces measured sharpness and value-prediction error and improves returns across HumanoidBench, Atari-100k, and high-DoF DeepMind Control tasks. Augmenting existing MBRL algorithms with SAM increases mean return, with especially large gains in settings with high-dimensional state–action spaces. We further observe positive transfer across algorithms and input modalities, including a transformer-based world model. These results position flat-minima training as a simple, general mechanism for more robust MBRL without architectural changes.
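
For readers unfamiliar with how SAM slots into world-model training, the sketch below shows a sharpness-aware update that could replace an ordinary optimizer step on the model loss. It is a minimal PyTorch-style illustration under assumed interfaces, not the authors' implementation: the names `sam_world_model_step`, `loss_fn`, `batch`, `base_optimizer`, and the perturbation radius `rho` are placeholders for whatever the surrounding MBRL algorithm already provides.

```python
import torch


def sam_world_model_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    """One sharpness-aware (SAM) update on a world-model loss.

    Hypothetical sketch: `model`, `loss_fn`, `batch`, and `base_optimizer`
    stand in for the dynamics model, prediction loss, replay sample, and
    optimizer used by the host MBRL algorithm.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient of the ordinary world-model loss at the current weights.
    loss = loss_fn(model, batch)
    loss.backward()

    # Ascend to the approximate worst-case weights within an L2 ball of radius rho.
    grad_norm = torch.norm(
        torch.stack([p.grad.norm(p=2) for p in params if p.grad is not None]), p=2
    )
    scale = rho / (grad_norm + 1e-12)
    perturbations = []
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                perturbations.append(None)
                continue
            e = p.grad * scale
            p.add_(e)  # w <- w + e
            perturbations.append(e)
    model.zero_grad()

    # Second pass: gradient of the loss at the perturbed weights.
    loss_fn(model, batch).backward()

    # Restore the original weights, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.detach()
```

Because only the model's loss and optimizer step change, the planner and policy updates around this call stay untouched, which is what makes SAM a drop-in objective in this setting.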

Citation History

Jan 26, 2026: 0
Jan 27, 2026: 0
Jan 27, 2026: 0
Feb 1, 2026: 0