Progress Reward Model for Reinforcement Learning via Large Language Models

NeurIPS 2025

Abstract

Traditional reinforcement learning (RL) algorithms face significant limitations in handling long-horizon tasks with sparse rewards. Recent advances have leveraged large language models (LLMs) to enhance RL by utilizing their world knowledge for task planning and reward generation. However, planning-based approaches often depend on pre-defined skill libraries and fail to optimize low-level control policies, while reward-based methods require extensive human feedback or exhaustive search due to the complexity of tasks. In this paper, we propose the Progress Reward Model for RL (PRM4RL), a novel framework that integrates task planning and dense rewards to enhance RL. For high-level planning, a complex task is decomposed into a series of simple, manageable subtasks, with a subtask-oriented, fine-grained progress function designed to monitor task execution progress. For low-level reward generation, inspired by potential-based reward shaping, we use the progress function to construct a Progress Reward Model (PRM), providing theoretically grounded optimality and convergence guarantees, thereby enabling effective policy optimization. Experimental results on robotics control tasks demonstrate that our approach outperforms both LLM-based planning and reward methods, achieving state-of-the-art performance.
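
To make the potential-based shaping idea concrete, the sketch below shows the standard construction r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s), with the potential Phi taken to be a task-progress estimate. This is a minimal illustration of the general technique the abstract refers to, not the paper's actual PRM; the names `make_shaped_reward`, `progress_fn`, and the toy 1-D reach task are hypothetical.

```python
import numpy as np


def make_shaped_reward(env_reward_fn, progress_fn, gamma=0.99):
    """Wrap a sparse environment reward with a potential-based shaping term.

    Shaped reward: r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s),
    where the potential Phi is supplied by `progress_fn` (here a hypothetical
    stand-in for a subtask-oriented progress function).
    """
    def shaped_reward(state, action, next_state):
        r = env_reward_fn(state, action, next_state)
        return r + gamma * progress_fn(next_state) - progress_fn(state)
    return shaped_reward


# Toy usage: a 1-D reach task where progress is negative distance to the goal.
goal = 5.0


def sparse_reward(s, a, s_next):
    # Sparse reward: +1 only when the goal is (approximately) reached.
    return 1.0 if abs(s_next - goal) < 0.1 else 0.0


def progress(s):
    # Progress estimate: increases as the agent approaches the goal.
    return -abs(s - goal)


reward = make_shaped_reward(sparse_reward, progress, gamma=0.99)
print(reward(0.0, +1.0, 1.0))  # dense positive signal for moving toward the goal
```

Because the shaping term is potential-based, it leaves the set of optimal policies unchanged while densifying the learning signal, which is the property the abstract invokes for its optimality and convergence guarantees.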
