ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

36 citations

arXiv:2503.09501

Citations: #70 of 5858 papers in NeurIPS 2025

Authors

Ziyu Wan, Yunxiang Li, Xiaoyu Wen, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen

Topics

meta-thinking, multi-agent reinforcement learning, large language models, reasoning processes, hierarchical agents, mathematical reasoning, llm-as-a-judge, multi-turn interaction

Abstract

Recent research on reasoning in Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking: enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent responsible for detailed execution. Through iterative reinforcement learning with aligned objectives, these agents explore and learn to collaborate, leading to improved generalization and robustness. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competition-level mathematical benchmarks and LLM-as-a-Judge benchmarks. We further extend ReMA to multi-turn interaction settings, leveraging a turn-level ratio and parameter sharing to improve efficiency. Comprehensive ablation studies illustrate the evolving dynamics of each agent, providing valuable insights into how the meta-thinking process enhances the reasoning capabilities of LLMs. Our code is available at https://github.com/ziyuwan/ReMA-public
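The two-level decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Policy`, `Rollout`, `rollout`, and the toy policies are hypothetical, the prompt format is invented for illustration, and "aligned objectives" is interpreted here, as an assumption, as a single scalar reward shared by both agents. No training step is shown; the sketch only covers how one single-turn episode might be collected.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type alias: a policy maps a text prompt to a text response.
# In practice each policy would be an LLM; here they are plain functions.
Policy = Callable[[str], str]


@dataclass
class Rollout:
    question: str
    plan: str      # output of the high-level (meta-thinking) agent
    answer: str    # output of the low-level (reasoning) agent
    reward: float  # one scalar shared by both agents (assumed reading of "aligned objectives")


def rollout(question: str,
            high_level: Policy,
            low_level: Policy,
            reward_fn: Callable[[str, str], float]) -> Rollout:
    """Collect one single-turn episode: plan first, then execute the plan."""
    plan = high_level(question)
    # The low-level agent conditions on both the question and the plan.
    answer = low_level(f"{question}\n\nPlan: {plan}")
    reward = reward_fn(question, answer)
    return Rollout(question, plan, answer, reward)


# Toy stand-ins so the sketch runs without any model.
def toy_high_level(question: str) -> str:
    return "Add the two numbers."


def toy_low_level(prompt: str) -> str:
    return "4"


def toy_reward(question: str, answer: str) -> float:
    return 1.0 if answer.strip() == "4" else 0.0


result = rollout("What is 2 + 2?", toy_high_level, toy_low_level, toy_reward)
print(result)
```

In a training loop, the shared `reward` would be used to update both policies, so that the high-level agent is credited for plans that help the low-level agent succeed.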

Citation History

Jan 25, 2026 to Jan 31, 2026: 36 (+1)