GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

1citations

arXiv:2510.27210 Project

citations

#2497

in NEURIPS 2025

of 5858 papers

Top Authors

Data Points

Top Authors

Tao Liu Chongyu Wang Rongjie Li Yingchen Yu Xuming He Song Bai

Abstract

While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We present a reasoning-enhanced framework that systematically integrates structured reasoning, action prediction, and history summarization. The structured reasoning component generates coherent Chain-of-Thought analyses combining progress estimation and decision reasoning, which inform both immediate action predictions and compact history summaries for future steps. Based on this framework, we train a GUI agent, GUI-Rise, through supervised fine-tuning on pseudo-labeled trajectories and reinforcement learning with Group Relative Policy Optimization (GRPO). This framework employs specialized rewards, including a history-aware objective, directly linking summary quality to subsequent action performance. Comprehensive evaluations on standard benchmarks demonstrate state-of-the-art results under identical training data conditions, with particularly strong performance in out-of-domain scenarios. These findings validate our framework's ability to maintain robust reasoning and generalization across diverse GUI navigation tasks.

Citation History

Jan 25, 2026

Jan 26, 2026

Jan 28, 2026

Feb 13, 2026

1+1