ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
arXiv:2509.21823v2
Abstract: Reward is critical to the evaluation and training of large language models (LLMs). However, existing rule-based or model-based reward methods struggle to generalize to GUI agents, where access to ground-truth trajectories or application databases is often unavailable, and static trajectory-based LLM-as-a-Judge approaches suffer from limited accuracy. To address these challenges, we propose ProRe, a proactive reward system that leverages a general-purpose reasoner and domain-specific evaluator agents (actors). The reasoner schedules targeted state probing tasks, which the evaluator agents then execute by actively interacting with the environment to collect additional observations. This enables the reasoner to assign more accurate and verifiable rewards to GUI agents.
Introduction
Reward signals drive both the evaluation and the training of large language models (LLMs), but existing reward methods transfer poorly to graphical user interface (GUI) agents. Rule-based and model-based rewards typically depend on ground-truth trajectories or application databases, which are often unavailable in GUI settings, and LLM-as-a-Judge approaches that score a static trajectory suffer from limited accuracy.
The ProRe System
ProRe is a proactive reward system built on collaboration between a general-purpose reasoner and domain-specific evaluator agents (actors). Rather than judging a recorded trajectory alone, it gathers additional evidence from the environment before assigning a reward. The system has the following key components:
- Reasoner: The core component that schedules targeted state probing tasks, guiding the evaluators in their interactions with the environment.
- Evaluator Agents: Domain-specific agents that execute the tasks assigned by the reasoner, actively engaging with the GUI to gather necessary observations.
- Proactive Interactions: By actively interacting with the environment, evaluator agents collect additional observations of its state, which lets the reasoner assign more accurate and verifiable rewards.
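The interplay between these components can be illustrated with a minimal Python sketch. It is not the ProRe implementation: the names `ProbeTask`, `schedule_probes`, `assign_reward`, and `make_toy_evaluator` are hypothetical, the "environment" is a plain dictionary standing in for a GUI, and the reasoner is reduced to a fixed rule where the real system would use an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProbeTask:
    """Hypothetical state-probing task: what to inspect and what to expect."""
    description: str
    expected: str


# Assumption: an evaluator agent is modelled as a callable that receives a
# probe description and returns what it observed in the environment.
Evaluator = Callable[[str], str]


def schedule_probes(goal: Dict[str, str]) -> List[ProbeTask]:
    """Toy reasoner step: one probe per piece of state the task should set."""
    return [ProbeTask(description=key, expected=value) for key, value in goal.items()]


def assign_reward(goal: Dict[str, str], evaluator: Evaluator) -> int:
    """Schedule probes, let the evaluator run them, then decide the reward."""
    probes = schedule_probes(goal)
    # The evaluator actively queries the environment for each probe.
    observations = {p.description: evaluator(p.description) for p in probes}
    # Reward 1 only if every observed value matches the expected value.
    return int(all(observations[p.description] == p.expected for p in probes))


def make_toy_evaluator(env_state: Dict[str, str]) -> Evaluator:
    """Stand-in for a GUI evaluator agent: reads values from a dictionary."""
    def probe(description: str) -> str:
        return env_state.get(description, "<missing>")
    return probe


# Hypothetical task: "set an alarm for 07:00 and enable it".
goal = {"alarm_time": "07:00", "alarm_enabled": "true"}
success_env = {"alarm_time": "07:00", "alarm_enabled": "true"}
failure_env = {"alarm_time": "07:00", "alarm_enabled": "false"}

reward_ok = assign_reward(goal, make_toy_evaluator(success_env))
reward_bad = assign_reward(goal, make_toy_evaluator(failure_env))
print(reward_ok, reward_bad)
```

The point of the sketch is the division of labour: the reward decision rests on state that is checked after the fact, not on the recorded trajectory alone.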
Empirical Results
ProRe was evaluated on over 3,000 trajectories. The results show improvements in both reward accuracy and F1 score:
- Reward accuracy increased by up to 5.3%.
- F1 score improved by up to 19.4%.
Moreover, integrating ProRe with state-of-the-art policy agents yielded a success rate improvement of up to 22.4%.
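For readers unfamiliar with the two reward metrics, the following Python sketch shows how accuracy and F1 are conventionally computed for binary success/failure labels. It assumes a reward of 1 means "task succeeded" and treats success as the positive class. The function name `accuracy_and_f1` and the example labels are made up for illustration and are not data from the paper.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, with 1 as the positive class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one label is required")

    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == 1 and a == 1)
    false_pos = sum(1 for p, a in pairs if p == 1 and a == 0)
    false_neg = sum(1 for p, a in pairs if p == 0 and a == 1)
    correct = sum(1 for p, a in pairs if p == a)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives.
    denominator = 2 * true_pos + false_pos + false_neg
    f1 = 2 * true_pos / denominator if denominator else 0.0
    return accuracy, f1


# Illustrative labels only: rewards from a judge vs. ground-truth outcomes.
predicted_rewards = [1, 0, 1, 1, 0, 0]
true_outcomes = [1, 0, 0, 1, 1, 0]

accuracy, f1 = accuracy_and_f1(predicted_rewards, true_outcomes)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

Reporting F1 alongside accuracy matters when successful and failed trajectories are imbalanced, since accuracy alone can look high for a judge that rarely predicts the minority class.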
Conclusion
ProRe targets a practical gap in the training and evaluation of GUI agents: assigning rewards when ground-truth trajectories and application databases are unavailable. By having a reasoner direct evaluator agents to probe the environment, it bases rewards on additional observations rather than on a static trajectory alone. The source code for ProRe is publicly available at https://github.com/V-Droid-Agent/ProRe.
