Faithful Mobile GUI Agents with Guided Advantage Estimator

Graphical user interface (GUI) agents built on vision-language models have become an active area of research, and they have shown strong capabilities in interacting with users and executing tasks. A significant challenge remains, however: these agents often behave unfaithfully, relying on memorized shortcuts rather than grounding their actions in the evidence actually shown on the screen or in the user’s explicit instructions. To tackle this issue, researchers have introduced a framework called Faithful-Agent.

Overview of Faithful-Agent

Faithful-Agent is designed to prioritize evidence-grounded interactions and internal consistency in GUI environments. The framework employs a two-stage pipeline that enhances the overall reliability and performance of GUI agents:

Stage I: Faithfulness-Oriented SFT (Supervised Fine-Tuning)

This initial stage instills abstention behavior in agents when they face evidence perturbations. Because agents learn to respond to what the screen currently displays rather than to what they expect it to display, their actions remain grounded in the immediate context.

Stage II: Reinforcement Fine-Tuning (RFT) with Guided Advantage Estimator (GuAE)

The second stage strengthens the agents’ faithfulness by introducing the Guided Advantage Estimator (GuAE), an anchor-based, variance-adaptive advantage tempering mechanism built on the Group Relative Policy Optimization (GRPO) algorithm. GuAE is designed to prevent advantage collapse, which can occur in low-variance rollout groups under sparse GUI rewards: when every rollout in a group receives nearly the same reward, the group-relative advantages shrink toward zero and the policy receives little or no learning signal.
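To make the collapse concrete, here is a minimal sketch of the standard group-relative advantage that GRPO-style training uses, where each rollout's reward is normalized by the mean and standard deviation of its group. It illustrates the problem only; it is not GuAE, whose exact formula is not given here. The function name, the `eps` value, and the toy rewards are assumptions made for illustration.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward within its rollout group.

    `eps` is an illustrative stabilizer that avoids division by zero.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# A group with mixed outcomes yields a usable learning signal.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Under sparse rewards, a whole group often fails (or succeeds) together.
# The rewards have zero variance, so every advantage is exactly zero.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
all_pass = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print("mixed:   ", mixed)
print("all fail:", all_fail)
print("all pass:", all_pass)
```

In the two uniform groups, the rollouts contribute nothing to the policy gradient, even in the all-pass case where the behavior was correct. This is the situation an anchor-based, variance-adaptive estimator is meant to address.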

Key Innovations and Results

One of the standout features of the Faithful-Agent framework is its thought-action consistency reward, which rewards the agent when the action it executes matches the reasoning it stated beforehand. This reinforces faithfulness and also encourages agents to align their actions with the intent behind user commands. As a result, Faithful-Agent improves markedly in specific task scenarios.
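How this reward is computed is not specified here, so the following is only a toy sketch of the idea. It assumes a hypothetical rollout format in which the thought is free text and the action is a dictionary with `type` and `target` fields, and it uses plain substring matching as a stand-in for whatever consistency check the framework actually uses.

```python
def consistency_reward(thought: str, action: dict) -> float:
    """Toy thought-action consistency check (hypothetical, for illustration).

    Returns 1.0 when the stated thought mentions both the action type and
    the action target, and 0.0 otherwise.
    """
    text = thought.lower()
    mentions_type = action["type"].lower() in text
    mentions_target = action["target"].lower() in text
    return 1.0 if (mentions_type and mentions_target) else 0.0

# Thought and action agree: the agent says it will tap Wi-Fi and does so.
consistent = consistency_reward(
    "The Wi-Fi toggle is visible, so I will tap Wi-Fi.",
    {"type": "tap", "target": "Wi-Fi"},
)

# Thought and action disagree: the agent reasons that it should stop,
# yet the emitted action taps an unrelated element.
inconsistent = consistency_reward(
    "The requested app is not on screen, so I should stop here.",
    {"type": "tap", "target": "Settings"},
)

print(consistent, inconsistent)
```

A string match like this is brittle, and a practical check would more likely parse structured outputs or use a learned judge. The sketch shows only the shape of the signal: a scalar that can be added to the task reward during reinforcement fine-tuning.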

For instance, the Trap Success Rate (SR) rises from 13.88% to 80.21% relative to baseline models, a gain of 66.33 percentage points. This increase highlights the framework’s potential to make GUI agents more reliable and effective in real-world applications.

Implications for Future Research

The introduction of Faithful-Agent represents a significant advancement in the field of AI-driven GUI interactions. By prioritizing faithfulness and evidence-based actions, this framework addresses a critical gap in current methodologies. Future research can build upon these findings to explore additional enhancements in GUI agent behavior, with the aim of creating even more responsive and reliable AI systems.

As the landscape of AI continues to evolve, the insights gained from the Faithful-Agent framework could pave the way for more trustworthy and effective interactions between users and AI systems, ultimately leading to improved user experiences across various applications.
