CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation
Summary: arXiv:2604.09155v1 Announce Type: cross
The advancement of graphical user interface (GUI) agents, particularly those powered by vision language models (VLMs), has taken a significant leap forward. These agents are evolving from merely providing passive assistance to performing autonomous operations. However, this shift introduces a plethora of risks, particularly in the realms of financial loss, privacy breaches, and social harm, as the unrestricted action space could lead to potentially severe consequences for users.
Current safeguards employed in the industry primarily depend on prompt engineering and brittle heuristics, which often lack formal verification and user-tunable guarantees. To address these shortcomings, we introduce CORA (COnformal Risk-controlled GUI Agent), a novel framework designed to enhance safety in the execution of automated GUI actions.
Framework Overview
CORA is a post-policy, pre-action safeguarding framework that aims to provide statistical guarantees concerning harmful actions executed by the agent. The core idea behind CORA is to reformulate the concept of safety as selective action execution. This is achieved by training a Guardian model that estimates the action-conditional risk associated with each proposed step.
- Guardian Model: This model assesses the risk of each action before it is executed, allowing for a more informed decision-making process.
- Conformal Risk Control: Instead of merely applying a threshold to raw risk scores, CORA employs Conformal Risk Control. This method calibrates an execute/abstain boundary that aligns with a user-specified risk budget.
- Diagnostician Model: Actions that are rejected based on risk assessments are routed to a trainable Diagnostician model. This model performs multimodal reasoning over these rejected actions and recommends interventions such as confirming, reflecting, or aborting actions to alleviate user burden.
Goal-Lock Mechanism and Benchmarking
To further bolster the security of the system, CORA incorporates a Goal-Lock mechanism. This mechanism anchors the assessment of actions to a clarified and frozen user intent, thereby providing resistance against visual injection attacks, which are a prevalent threat in GUI automation.
To rigorously evaluate the CORA framework, we introduce Phone-Harm, a new benchmark designed to assess mobile safety violations with step-level harm labels. This benchmark operates under real-world settings, ensuring that the evaluations are relevant and accurate.
Experimental Validation
Preliminary experiments conducted using the Phone-Harm benchmark, along with other public benchmarks, have demonstrated that CORA significantly enhances the balance between safety, helpfulness, and interruption. It effectively improves the Pareto frontier in these dimensions, offering a practical and statistically grounded safety paradigm for the autonomous execution of GUI tasks.
For those interested in exploring this innovative framework further, code and the benchmark are publicly available at cora-agent.github.io.
