Do Phone-Use Agents Respect Your Privacy?
Source: arXiv:2604.00986v1 (cross-listed)
Abstract: We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution.
Introduction
Phone-use agents complete tasks on a user's mobile device on the user's behalf, often with little visibility into what data they enter along the way. That raises a practical question: do these agents respect user privacy during routine, benign tasks? A recent study introduces a framework for evaluating privacy behavior in mobile agents and reports where current models fall short.
MyPhoneBench: A New Evaluation Framework
The study introduces MyPhoneBench, a verifiable evaluation framework designed to measure privacy behavior in phone-use agents. This framework operationalizes privacy-respecting phone use through three main criteria:
- Permissioned Access: Ensuring that agents access only the data they have been explicitly permitted to access.
- Minimal Disclosure: Limiting the amount of data shared to what is strictly necessary for task completion.
- User-Controlled Memory: Allowing users to manage what information is remembered by the agents.
Alongside MyPhoneBench, the study employs instrumented mock apps and rule-based auditing to identify problematic behaviors, such as unnecessary permission requests and deceptive data re-disclosure.
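To make the idea of rule-based auditing concrete, here is a minimal sketch of how a logged agent trajectory could be checked against simple rules. It is not the study's implementation: the `Event` record, the `audit` function, the rule names, and the sample trajectory are all hypothetical, and it assumes an instrumented mock app can log each permission request and each value typed into a form field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged agent action (hypothetical log format)."""
    kind: str         # "request_permission" or "type_into_field"
    data_key: str     # which piece of user data is involved, e.g. "email"
    target: str = ""  # form field typed into (empty for permission requests)


def audit(events, needed, granted):
    """Return (rule, data_key) violations for one trajectory.

    needed:  data keys the task actually requires (assumed known per task)
    granted: data keys the user has permitted the agent to use
    """
    violations = []
    for ev in events:
        if ev.kind == "request_permission" and ev.data_key not in needed:
            violations.append(("unnecessary_permission_request", ev.data_key))
        elif ev.kind == "type_into_field" and ev.data_key not in granted:
            violations.append(("unpermitted_disclosure", ev.data_key))
    return violations


# Hypothetical trajectory: the task only needs the user's email address.
trajectory = [
    Event("request_permission", "email"),
    Event("request_permission", "home_address"),
    Event("type_into_field", "email", "contact_email"),
    Event("type_into_field", "phone", "phone_number"),
]

report = audit(trajectory, needed={"email"}, granted={"email"})
for rule, key in report:
    print(f"{rule}: {key}")
```

Because each rule is a deterministic check over the log, the same trajectory always yields the same verdict, which is the property that makes this kind of auditing verifiable.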
Methodology and Findings
The evaluation was conducted across five frontier models on ten different mobile applications, encompassing 300 unique tasks. The findings indicate that:
- Task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities.
- No single model excels in all three areas, suggesting that a multifaceted approach is necessary for evaluating agent performance.
- When evaluating success and privacy jointly, the ordering of models changes considerably compared to evaluations based on individual metrics.
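The reordering effect in the last point can be illustrated with a small sketch. The per-task records below are invented placeholders, not results from the study, and the names `success_rate`, `joint_rate`, and `rank` are hypothetical; the joint metric here is assumed to count a task only if it succeeded with zero privacy violations.

```python
# Each task record is (task_succeeded, number_of_privacy_violations).
# All values are made-up placeholders for illustration only.
runs = {
    "model_a": [(True, 2), (True, 1), (True, 0), (False, 0)],
    "model_b": [(True, 0), (True, 0), (False, 0), (False, 1)],
}


def success_rate(tasks):
    """Fraction of tasks completed, ignoring privacy."""
    return sum(ok for ok, _ in tasks) / len(tasks)


def joint_rate(tasks):
    """Fraction of tasks completed with no privacy violation."""
    return sum(ok and violations == 0 for ok, violations in tasks) / len(tasks)


def rank(metric):
    """Model names sorted from best to worst under the given metric."""
    return sorted(runs, key=lambda name: metric(runs[name]), reverse=True)


by_success = rank(success_rate)
by_joint = rank(joint_rate)
print("by success:", by_success)
print("by joint:  ", by_joint)
```

In this toy data, `model_a` completes more tasks but leaks data on most of them, so it leads on raw success and trails once privacy is required, which is the kind of ordering change the finding describes.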
One of the most prominent issues identified was a failure of simple data minimization: many agents fill in optional personal data fields even when the task does not require them. This over-helpful behavior can lead to privacy violations, which is why task success alone is an incomplete measure of these agents.
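A data-minimization check of this kind can be sketched in a few lines. This is an assumption-laden illustration rather than the benchmark's actual rule: the form schema, the field names, and the `over_disclosed` helper are hypothetical, and fields missing from the schema are treated as optional.

```python
def over_disclosed(form_schema, submitted):
    """Return optional fields the agent filled in anyway.

    form_schema: maps field name -> True if the field is required
    submitted:   maps field name -> value the agent typed ("" if left blank)
    """
    return sorted(
        name
        for name, value in submitted.items()
        if value and not form_schema.get(name, False)
    )


# Hypothetical booking form: only name and email are required.
schema = {"name": True, "email": True, "birthday": False, "phone": False}
submitted = {
    "name": "Alex",
    "email": "alex@example.com",
    "birthday": "1990-01-01",  # optional, but the agent filled it in
    "phone": "",               # optional and left blank
}

extra = over_disclosed(schema, submitted)
print(extra)  # ['birthday']
```

An agent that left `birthday` blank would still complete the task, so any non-empty result flags disclosure that was not needed.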
Conclusion
The study concludes that privacy failures in mobile agents often stem from their well-intentioned, yet misguided, attempts to assist users. Evaluating these agents solely based on task success can lead to an overestimation of their readiness for deployment in privacy-sensitive contexts. To foster trust and enhance user privacy, it is imperative that developers prioritize privacy-compliant behavior in future iterations of phone-use agents.
Further Research
All code, mock apps, and agent trajectories associated with this study are publicly available in the MyPhoneBench repository on GitHub. Continued research in this area is vital for developing agents that not only perform tasks efficiently but also respect user privacy.
