GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks
Source: arXiv:2603.25864v1 | Type: Cross
Abstract
Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop). Prior research has primarily focused on automating user actions through clicks and keystrokes, but this paradigm overlooks human intention: users value the ability to explore, iterate, and refine their ideas while maintaining agency. To move beyond automation and toward collaboration, GUI agents must understand what users are doing and why.
Introducing GUIDE
We introduce GUIDE (GUI User Intent Detection Evaluation), a benchmark that evaluates AI models on their ability to perceive user behavior, infer intent, and provide assistance in open-ended GUI tasks. GUIDE consists of 67.5 hours of screen recordings from 120 novice user demonstrations with think-aloud narrations, across 10 software applications.
Key Tasks Defined by GUIDE
The GUIDE benchmark defines three critical tasks:
- Behavior State Detection: Identifying the current state of user actions within the GUI.
- Intent Prediction: Reasoning about the user’s goals and intentions based on their behavior.
- Help Prediction: Deciding when and how to assist the user effectively.
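To make the three tasks concrete, the following is a minimal sketch of what a per-segment label record could look like. It is not the benchmark's actual schema: the class name `SegmentAnnotation`, its field names, and the placeholder label `"state_a"` are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentAnnotation:
    """Hypothetical label record for one segment of a screen recording."""

    behavior_state: str  # Behavior State Detection target (placeholder label)
    intent: str  # Intent Prediction target: free-text description of the goal
    needs_help: bool  # Help Prediction, "when": is assistance warranted now?
    help_content: Optional[str] = None  # Help Prediction, "how": what to offer

    def __post_init__(self) -> None:
        # Help content only makes sense when help is warranted.
        if not self.needs_help and self.help_content is not None:
            raise ValueError("help_content set although needs_help is False")


# Illustrative values only; none of these come from the GUIDE dataset.
example = SegmentAnnotation(
    behavior_state="state_a",
    intent="align two text boxes on a slide",
    needs_help=True,
    help_content="point to the alignment tool",
)
```

Splitting help into a "when" flag and a "how" field mirrors the wording of the Help Prediction task, but the real annotation format may differ.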
Evaluation of AI Models
Evaluations across eight state-of-the-art multimodal models show that all of them struggled on the benchmark. The reported accuracies were:
- Behavior state detection: only 44.6%.
- Help prediction: 55.0%.
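For readers unfamiliar with the metric, the accuracy figures above can be read as the fraction of predictions that match the reference labels. The sketch below assumes plain exact-match accuracy, which this summary does not confirm; the function name `accuracy` and the toy labels are hypothetical and are not taken from the benchmark.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy data for illustration only (placeholder state names).
toy_labels = ["state_a", "state_b", "state_a", "state_c"]
toy_predictions = ["state_a", "state_a", "state_a", "state_c"]
toy_accuracy = accuracy(toy_predictions, toy_labels)  # 3 of 4 match
```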
Importance of User Context
Providing structured user context significantly improved model performance, raising help prediction accuracy by up to 50.2 percentage points. This finding highlights the critical role of understanding user intentions in delivering effective assistance.
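A gain stated in percentage points is an absolute difference between two accuracies, not a relative improvement. The short sketch below shows the distinction; the function names are hypothetical, and the toy numbers are chosen for easy arithmetic and are not results from the paper.

```python
def percentage_point_gain(baseline_pct: float, with_context_pct: float) -> float:
    """Absolute difference between two accuracies given in percent."""
    return with_context_pct - baseline_pct


def relative_gain_pct(baseline_pct: float, with_context_pct: float) -> float:
    """Improvement relative to the baseline, expressed in percent."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive for a relative gain")
    return (with_context_pct - baseline_pct) / baseline_pct * 100.0


# Toy accuracies for illustration only.
toy_baseline = 40.0
toy_with_context = 70.0
pp_gain = percentage_point_gain(toy_baseline, toy_with_context)
rel_gain = relative_gain_pct(toy_baseline, toy_with_context)
```

With these toy values the gain is 30 percentage points, which corresponds to a 75% relative improvement over the baseline.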
Conclusion
The GUIDE benchmark is intended to advance research on AI-driven GUI assistance. By focusing on user intent and behavior, it supports the development of more collaborative GUI agents. The dataset and further information are available at https://guide-bench.github.io.
