GUIDE Benchmark: AI for User Intent in GUI Tasks

GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Source: arXiv:2603.25864v1 | Type: Cross

Abstract

Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop). Prior research has primarily focused on automating user actions through clicks and keystrokes, but this paradigm overlooks human intention: users value the ability to explore, iterate, and refine their ideas while maintaining agency. To move beyond automation and toward collaboration, GUI agents must understand what users are doing and why.

Introducing GUIDE

We introduce GUIDE (GUI User Intent Detection Evaluation), a benchmark that evaluates AI models on their ability to perceive user behavior, infer intent, and provide assistance in open-ended GUI tasks. GUIDE consists of 67.5 hours of screen recordings from 120 novice user demonstrations with think-aloud narrations, across 10 software applications.

Key Tasks Defined by GUIDE

The GUIDE benchmark defines three critical tasks:

Behavior State Detection: Identifying the current state of user actions within the GUI.
Intent Prediction: Reasoning about the user’s goals and intentions based on their behavior.
Help Prediction: Deciding when and how to assist the user effectively.
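To make the relationship between the three tasks concrete, here is a minimal sketch of how one annotated segment of a recording could be represented. The class name, field names, and example values are all hypothetical illustrations; the summary above does not describe GUIDE's actual data schema or label set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentAnnotation:
    """Hypothetical labels for one segment of a screen recording (not GUIDE's real schema)."""
    behavior_state: str                 # task 1: what the user is currently doing
    intent: str                         # task 2: why they are doing it
    needs_help: bool                    # task 3 (when): should the agent step in?
    help_content: Optional[str] = None  # task 3 (how): what assistance to offer


def is_consistent(annotation: SegmentAnnotation) -> bool:
    """Help content should be present exactly when help is needed."""
    return annotation.needs_help == (annotation.help_content is not None)


# Invented example values, for illustration only.
example = SegmentAnnotation(
    behavior_state="exploring menus",
    intent="align two text boxes on a slide",
    needs_help=True,
    help_content="point the user to the alignment tool",
)
```

Splitting help prediction into a "when" field and a "how" field mirrors the task description above, where the model must decide both whether to intervene and what to offer.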

Evaluation of AI Models

Evaluations across eight state-of-the-art multimodal models show that all of them struggled on the benchmark. The reported results:

Behavior state detection accuracy reached only 44.6%.
Help prediction accuracy was higher but still low, at 55.0%.
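For readers unfamiliar with the metric, accuracy here can be read as the share of predictions that match the reference labels. The sketch below assumes simple exact-match scoring over string labels; the paper's actual scoring protocol is not described in this summary, and the function name is illustrative.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return exact-match accuracy as a percentage (assumed scoring rule)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)
```

Under this reading, a score of 44.6% means that fewer than half of the behavior state predictions matched the annotated state.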

Importance of User Context

Including structured user context significantly improved model performance: providing relevant user context raised help prediction accuracy by up to 50.2 percentage points. This finding highlights how much effective assistance depends on understanding the user's intentions.
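A gain stated in percentage points is an absolute difference between two accuracies, not a relative improvement. The short sketch below shows the distinction using made-up numbers; the before and after values are hypothetical and are not results from the paper, which reports only the size of the largest gain.

```python
def percentage_point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Improvement relative to the starting accuracy, in percent."""
    return 100.0 * (after_pct - before_pct) / before_pct


# Hypothetical accuracies chosen only to illustrate the arithmetic.
before, after = 40.0, 60.0
pp = percentage_point_gain(before, after)   # 20.0 percentage points
rel = relative_gain(before, after)          # 50.0 percent relative improvement
```

The two figures can differ widely for the same pair of accuracies, which is why the unit matters when reading the reported gain.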

Conclusion

The GUIDE benchmark gives researchers a way to measure progress in AI-driven GUI assistance. By focusing on user intent and behavior, it supports the development of more collaborative and intuitive GUI agents. Researchers and developers can access the dataset and further information at https://guide-bench.github.io.
