From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents
arXiv:2510.04607v2
Abstract
Computer-use agents (CUAs) powered by large language models (LLMs) have emerged as a promising approach to automating computer tasks, yet they struggle with existing human-oriented OS interfaces, namely graphical user interfaces (GUIs). GUIs force LLMs to decompose high-level goals into lengthy, error-prone sequences of fine-grained actions, resulting in low success rates and an excessive number of LLM calls.
Introduction
Large language models have made it feasible to automate computer tasks that traditionally required human intervention. Their reliance on graphical user interfaces, however, poses a significant challenge. GUIs are imperative: to reach a goal, an LLM must break it down into a series of fine-grained actions, which costs additional LLM calls and leaves more room for error.
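One way to see why long action sequences are error-prone is a back-of-the-envelope model. The sketch below assumes that every GUI action succeeds independently with the same probability; both that independence assumption and the 95% figure are illustrative choices made here, not measurements from the paper, and `chain_success_probability` is a hypothetical helper name.

```python
def chain_success_probability(per_step_success: float, num_steps: int) -> float:
    """Probability that all steps of a sequence succeed.

    Assumes (for illustration only) that steps fail independently and
    share the same per-step success probability.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_success ** num_steps


# Illustrative values only: a 95% per-step success rate is an assumption.
for steps in (1, 5, 10, 20):
    print(steps, round(chain_success_probability(0.95, steps), 3))
```

Under these assumed numbers, overall success falls below 60% by ten steps, which is the intuition behind reducing the number of fine-grained actions an LLM has to issue.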
Introducing the Declarative Model Interface (DMI)
In response to these challenges, we propose the Declarative Model Interface (DMI), an abstraction that transforms existing GUIs into three declarative primitives:
- Access: The ability to retrieve information or initiate processes within the system.
- State: The current condition or status of the application or task being managed.
- Observation: The capability to monitor and interpret changes within the system environment.
This framework aims to provide OS interfaces that are better suited to LLM agents. Its core idea is policy-mechanism separation: the LLM concentrates on high-level semantic planning (policy), while DMI handles low-level navigation and interaction (mechanism). Notably, DMI requires neither modifications to application source code nor reliance on application programming interfaces (APIs).
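To make the policy-mechanism split concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: `ToyApp`, `ToyDeclarativeInterface`, the menu paths, and the method names are all hypothetical, and the "application" is just an in-memory dictionary rather than a real GUI. The point is only that the caller names a target and a desired value, while the interface layer works out the clicks.

```python
from dataclasses import dataclass, field


@dataclass
class ToyApp:
    """Hypothetical stand-in for a GUI application (not a real Office API)."""
    # Click path needed to reach each control through the menus.
    menu_paths: dict = field(default_factory=lambda: {
        "bold": ["Home", "Font", "Bold"],
        "font_size": ["Home", "Font", "Size"],
    })
    # Current value of each control.
    state: dict = field(default_factory=lambda: {"bold": False, "font_size": 11})
    # Record of low-level clicks, so the hidden mechanism can be inspected.
    click_log: list = field(default_factory=list)

    def click(self, label: str) -> None:
        self.click_log.append(label)


class ToyDeclarativeInterface:
    """Hypothetical layer exposing access, state, and observation."""

    def __init__(self, app: ToyApp) -> None:
        self.app = app

    def access(self, control: str) -> None:
        """Reach a control by name; navigation is handled here, not by the caller."""
        for label in self.app.menu_paths[control]:
            self.app.click(label)

    def set_state(self, control: str, value) -> None:
        """Declare the desired value of a control instead of scripting clicks."""
        self.access(control)
        self.app.state[control] = value

    def observe(self, control: str):
        """Read back the current value of a control."""
        return self.app.state[control]


app = ToyApp()
dmi = ToyDeclarativeInterface(app)

# Policy: a single declarative request, as an LLM might issue in one call.
dmi.set_state("font_size", 14)

print(dmi.observe("font_size"))  # 14
print(app.click_log)             # ['Home', 'Font', 'Size']
```

In this toy, the three clicks in `click_log` are the mechanism the caller never has to plan; an imperative agent would have had to issue each of them as a separate action.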
Evaluation of DMI
To assess the effectiveness of the Declarative Model Interface, we conducted evaluations with the Microsoft Office Suite, including Word, PowerPoint, and Excel, on the Windows operating system. The results were striking:
- Task success rates improved by 67%.
- Interaction steps were reduced by 43.5%.
- Over 61% of successful tasks were completed with a single LLM call.
Conclusion
The transition from imperative to declarative OS interfaces represents a significant advancement for computer-use agents. By letting large language models state what they want and leaving navigation and interaction to the Declarative Model Interface, automated computer tasks become more efficient and more likely to succeed. Although the evaluation covers Microsoft Office, the approach may extend to other applications and domains.
As the field of AI continues to evolve, embracing architectures that support high-level semantic understanding will be crucial for the next generation of intelligent agents.
