Scalable Lightweight GUI Agents with Multi-role Orchestration

Towards Scalable Lightweight GUI Agents via Multi-role Orchestration

Source: arXiv:2604.13488v1

Abstract: Autonomous Graphical User Interface (GUI) agents powered by Multimodal Large Language Models (MLLMs) enable digital automation on end-user devices. While scaling both parameters and data has yielded substantial gains, advanced methods still suffer from prohibitive deployment costs on resource-constrained devices. When facing complex in-the-wild scenarios, lightweight GUI agents are bottlenecked by limited capacity and poor task scalability under end-to-end episodic learning, impeding adaptation to multi-agent systems (MAS), while training multiple skill-specific experts remains costly. Can we strike an effective trade-off in this cost-scalability dilemma, enabling lightweight MLLMs to participate in realistic GUI workflows?

To address these challenges, we propose the LAMO framework, which endows a lightweight MLLM with GUI-specific knowledge and task scalability, allowing multi-role orchestration to expand its capability boundary for GUI automation.

Key Features of the LAMO Framework

The LAMO framework combines role-oriented data synthesis with a two-stage training recipe:

  • Supervised Fine-tuning: Perplexity-Weighted Cross-Entropy optimization, used for knowledge distillation and visual perception enhancement.
  • Reinforcement Learning: role-oriented cooperative exploration.
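This summary does not give the formula behind Perplexity-Weighted Cross-Entropy, so the following is only a minimal sketch of one plausible reading: each training sequence's cross-entropy is weighted by its normalized perplexity, so that sequences the model finds harder contribute more to the loss. The function names, the `temperature` parameter, and the normalization are assumptions made for illustration, not the paper's definition.

```python
import math
from typing import List, Sequence


def token_nll(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target token under one predicted distribution."""
    return -math.log(probs[target])


def perplexity_weighted_ce(
    batch_probs: List[List[Sequence[float]]],
    batch_targets: List[List[int]],
    temperature: float = 1.0,  # hypothetical knob: larger values flatten the weights
) -> float:
    """Weighted mean of per-sequence cross-entropy, weights proportional to perplexity.

    Assumed formulation (not taken from the paper):
        ppl_i = exp(ce_i),  w_i = ppl_i^(1/T) / sum_j ppl_j^(1/T),  loss = sum_i w_i * ce_i
    In a real training loop the weights would normally be treated as constants
    (no gradient flowing through them).
    """
    seq_ce = []
    for probs, targets in zip(batch_probs, batch_targets):
        nlls = [token_nll(p, t) for p, t in zip(probs, targets)]
        seq_ce.append(sum(nlls) / len(nlls))  # mean token-level cross-entropy

    raw = [math.exp(ce) ** (1.0 / temperature) for ce in seq_ce]
    total = sum(raw)
    weights = [r / total for r in raw]
    return sum(w * ce for w, ce in zip(weights, seq_ce))


# Two toy sequences over a 2-token vocabulary: one easy, one hard.
easy = [[0.9, 0.1], [0.8, 0.2]]
hard = [[0.5, 0.5], [0.4, 0.6]]
loss = perplexity_weighted_ce([easy, hard], [[0, 0], [0, 0]])
```

Under this assumed weighting, the loss always lies between the plain batch mean and the cross-entropy of the hardest sequence, which is the intended effect of emphasizing high-perplexity samples.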

Development of LAMO-3B

With LAMO, we have developed a task-scalable native GUI agent, LAMO-3B. It supports both monolithic execution and MAS-style orchestration, so the same model can act alone or as one role within a multi-agent system.

When paired with advanced planners as a plug-and-play policy executor, LAMO-3B can continuously benefit from planner advances: a stronger planner raises the performance ceiling of the combined system without changing the executor.
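The difference between monolithic execution and planner-driven orchestration can be sketched with a small control loop. Everything here is hypothetical scaffolding: `Action`, `run_episode`, `toy_executor`, and `scripted_planner` are illustrative names, and the rule-based executor merely stands in for the lightweight policy model. It is not LAMO-3B's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str           # e.g. "tap", "type", "done"
    argument: str = ""


# Hypothetical signatures for the two roles.
Planner = Callable[[str, List[str]], Optional[str]]  # (task, history) -> next subgoal, None when finished
Executor = Callable[[str, str], Action]              # (instruction, screen) -> GUI action


def run_episode(
    task: str,
    executor: Executor,
    observe: Callable[[], str],
    planner: Optional[Planner] = None,
    max_steps: int = 10,
) -> List[Action]:
    """Run one episode; without a planner the executor receives the raw task (monolithic mode)."""
    history: List[str] = []
    actions: List[Action] = []
    for _ in range(max_steps):
        instruction = task if planner is None else planner(task, history)
        if instruction is None:  # planner decided the task is complete
            break
        action = executor(instruction, observe())
        actions.append(action)
        history.append(f"{instruction} -> {action.kind}")
        if action.kind == "done":  # executor signalled completion itself
            break
    return actions


def toy_executor(instruction: str, screen: str) -> Action:
    """Rule-based stand-in for the policy model; ignores the screen."""
    if instruction.startswith("tap "):
        return Action("tap", instruction[4:])
    if instruction.startswith("type "):
        return Action("type", instruction[5:])
    return Action("done")


def scripted_planner(subgoals: List[str]) -> Planner:
    """Stand-in for an advanced planner: replays a fixed list of subgoals."""
    def plan(task: str, history: List[str]) -> Optional[str]:
        return subgoals[len(history)] if len(history) < len(subgoals) else None
    return plan


planned = run_episode(
    "search for cats",
    toy_executor,
    observe=lambda: "home screen",
    planner=scripted_planner(["tap search box", "type cats"]),
)
monolithic = run_episode("search for cats", toy_executor, observe=lambda: "home screen")
```

Because the planner is just an optional argument, swapping in a better one requires no change to the executor, which is the plug-and-play property described above.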

Evaluation and Results

Extensive static and online evaluations validate the effectiveness of the design. According to the authors, LAMO-3B adapts across scenarios, which matters for complex GUI workflows where lightweight agents trained end to end are held back by limited capacity and poor task scalability.

Conclusion

LAMO addresses the cost-scalability dilemma for lightweight GUI agents. Instead of training multiple skill-specific experts, it equips a single lightweight MLLM with GUI-specific knowledge and task scalability. The resulting LAMO-3B can run on its own or serve as the policy executor under a stronger planner, which makes GUI automation more practical on resource-constrained end-user devices.


Lazarus Omolua, https://richlyai.com/blog