MAS-Bench: Benchmark for Hybrid Mobile GUI Agents

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

Summary: arXiv:2509.06477v2 (revised version)

Abstract: Shortcuts such as APIs and deep-links have emerged as efficient complements to flexible GUI operations, fostering a promising hybrid paradigm for MLLM-based mobile automation. However, systematic evaluation of GUI-shortcut hybrid agents remains largely underexplored. To bridge this gap, we introduce MAS-Bench, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain.

Beyond merely using predefined shortcuts, MAS-Bench assesses an agent’s capability to autonomously generate shortcuts by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep-links, RPA scripts), and 9 evaluation metrics.
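The sketch below makes the hybrid paradigm concrete. It shows a shortcut knowledge base and a planner step that prefers a shortcut when one covers the subgoal and falls back to GUI actions otherwise. All class, field, and function names (`Shortcut`, `GuiAction`, `plan_subgoal`, `deeplink_command`) are hypothetical and do not reflect MAS-Bench's actual schema. The deep-link helper follows the `adb shell am start -a android.intent.action.VIEW -d <uri> <package>` form used in Android's deep-link testing documentation. It only builds the command and never runs it.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Shortcut:
    """Hypothetical knowledge-base entry; field names are illustrative only."""
    name: str
    kind: str      # "api", "deeplink", or "rpa" (the three shortcut types named above)
    app: str       # Android package the shortcut targets
    payload: str   # endpoint, URI, or script path, depending on `kind`


@dataclass(frozen=True)
class GuiAction:
    """A single low-level GUI operation such as a tap or text entry."""
    kind: str      # e.g. "tap", "type", "swipe"
    target: str    # element description or text to type


Step = Union[Shortcut, GuiAction]


def plan_subgoal(subgoal: str,
                 knowledge_base: Dict[str, Shortcut],
                 gui_fallback: List[GuiAction]) -> List[Step]:
    """Prefer one shortcut call when the knowledge base covers the subgoal,
    otherwise return the step-by-step GUI route.

    Exact-match lookup is a simplifying assumption standing in for the
    MLLM's own decision about when a shortcut applies.
    """
    shortcut = knowledge_base.get(subgoal)
    return [shortcut] if shortcut is not None else list(gui_fallback)


def deeplink_command(sc: Shortcut, serial: str = "") -> List[str]:
    """Build an adb argv that opens a deep-link shortcut via a VIEW intent.

    `adb shell` joins its trailing arguments into one remote shell command,
    so the URI is shell-quoted to keep characters such as '&' intact.
    """
    if sc.kind != "deeplink":
        raise ValueError(f"{sc.name} is not a deep-link shortcut")
    cmd = ["adb"] + (["-s", serial] if serial else [])
    return cmd + ["shell", "am", "start", "-a", "android.intent.action.VIEW",
                  "-d", shlex.quote(sc.payload), sc.app]


if __name__ == "__main__":
    # Toy knowledge base with one hypothetical deep-link shortcut.
    kb = {"open_cart": Shortcut("open_cart", "deeplink", "com.example.shop",
                                "example://cart?tab=1&sort=new")}
    gui = [GuiAction("tap", "Cart icon")]
    for step in plan_subgoal("open_cart", kb, gui):
        if isinstance(step, Shortcut) and step.kind == "deeplink":
            print(" ".join(deeplink_command(step)))
```

A single shortcut step replaces the whole GUI route here, which illustrates where the efficiency gain of hybrid agents comes from. Passing the returned list to `subprocess.run` would execute it against a connected device or emulator.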

Key Features of MAS-Bench

  • Comprehensive Task Set: MAS-Bench includes 139 complex tasks spanning 11 real-world applications, giving hybrid agents a broad and realistic testing environment.
  • Predefined Shortcuts: The benchmark ships with a knowledge base of 88 predefined shortcuts, covering APIs, deep-links, and RPA scripts, that agents can call alongside GUI operations.
  • Evaluation Metrics: MAS-Bench uses 9 evaluation metrics that measure both task performance and execution efficiency.
  • Autonomous Shortcut Generation: Beyond using predefined shortcuts, agents are assessed on whether they can discover and create reusable, low-cost workflows of their own.
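The summary does not say how agents generate shortcuts. One simple approach, sketched here under our own assumptions and not taken from the paper, is to mine GUI action sequences that recur across trajectories and treat the frequent ones as candidate RPA-style shortcuts. The action-string format (for example `"tap:Search"`) and the name `mine_shortcut_candidates` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Step = str  # hypothetical encoding of one GUI action, e.g. "tap:Search"


def mine_shortcut_candidates(trajectories: Sequence[Sequence[Step]],
                             min_len: int = 2,
                             max_len: int = 4,
                             min_support: int = 2) -> List[Tuple[Tuple[Step, ...], int]]:
    """Return contiguous action subsequences that recur across trajectories.

    Support is the number of trajectories containing the subsequence (each
    trajectory counted once), so a loop inside one run does not inflate it.
    Results are ordered by length (longest first), then support, then text.
    """
    support: Counter = Counter()
    for traj in trajectories:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(traj) - n + 1):
                seen.add(tuple(traj[i:i + n]))
        support.update(seen)
    ranked = [(gram, count) for gram, count in support.items() if count >= min_support]
    ranked.sort(key=lambda item: (-len(item[0]), -item[1], item[0]))
    return ranked
```

Preferring longer sequences first reflects the intuition that a shortcut saves more steps the more GUI actions it replaces. A real system would also need to parameterize the mined sequence (for example, the text typed into a search box) before it becomes reusable.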

Experimental Results

Experiments on MAS-Bench show that hybrid agents reach a success rate of up to 68.3% and are 39% more efficient in execution than their GUI-only counterparts. These gains highlight the benefit of integrating shortcuts into mobile automation.
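The summary does not define how success rate or execution efficiency are computed. The following sketch shows one plausible way to derive a success rate and a relative step reduction from per-task logs. The `TaskResult` record, the use of step count as the cost proxy, and the restriction to tasks both agents solved are all assumptions, not MAS-Bench's published metric definitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class TaskResult:
    """Hypothetical per-task log record."""
    task_id: str
    success: bool
    steps: int  # actions executed (GUI operations plus shortcut calls)


def success_rate(results: Sequence[TaskResult]) -> float:
    """Fraction of tasks completed successfully."""
    if not results:
        raise ValueError("no results")
    return sum(r.success for r in results) / len(results)


def relative_step_reduction(hybrid: Sequence[TaskResult],
                            gui_only: Sequence[TaskResult]) -> float:
    """1 - mean(hybrid steps) / mean(GUI-only steps) over jointly solved tasks.

    Hypothetical efficiency metric. Comparing only tasks both agents solved
    avoids rewarding an agent that gives up early on hard tasks.
    """
    solved_h = {r.task_id: r.steps for r in hybrid if r.success}
    solved_g = {r.task_id: r.steps for r in gui_only if r.success}
    common = solved_h.keys() & solved_g.keys()
    if not common:
        raise ValueError("no jointly solved tasks")
    h = mean(solved_h[t] for t in common)
    g = mean(solved_g[t] for t in common)
    return 1.0 - h / g
```

Under this definition, a value of 0.39 would mean the hybrid agent needs 39% fewer steps on the same solved tasks. Whether the paper measures efficiency in steps, time, or another cost is not stated in this summary.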

The evaluation framework also exposes a quality gap between predefined and agent-generated shortcuts. This shows that the benchmark can be used to compare different shortcut-generation methods, a prerequisite for improving them.
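One way to make such a quality gap measurable is to log every shortcut invocation together with its origin and compare success rates per origin. This is an assumption for illustration and not necessarily one of the benchmark's 9 metrics. The source labels `"predefined"` and `"generated"` and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def shortcut_success_by_source(calls: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Invocation success rate per shortcut origin (hypothetical metric).

    `calls` holds (source, succeeded) pairs, e.g. ("predefined", True).
    """
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for source, ok in calls:
        totals[source] += 1
        wins[source] += int(ok)
    return {source: wins[source] / totals[source] for source in totals}
```

The difference between the two rates gives a single number for the gap. A fuller comparison would also weigh how many GUI steps each shortcut saves when it succeeds.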

Conclusion

MAS-Bench fills a gap in the systematic evaluation of GUI-shortcut hybrid mobile agents. As a shared platform for research and development, it supports the creation of more efficient and robust agents and marks a step toward mobile automation that makes better use of shortcuts.

For more information, visit the MAS-Bench project page.



Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
