MAS-Bench: Benchmark for Hybrid Mobile GUI Agents

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

Source: arXiv:2509.06477v2

Abstract: Shortcuts such as APIs and deep-links have emerged as efficient complements to flexible GUI operations, fostering a promising hybrid paradigm for MLLM-based mobile automation. However, systematic evaluation of GUI-shortcut hybrid agents remains largely underexplored. To bridge this gap, we introduce MAS-Bench, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain.

Beyond merely using predefined shortcuts, MAS-Bench assesses an agent’s capability to autonomously generate shortcuts by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep-links, RPA scripts), and 9 evaluation metrics.

Key Features of MAS-Bench

Comprehensive Task Set: MAS-Bench includes 139 complex tasks spanning 11 real-world applications.
Predefined Shortcuts: The benchmark ships with a knowledge base of 88 predefined shortcuts (APIs, deep-links, and RPA scripts) that agents can call instead of stepping through the GUI.
Evaluation Metrics: MAS-Bench uses 9 evaluation metrics to measure the performance and efficiency of hybrid agents.
Autonomous Shortcut Generation: Beyond using predefined shortcuts, the benchmark assesses whether agents can generate their own by discovering and creating reusable, low-cost workflows.
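
To make the hybrid idea concrete, the following is a minimal sketch of how an agent might choose between a shortcut and ordinary GUI operations. It is not taken from MAS-Bench: the `Shortcut` class, the keyword-overlap matching, and the example shortcut names are all hypothetical, and a real agent would most likely let the MLLM make this decision rather than rely on keyword matching.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shortcut:
    """One entry in a hypothetical shortcut knowledge base."""
    name: str
    kind: str              # "api", "deep_link", or "rpa_script"
    keywords: frozenset    # words suggesting the shortcut fits a task


def find_shortcut(task: str, knowledge_base: list[Shortcut]) -> Optional[Shortcut]:
    """Return the shortcut with the largest keyword overlap, or None."""
    words = set(task.lower().split())
    best, best_overlap = None, 0
    for shortcut in knowledge_base:
        overlap = len(words & shortcut.keywords)
        if overlap > best_overlap:
            best, best_overlap = shortcut, overlap
    return best


def plan_step(task: str, knowledge_base: list[Shortcut]) -> tuple[str, str]:
    """Prefer a matching shortcut; otherwise fall back to GUI operations."""
    shortcut = find_shortcut(task, knowledge_base)
    if shortcut is not None:
        return ("shortcut", shortcut.name)
    # Assumption: the GUI fallback is a placeholder for tap/swipe/type actions.
    return ("gui", "tap/swipe/type sequence")


if __name__ == "__main__":
    kb = [
        Shortcut("open_wifi_settings", "deep_link", frozenset({"wifi", "settings"})),
        Shortcut("send_sms", "api", frozenset({"send", "sms", "message"})),
    ]
    print(plan_step("Turn on wifi in settings", kb))
    print(plan_step("Scroll the photo gallery", kb))
```

The fallback branch is the point of the design: shortcuts are cheap when one applies, and GUI operations keep the agent flexible when none does.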

Experimental Results

Experiments on MAS-Bench show that hybrid agents achieve up to a 68.3% success rate and 39% greater execution efficiency than their GUI-only counterparts. The gain illustrates the benefit of integrating shortcuts into mobile automation.
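
As a rough illustration of how such a comparison can be computed, the sketch below derives a success rate and a step-based efficiency gain from per-task episode records. The `Episode` type and the use of mean step count as an efficiency proxy are assumptions made for this example; the summary above does not say how MAS-Bench defines execution efficiency, and the episode values in the usage section are toy data, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Outcome of one task attempt (hypothetical record format)."""
    success: bool
    steps: int


def success_rate(episodes: list) -> float:
    """Fraction of episodes that completed the task."""
    return sum(e.success for e in episodes) / len(episodes)


def mean_steps(episodes: list) -> float:
    """Average number of actions taken per episode."""
    return sum(e.steps for e in episodes) / len(episodes)


def relative_step_reduction(hybrid: list, gui_only: list) -> float:
    """Assumed efficiency proxy: share of steps saved versus GUI-only."""
    return 1 - mean_steps(hybrid) / mean_steps(gui_only)


if __name__ == "__main__":
    # Toy data for illustration only.
    gui_only = [Episode(True, 10), Episode(False, 10)]
    hybrid = [Episode(True, 6), Episode(True, 6)]
    print(f"GUI-only success rate: {success_rate(gui_only):.1%}")
    print(f"Hybrid success rate:   {success_rate(hybrid):.1%}")
    print(f"Step reduction:        {relative_step_reduction(hybrid, gui_only):.1%}")
```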

The evaluation framework also exposes a quality gap between predefined and agent-generated shortcuts, which shows that the benchmark can be used to assess and compare shortcut generation methods.

Conclusion

MAS-Bench addresses the lack of systematic evaluation for GUI-shortcut hybrid mobile agents. It provides a foundational platform for research and development of more efficient and robust intelligent agents that combine shortcuts with GUI operations.

For more information, visit the MAS-Bench project page.
