MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
arXiv:2509.06477v2
Abstract: Shortcuts such as APIs and deep-links have emerged as efficient complements to flexible GUI operations, fostering a promising hybrid paradigm for MLLM-based mobile automation. However, systematic evaluation of GUI-shortcut hybrid agents remains largely underexplored. To bridge this gap, we introduce MAS-Bench, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain.
Beyond the use of predefined shortcuts, MAS-Bench assesses an agent’s ability to generate shortcuts autonomously by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep-links, and RPA scripts), and 9 evaluation metrics.
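To make the hybrid paradigm concrete, here is a minimal sketch of how an agent might consult a shortcut knowledge base at each step and fall back to a GUI operation when no shortcut applies. It is an illustration under stated assumptions, not MAS-Bench's implementation: the `Shortcut` and `Action` types, the `kind` values, the exact-match `intent` lookup, and the `cost` field are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical schema; MAS-Bench's actual shortcut format may differ.
@dataclass(frozen=True)
class Shortcut:
    name: str
    kind: str    # assumed values: "api", "deeplink", or "rpa"
    app: str
    intent: str  # the subgoal this shortcut accomplishes
    cost: int    # assumed cost, e.g. number of device interactions


@dataclass(frozen=True)
class Action:
    channel: str  # "shortcut" or "gui"
    payload: str  # shortcut name or a GUI operation description


def choose_action(
    app: str,
    subgoal: str,
    knowledge_base: list[Shortcut],
    gui_fallback: Callable[[str], str],
) -> Action:
    """Use the cheapest shortcut matching the subgoal; otherwise fall back to GUI."""
    matches = [s for s in knowledge_base if s.app == app and s.intent == subgoal]
    if matches:
        best = min(matches, key=lambda s: s.cost)
        return Action("shortcut", best.name)
    return Action("gui", gui_fallback(subgoal))


kb = [
    Shortcut("open_cart_link", "deeplink", "shop", "open cart", 1),
    Shortcut("open_cart_rpa", "rpa", "shop", "open cart", 3),
]
print(choose_action("shop", "open cart", kb, lambda g: f"tap:{g}"))
# Action(channel='shortcut', payload='open_cart_link')
print(choose_action("shop", "checkout", kb, lambda g: f"tap:{g}"))
# Action(channel='gui', payload='tap:checkout')
```

Picking the lowest-cost match reflects the motivation for shortcuts as cheaper complements to GUI operation; a real MLLM-based agent would more likely let the model judge whether a shortcut applies than rely on exact string matching.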
Key Features of MAS-Bench
- Comprehensive Task Set: MAS-Bench includes 139 complex tasks spanning 11 real-world applications, giving hybrid agents a diverse testing environment.
- Predefined Shortcuts: A knowledge base of 88 predefined shortcuts, including APIs, deep-links, and RPA scripts, gives agents low-cost alternatives to step-by-step GUI operation.
- Evaluation Metrics: Nine distinct metrics measure both the task performance and the execution efficiency of hybrid agents.
- Autonomous Shortcut Generation: The benchmark also evaluates whether agents can generate their own shortcuts by discovering reusable, low-cost workflows.
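One simple way to picture autonomous shortcut generation is to mine GUI action sequences that recur across successful trajectories and promote them to replayable macros, much like RPA scripts. The following is a hypothetical frequent-subsequence heuristic, not the generation method studied in MAS-Bench; the action names and the thresholds `min_len`, `max_len`, and `min_support` are illustrative assumptions.

```python
from collections import Counter


# Hypothetical heuristic; not MAS-Bench's shortcut-generation method.
def mine_candidate_shortcuts(
    trajectories: list[list[str]],
    min_len: int = 2,
    max_len: int = 4,
    min_support: int = 2,
) -> list[tuple[tuple[str, ...], int]]:
    """Find contiguous GUI-action subsequences that recur across trajectories.

    Support counts each subsequence at most once per trajectory, so a loop
    repeated inside a single episode is not over-weighted.
    """
    support: Counter = Counter()
    for traj in trajectories:
        seen: set = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(traj) - n + 1):
                seen.add(tuple(traj[i:i + n]))
        support.update(seen)
    frequent = [(seq, c) for seq, c in support.items() if c >= min_support]
    # Replacing n GUI steps with one shortcut call saves n - 1 steps per use,
    # so rank by estimated steps saved, breaking ties lexicographically.
    frequent.sort(key=lambda item: (-(len(item[0]) - 1) * item[1], item[0]))
    return frequent


trajs = [
    ["open_app", "tap_search", "type_query", "tap_result"],
    ["open_app", "tap_search", "type_query", "scroll"],
    ["tap_home", "open_app", "tap_search", "type_query"],
]
print(mine_candidate_shortcuts(trajs)[0])
# (('open_app', 'tap_search', 'type_query'), 3)
```

Ranking by (length - 1) * support favors workflows that are both long and common, which is one plausible proxy for "low-cost" reuse; a practical generator would also need to verify that a mined macro still succeeds when replayed.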
Experimental Results
Experiments on MAS-Bench show that hybrid agents reach a success rate of up to 68.3% and achieve 39% higher execution efficiency than their GUI-only counterparts, highlighting the benefit of integrating shortcuts into mobile automation.
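This summary does not define how success rate and execution efficiency are computed, so the sketch below uses only an assumed formulation: success rate as the fraction of tasks completed, and efficiency gain as the relative reduction in mean steps on successful episodes versus a GUI-only baseline. The `Episode` type, the step-counting convention, and the toy numbers are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    success: bool
    steps: int  # assumed: GUI operations plus shortcut calls in one episode


def success_rate(episodes: list[Episode]) -> float:
    """Fraction of episodes that completed their task."""
    return sum(e.success for e in episodes) / len(episodes)


def mean_steps_on_success(episodes: list[Episode]) -> float:
    """Average step count over successful episodes only."""
    done = [e.steps for e in episodes if e.success]
    return sum(done) / len(done)


def relative_step_reduction(hybrid: list[Episode], gui_only: list[Episode]) -> float:
    """Fraction of steps the hybrid agent saves relative to the GUI-only baseline."""
    base = mean_steps_on_success(gui_only)
    return (base - mean_steps_on_success(hybrid)) / base


hybrid = [Episode(True, 6), Episode(True, 4), Episode(False, 10), Episode(True, 5)]
gui_only = [Episode(True, 10), Episode(False, 20), Episode(True, 10), Episode(False, 15)]
print(success_rate(hybrid))                       # 0.75
print(relative_step_reduction(hybrid, gui_only))  # 0.5
```

Averaging steps only over successful episodes keeps failed runs from inflating cost; a stricter comparison would restrict both agents to the tasks they both solved.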
The evaluation framework also exposes a quality gap between predefined and agent-generated shortcuts, which shows that the benchmark can be used to assess and compare different shortcut generation methods and to guide future work on intelligent agents.
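One hypothetical way to surface such a quality gap is to compare how often shortcut invocations from each source succeed. The `(source, succeeded)` record format and the `"predefined"`/`"generated"` labels are assumptions for illustration, not MAS-Bench's metric definitions.

```python
from collections import defaultdict


def shortcut_success_by_source(calls: list[tuple[str, bool]]) -> dict[str, float]:
    """Success rate of shortcut invocations grouped by their (hypothetical) source label."""
    totals: dict = defaultdict(lambda: [0, 0])  # source -> [successes, attempts]
    for source, ok in calls:
        totals[source][0] += int(ok)
        totals[source][1] += 1
    return {src: hits / n for src, (hits, n) in totals.items()}


calls = [
    ("predefined", True), ("predefined", True), ("predefined", True), ("predefined", False),
    ("generated", True), ("generated", False), ("generated", False), ("generated", False),
]
print(shortcut_success_by_source(calls))
# {'predefined': 0.75, 'generated': 0.25}
```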
Conclusion
MAS-Bench addresses a critical gap in the systematic evaluation of GUI-shortcut hybrid mobile agents. As a foundational platform for research and development, it supports the creation of more efficient and robust agents and marks a significant step toward mobile automation that makes better use of shortcuts.
For more information, visit the MAS-Bench project page.
