UI-Zoomer: Adaptive Uncertainty-Driven Zoom for GUI Grounding

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

Source: arXiv:2604.14113v1 (announce type: cross)

Abstract

GUI grounding, which localizes interface elements from screenshots given natural language queries, remains challenging for small icons and dense layouts. Test-time zoom-in methods improve localization by cropping and re-running inference at higher resolution, but apply cropping uniformly across all instances with fixed crop sizes, ignoring whether the model is actually uncertain on each case. We propose UI-Zoomer, a training-free adaptive zoom-in framework that treats both the trigger and scale of zoom-in as a prediction uncertainty quantification problem.
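To make the zoom-in step concrete, here is a minimal sketch of the two pieces of geometry any crop-and-re-run pipeline needs: choosing a crop window around a first-pass prediction, and mapping a prediction made on the upscaled crop back to full-screenshot coordinates. The function names `crop_window` and `to_full_image` and the square-window choice are assumptions made for illustration, not details taken from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def crop_window(center: Tuple[float, float], radius: float,
                image_size: Tuple[float, float]) -> Box:
    """Window of half-width `radius` around `center`, shifted to stay inside the image.

    Hypothetical helper: the window is clamped so it never leaves the screenshot.
    """
    cx, cy = center
    width, height = image_size
    half_w = min(radius, width / 2)   # never wider than the image
    half_h = min(radius, height / 2)  # never taller than the image
    left = min(max(cx - half_w, 0.0), width - 2 * half_w)
    top = min(max(cy - half_h, 0.0), height - 2 * half_h)
    return (left, top, left + 2 * half_w, top + 2 * half_h)


def to_full_image(point_in_crop: Tuple[float, float], window: Box,
                  zoomed_size: Tuple[float, float]) -> Tuple[float, float]:
    """Map a point predicted on the upscaled crop back to screenshot coordinates."""
    px, py = point_in_crop
    left, top, right, bottom = window
    zoom_w, zoom_h = zoomed_size
    return (left + px * (right - left) / zoom_w,
            top + py * (bottom - top) / zoom_h)
```

For example, a 400-pixel window upscaled to 1600 pixels gives the model four times the linear resolution on the region of interest, and `to_full_image` undoes that scaling when the refined prediction is reported.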

Key Features of UI-Zoomer

Confidence-Aware Gate: A novel mechanism that fuses spatial consensus among stochastic candidates with token-level generation confidence to selectively trigger zoom-in only when localization is uncertain.
Uncertainty-Driven Crop Sizing: This module decomposes prediction variance into inter-sample positional spread and intra-sample box extent, deriving a per-instance crop radius via the law of total variance.
Training-Free Framework: UI-Zoomer requires no additional training, so it can be applied to existing grounding models without fine-tuning.
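The first two features can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: spatial consensus is approximated by mean pairwise IoU, the fusion is a weighted sum with hypothetical parameters `alpha` and `threshold`, each sampled box is treated as a uniform distribution over its extent (so its per-axis variance is side² / 12), and the scale factor `k` is invented for the example. What does follow from the description above is the structure of the crop sizing: by the law of total variance, total variance = variance of the box centers (inter-sample spread) + mean within-box variance (intra-sample extent).

```python
import math
from statistics import fmean, pvariance
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def spatial_consensus(boxes: Sequence[Box]) -> float:
    """Assumed consensus measure: mean pairwise IoU of the sampled boxes."""
    pairs = [(i, j) for i in range(len(boxes)) for j in range(i + 1, len(boxes))]
    if not pairs:
        return 1.0
    return fmean(iou(boxes[i], boxes[j]) for i, j in pairs)


def should_zoom(boxes: Sequence[Box], token_logprobs: Sequence[float],
                alpha: float = 0.5, threshold: float = 0.6) -> bool:
    """Gate: trigger zoom-in only when the fused confidence is low.

    `alpha` and `threshold` are hypothetical values chosen for this sketch.
    """
    token_conf = math.exp(fmean(token_logprobs))  # geometric-mean token probability
    score = alpha * spatial_consensus(boxes) + (1 - alpha) * token_conf
    return score < threshold


def crop_radius(boxes: Sequence[Box], k: float = 3.0) -> float:
    """Per-instance crop radius from the law of total variance, per axis."""
    radii = []
    for lo, hi in ((0, 2), (1, 3)):  # x axis, then y axis
        centers = [(b[lo] + b[hi]) / 2 for b in boxes]
        inter = pvariance(centers)                               # Var of E[X | sample]
        intra = fmean((b[hi] - b[lo]) ** 2 / 12 for b in boxes)  # E of Var[X | sample]
        radii.append(k * math.sqrt(inter + intra))
    return max(radii)
```

When the sampled boxes agree, the inter-sample term vanishes and the radius is driven by the box size alone; when they scatter across the screen, the radius grows to cover the disagreement.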

Performance Evaluation

Experiments on the ScreenSpot-Pro, UI-Vision, and ScreenSpot-v2 benchmarks show consistent improvements over strong baselines across multiple model architectures. UI-Zoomer achieves gains of up to:

+13.4% on ScreenSpot-Pro
+10.3% on UI-Vision
+4.2% on ScreenSpot-v2

These gains indicate that letting the model's own uncertainty decide when and how far to zoom improves localization over the baselines.

Conclusion

UI-Zoomer adapts both the decision to zoom in and the crop size to the model's prediction uncertainty. This improves localization accuracy without any additional training, which makes it a promising option for developers and researchers who want more robust GUI grounding.

Future Directions

Going forward, the ideas behind UI-Zoomer could extend beyond GUI grounding to other areas where uncertainty plays a critical role, such as image segmentation and object detection. Continued research in this direction could further improve how AI systems understand and interact with complex visual information.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
