VISOR: Advanced Visual Retrieval for Long-Horizon Reasoning

VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning

Source: arXiv:2604.09508v1 (announce type: cross)

Abstract: Visual Retrieval-Augmented Generation (VRAG) empowers Vision-Language Models to retrieve and reason over visually rich documents. To tackle complex queries requiring multi-step reasoning, agentic VRAG systems interleave reasoning with iterative retrieval. However, existing agentic VRAG faces two critical bottlenecks.

Challenges in Current Agentic VRAG Systems

  • Visual Evidence Sparsity: Key evidence is scattered across pages yet processed in isolation, hindering cross-page reasoning. Moreover, fine-grained intra-image evidence often requires precise visual actions, whose misuse degrades retrieval quality.
  • Search Drift in Long Horizons: The accumulation of visual tokens across retrieved pages dilutes context and causes cognitive overload, leading agents to deviate from their search objective.

Introducing VISOR

To address these challenges, the authors propose VISOR (Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning), a unified single-agent framework. VISOR combines four mechanisms: two target visual evidence sparsity, and two target search drift.

Key Features of VISOR

  • Structured Evidence Space: Supports progressive cross-page reasoning, so that evidence gathered from different pages is accumulated and synthesized rather than processed in isolation.
  • Visual Action Evaluation and Correction: A mechanism that manages visual actions by evaluating and correcting them, so that fine-grained intra-image evidence is located precisely and misused actions do not degrade retrieval quality.
  • Dynamic Trajectory with Sliding Window: Mitigates search drift by anchoring the evidence space and discarding earlier raw interactions, which keeps accumulated visual tokens from diluting the context.
  • Intent Injection: Keeps the agent focused on its search objective, further reducing the likelihood of search drift.
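
To make the last two features more concrete, here is a minimal Python sketch of a sliding-window context with an anchored evidence list and a restated objective. It is an illustration under stated assumptions, not the authors' implementation: the class and field names (AgentContext, Step, window_size), the text-only prompt, and the exact form of the intent reminder are all hypothetical, and the summary above does not specify how VISOR represents any of them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One raw interaction: the agent's action and the observation it returned."""
    action: str
    observation: str  # stands in for retrieved page images / visual tokens


@dataclass
class AgentContext:
    """Hypothetical container; names are illustrative and not taken from the paper."""
    query: str
    window_size: int = 2
    evidence: List[str] = field(default_factory=list)  # anchored, never dropped
    steps: List[Step] = field(default_factory=list)    # raw interactions

    def record(self, step: Step, finding: Optional[str] = None) -> None:
        """Store a raw step and, optionally, a distilled finding from it."""
        self.steps.append(step)
        if finding is not None:
            self.evidence.append(finding)

    def build_prompt(self) -> str:
        """Rebuild the context: query, all evidence, and only the most recent raw steps."""
        recent = self.steps[-self.window_size:] if self.window_size > 0 else []
        lines = [f"QUERY: {self.query}", "EVIDENCE:"]
        lines += [f"- {e}" for e in self.evidence]
        lines.append("RECENT STEPS:")
        lines += [f"{s.action} -> {s.observation}" for s in recent]
        # Intent injection (assumed form): restate the objective at the end.
        lines.append(f"REMINDER: the goal is still to answer: {self.query}")
        return "\n".join(lines)
```

In this sketch the prompt length depends on the window size and the number of distilled findings, not on how many pages the agent has visited, which is the property the sliding window is meant to provide.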

Training VISOR

VISOR is trained with a reinforcement learning (RL) pipeline based on Group Relative Policy Optimization (GRPO). The pipeline uses state masking and credit assignment tailored to dynamic context reconstruction.
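
For readers unfamiliar with GRPO, the sketch below shows its core idea in Python: each rollout's reward is normalized against the other rollouts sampled for the same query, and a mask decides which tokens receive that signal. This is generic GRPO-style arithmetic, not VISOR's pipeline. The function names, the use of the population standard deviation, and the masking rule (zeroing tokens the policy did not generate) are assumptions made for illustration; the summary above does not detail how VISOR implements state masking or credit assignment.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own group (same query)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_token_weights(advantage: float, is_policy_token: List[bool]) -> List[float]:
    """Give each policy-generated token the rollout's advantage and zero out the rest.

    Assumption: tokens that come from the environment or from rebuilt context
    (retrieved pages, re-inserted evidence) are masked so they carry no gradient.
    """
    return [advantage if keep else 0.0 for keep in is_policy_token]


# Four rollouts for one query: two answered correctly (reward 1), two did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
weights = masked_token_weights(advantages[0], [True, False, True])
```

Because advantages are computed relative to the group, no separate value network is needed; rollouts that beat the group average get positive weight and the rest get negative weight.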

Performance and Results

Extensive experiments on the ViDoSeek, SlideVQA, and MMLongBench benchmarks show that VISOR achieves state-of-the-art performance, with superior efficiency, on long-horizon visual reasoning tasks.

In conclusion, VISOR advances Visual Retrieval-Augmented Generation by targeting the two bottlenecks of existing agentic systems, visual evidence sparsity and search drift, within a single-agent framework trained with GRPO-based RL.


Lazarus Omolua, https://richlyai.com/blog