LITTA: Advanced Multimodal Retrieval with Query Expansion

LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval

Retrieving relevant evidence from visually rich documents such as textbooks, technical reports, and manuals remains difficult for information retrieval and natural language processing systems. Traditional methods struggle with long contexts, complex layouts, and weak lexical overlap between user queries and the pages that support them. In response, researchers have introduced LITTA, a retrieval framework designed to improve evidence-page retrieval without retraining the underlying retrieval model.

LITTA, which stands for Late-Interaction and Test-Time Alignment, takes a query-expansion-centric approach to multimodal document retrieval. The framework uses a large language model to generate complementary variants of the initial user query. Because each variant phrases the same information need differently, the variants together can reach candidate pages that the original wording alone would miss.

The retrieval process within LITTA involves the following key steps:

  • Query Expansion: The framework uses a large language model to create multiple variants of the user’s original query, thereby broadening the search scope.
  • Candidate Page Retrieval: Each query variant is processed using a frozen vision retriever, which scores the candidate pages through late-interaction scoring.
  • Aggregation of Results: The candidates retrieved from the expanded queries are then aggregated using reciprocal rank fusion. This step enhances evidence coverage and minimizes the sensitivity to any single phrasing of the query.
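
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: late-interaction scoring is assumed to be the MaxSim operator popularized by ColBERT-style retrievers, the fusion constant k = 60 is the value commonly used for reciprocal rank fusion, and `expand`, `encode`, and `index` are hypothetical stand-ins for the language model, the frozen retriever's query encoder, and the precomputed page embeddings.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Assumed late-interaction score (MaxSim): each query-token vector takes
    its best dot-product match among the page vectors; the maxima are summed.
    Assumes every page has at least one vector."""
    return sum(
        max(sum(q * p for q, p in zip(qv, pv)) for pv in page_vecs)
        for qv in query_vecs
    )


def rank_pages(
    query_vecs: Sequence[Vector],
    index: Dict[str, Sequence[Vector]],
    top_k: int,
) -> List[str]:
    """Score every page in the (hypothetical) index and return the top_k ids."""
    ranked = sorted(
        index, key=lambda pid: maxsim_score(query_vecs, index[pid]), reverse=True
    )
    return ranked[:top_k]


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a page's score.
    k = 60 is a common default and an assumption here."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            fused[page_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by page id for determinism.
    return sorted(fused, key=lambda pid: (-fused[pid], pid))


def retrieve(
    query: str,
    expand: Callable[[str], List[str]],            # hypothetical LLM wrapper
    encode: Callable[[str], Sequence[Vector]],     # hypothetical frozen query encoder
    index: Dict[str, Sequence[Vector]],
    top_k: int = 10,
) -> List[str]:
    """Expand the query, rank pages once per variant, then fuse the rankings."""
    variants = [query] + list(expand(query))
    rankings = [rank_pages(encode(v), index, top_k) for v in variants]
    return reciprocal_rank_fusion(rankings)[:top_k]
```

Fusion rewards pages that rank well under several phrasings. For example, a page that is ranked second by both the original query and one variant ends up ahead of two pages that are each ranked first by only one of them, which is how the aggregation step reduces dependence on any single wording.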

Because everything happens at test time, the strategy improves the robustness of retrieval while remaining compatible with existing multimodal embedding indices. LITTA has been evaluated across three distinct domains: computer science, pharmaceuticals, and industrial manuals. The results indicate that multi-query retrieval consistently improves top-k accuracy, recall, and Mean Reciprocal Rank (MRR), particularly in domains characterized by high visual and semantic variability.
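
For readers unfamiliar with these metrics, the sketch below computes them from ranked page lists. It uses the standard textbook definitions, which are assumed here and may differ in detail from the ones used in the LITTA evaluation; the function names and the input format are illustrative only.

```python
from typing import Dict, List, Sequence, Set, Tuple


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Top-k accuracy for one query: 1.0 if any relevant page is in the top k."""
    return 1.0 if any(page in relevant for page in ranked[:k]) else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant pages found in the top k.
    Assumes every query has at least one relevant page."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant page, or 0.0 if none was retrieved."""
    for rank, page in enumerate(ranked, start=1):
        if page in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> Dict[str, float]:
    """Average each metric over (ranked_pages, relevant_pages) pairs."""
    n = len(runs)
    return {
        "top_k_accuracy": sum(hit_at_k(r, rel, k) for r, rel in runs) / n,
        "recall": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

Running `evaluate` once on single-query rankings and once on fused multi-query rankings gives the kind of side-by-side comparison the evaluation describes.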

LITTA also provides a controllable accuracy-efficiency trade-off. Each query variant is processed separately by the retriever, so users can adjust the number of generated variants to fit a latency budget. This makes the framework practical for real-world deployments where response time matters.
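
One way to expose that knob is a small helper that caps the number of variants before retrieval. The sketch below is an assumption about how such a control could look, not part of LITTA itself: `generate_variants` is a hypothetical wrapper around the language model, and the de-duplication step is an added convenience so that repeated phrasings do not consume retriever passes.

```python
from typing import Callable, List


def build_query_set(
    query: str,
    generate_variants: Callable[[str, int], List[str]],  # hypothetical LLM wrapper
    num_variants: int,
) -> List[str]:
    """Return the original query plus at most num_variants expansions.

    Each returned query costs one retriever pass, so num_variants is the
    dial between accuracy and latency; 0 falls back to single-query retrieval.
    """
    if num_variants < 0:
        raise ValueError("num_variants must be non-negative")
    variants = generate_variants(query, num_variants) if num_variants else []

    seen = set()
    queries: List[str] = []
    # Keep the original query first and drop empty or case-insensitive duplicates.
    for candidate in [query] + variants[:num_variants]:
        key = candidate.strip().lower()
        if key and key not in seen:
            seen.add(key)
            queries.append(candidate)
    return queries
```

Under this sketch the worst-case cost is `1 + num_variants` retriever passes per user query, so the setting can be chosen directly from a latency budget.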

In conclusion, LITTA is a meaningful step forward for visually grounded multimodal retrieval. By combining query expansion with late-interaction scoring and rank-based aggregation, it makes retrieval more robust and more effective without modifying the retriever. Its consistent gains across varied domains suggest that the approach can improve evidence retrieval in visually rich documents more broadly.


Lazarus Omolua