Top Asynchronous Inference Methods for Vision-Language-Action Models

Understanding Asynchronous Inference Methods for Vision-Language-Action Models

Vision-Language-Action (VLA) models are a significant advance in generalist robot control. Their inference latency, however, is a pressing problem: when actions are executed asynchronously, the observation the policy acted on is already stale by the time its actions run. Researchers have proposed several methods to address this, including inference-time inpainting (IT-RTC), training-time delay simulation (TT-RTC), future-state-aware conditioning (VLASH), and lightweight residual correction (A2C2). Each takes a different approach to the latency problem, yet they have been evaluated independently, often with different codebases, base policies, and protocols.
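To make observation staleness concrete, here is a minimal sketch under a simplifying assumption: the policy observes at step t, its action chunk arrives d control steps later, and the controller skips the d chunk entries whose time has already passed. The function name `observation_age` and its parameters are hypothetical and only illustrate the bookkeeping; they do not come from any of the four methods.

```python
def observation_age(delay: int, execute: int, horizon: int) -> list[int]:
    """Age, in control steps, of the observation behind each executed action.

    Assumption for this sketch: the chunk was computed from an observation
    taken `delay` steps before it became available, so execution starts at
    chunk index `delay` and runs for `execute` steps.
    """
    if delay < 0 or execute < 1:
        raise ValueError("delay must be >= 0 and execute must be >= 1")
    if delay + execute > horizon:
        raise ValueError("chunk is too short for this delay and execution length")
    # The action at chunk index i was planned from an observation i steps old.
    return [delay + k for k in range(execute)]


# With no delay the first action uses a fresh observation.
print(observation_age(delay=0, execute=4, horizon=16))
# With a delay of 4 steps, every executed action is at least 4 steps stale.
print(observation_age(delay=4, execute=4, horizon=16))
```

Under this assumption the staleness of every executed action grows by exactly the inference delay, which is the effect the four methods try to compensate for.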

This article summarizes a systematic comparison of these four asynchronous inference methods under controlled conditions. The research builds two unified codebases that integrate all four methods with harmonized library and dataset versions, which allows a more direct comparison of their performance.

Methodologies Overview

  • Inference-Time Inpainting (IT-RTC): This method inpaints the new action chunk at inference time so that it stays consistent with the actions already committed for execution, which helps mitigate staleness at lower delays.
  • Training-Time Delay Simulation (TT-RTC): This approach simulates inference delays during training, so the model becomes robust to a range of delay distributions without adding inference overhead.
  • Future-State-Aware Conditioning (VLASH): VLASH conditions the model on future states rather than only on the stale current one. Its performance trades off between low and high delays.
  • Lightweight Residual Correction (A2C2): This method applies a lightweight residual correction at each control step, which proved highly effective at maintaining performance across inference delays.
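The four ideas in the list above can be illustrated with toy helpers. This is a hedged sketch, not the authors' implementation: every function name is hypothetical, the uniform delay distribution, the boolean prefix mask, and the treatment of actions as additive state deltas are assumptions made for illustration, and the residual is passed in directly instead of being predicted by a learned head.

```python
import random


def sample_training_delay(max_delay: int, rng: random.Random) -> int:
    """TT-RTC idea: draw a simulated inference delay for one training example.
    Assumption: a uniform draw over 0..max_delay."""
    return rng.randint(0, max_delay)


def frozen_prefix_mask(horizon: int, delay: int) -> list[bool]:
    """IT-RTC idea: mark which slots of the new chunk are fixed (True) because
    they will already have run during inference; the rest are inpainted."""
    return [i < delay for i in range(horizon)]


def roll_state_forward(state: list[float], pending: list[list[float]]) -> list[float]:
    """VLASH idea: estimate the state at execution time.
    Assumption: each pending action is an additive delta on the state."""
    future = list(state)
    for action in pending:
        future = [s + a for s, a in zip(future, action)]
    return future


def apply_residual(base_action: list[float], residual: list[float]) -> list[float]:
    """A2C2 idea: add a small per-step correction to the base policy's action."""
    return [a + r for a, r in zip(base_action, residual)]


rng = random.Random(0)
print(sample_training_delay(8, rng))
print(frozen_prefix_mask(horizon=5, delay=2))
print(roll_state_forward([0.0, 1.0], [[1.0, 0.0], [0.5, -1.0]]))
print(apply_residual([1.0, 2.0], [0.5, -0.5]))
```

The sketch also shows where each method spends its effort: the delay sampling happens only during training, the mask and the state roll-forward act when a new chunk is requested, and the residual is applied at every control step.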

Benchmarking Results

The study benchmarks these methods on the Kinetix suite using MLPMixer policies and on the LIBERO manipulation benchmark with SmolVLA, sweeping inference delays up to $d=20$ control steps. The results reveal several key insights:

  • A2C2’s Performance: A2C2 is the most effective method on the Kinetix suite, keeping the solve rate above 90% up to $d=8$. On the LIBERO benchmark it leads from $d=4$ onward.
  • IT-RTC Limitations: IT-RTC is competitive at lower delays, but its performance declines sharply at longer horizons ($H=30$) and at higher latency.
  • TT-RTC’s Robustness: TT-RTC stands out as the most robust training-based method, remaining stable across different maximum delay choices and generalizing well beyond its training delay distribution.
  • VLASH’s Trade-Off: VLASH’s effectiveness depends on the delay range used for fine-tuning, with a clear trade-off between low-delay and high-delay performance.
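A delay sweep of the kind described above can be organized as a small harness. The sketch below is an assumption about how such an evaluation might be structured, not the study's code: `sweep_delays`, `largest_delay_above`, and the `run_episode(delay, seed)` callback are hypothetical names, and the stub episode at the end is a toy stand-in that produces no real benchmark numbers.

```python
from typing import Callable, Iterable, Optional


def sweep_delays(
    run_episode: Callable[[int, int], bool],
    delays: Iterable[int],
    episodes: int,
) -> dict[int, float]:
    """Solve rate per inference delay, averaged over `episodes` seeds."""
    rates = {}
    for d in delays:
        solved = sum(1 for seed in range(episodes) if run_episode(d, seed))
        rates[d] = solved / episodes
    return rates


def largest_delay_above(rates: dict[int, float], threshold: float) -> Optional[int]:
    """Largest delay d such that all swept delays up to d stay above threshold."""
    best = None
    for d in sorted(rates):
        if rates[d] <= threshold:
            break
        best = d
    return best


# Toy stand-in for a real rollout: more delay means fewer solved seeds.
def toy_episode(delay: int, seed: int) -> bool:
    return seed >= delay


rates = sweep_delays(toy_episode, delays=range(0, 5), episodes=10)
print(rates)
print(largest_delay_above(rates, threshold=0.85))
```

A statement such as "above 90% up to $d=8$" corresponds to calling `largest_delay_above` with a threshold of 0.9 on the measured solve rates.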

Conclusion

As robot policies grow larger and slower to run, effective asynchronous inference methods become increasingly critical. The systematic comparison of IT-RTC, TT-RTC, VLASH, and A2C2 shows their relative strengths and weaknesses under matched conditions. The code developed for this research is available on GitHub as a resource for further exploration and development.

Lazarus Omolua (https://richlyai.com/blog)