Multi-ORFT: Stable Online Reinforcement for Multi-Agent Driving


Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving

Source: arXiv:2604.11734v2

Abstract: Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed-loop objectives; meanwhile, stable online post-training in reactive multi-agent environments remains difficult.

We present Multi-ORFT, which couples scene-conditioned diffusion pre-training with stable online reinforcement post-training. In pre-training, the planner uses inter-agent self-attention, cross-attention, and AdaLN-Zero-based scene conditioning to improve scene consistency and road adherence of joint trajectories.

Key Features of Multi-ORFT

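  • Inter-Agent Self-Attention: Lets each agent's trajectory representation attend to those of the other agents, so the joint trajectories are planned together rather than independently.
  • Cross-Attention: Lets the trajectory representation attend to a separate set of context features; the abstract lists it among the components that improve scene consistency and road adherence.
  • AdaLN-Zero Scene Conditioning: Modulates the denoiser's normalization layers with scale, shift, and gate values computed from the scene, with the modulation zero-initialized so each block starts as an identity mapping.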
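
To make the AdaLN-Zero idea concrete, here is a minimal pure-Python sketch of one conditioned residual block. It is not the paper's implementation: the function names, the tiny dimensions, and the stand-in sublayer are all hypothetical, and a real planner would use attention or an MLP where the lambda appears.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(weights, bias, c):
    """Plain dense layer: weights holds one row per output unit."""
    return [sum(w * ci for w, ci in zip(row, c)) + b for row, b in zip(weights, bias)]

def zero_init(d, scene_dim):
    """Zero-initialize the modulation layer (the 'Zero' in AdaLN-Zero)."""
    return {"w": [[0.0] * scene_dim for _ in range(3 * d)], "b": [0.0] * (3 * d)}

def adaln_zero_block(x, scene, params, sublayer):
    """Residual block whose shift, scale, and gate are regressed from the scene embedding."""
    d = len(x)
    mod = linear(params["w"], params["b"], scene)  # 3*d modulation values
    shift, scale, gate = mod[:d], mod[d:2 * d], mod[2 * d:]
    h = [n * (1.0 + s) + t for n, s, t in zip(layer_norm(x), scale, shift)]
    h = sublayer(h)  # stand-in for attention or an MLP
    return [xi + g * hi for xi, g, hi in zip(x, gate, h)]

# Hypothetical inputs, chosen only for illustration.
x = [0.5, -1.0, 2.0, 0.0]   # one agent token
scene = [1.0, 0.3]          # scene embedding
params = zero_init(len(x), len(scene))
out = adaln_zero_block(x, scene, params, sublayer=lambda h: [2.0 * v for v in h])
print(out == x)  # the gate is zero at initialization, so the block is an identity
```

Because the gate starts at zero, the block passes its input through unchanged until training moves the modulation weights, which is the usual motivation for this initialization.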

In post-training, we formulate a two-level Markov Decision Process (MDP) that exposes step-wise reverse-kernel likelihoods for online optimization. We combine dense trajectory-level rewards with variance-gated group-relative policy optimization (VG-GRPO) to stabilize training.
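
The abstract does not spell out the VG-GRPO update, so the following is only a sketch under stated assumptions. It assumes Gaussian reverse steps with a shared scalar standard deviation, assumes the "variance gate" means skipping a rollout group whose reward spread is below a threshold, and swaps a plain score-function surrogate in for whatever clipped objective the paper may use. All names and the `min_std` threshold are hypothetical.

```python
import math
from statistics import mean, pstdev

def reverse_step_logprob(x_prev, mu, sigma):
    """Log-density of one Gaussian reverse step N(x_prev; mu, sigma^2 I)."""
    return sum(
        -0.5 * ((a - m) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        for a, m in zip(x_prev, mu)
    )

def variance_gated_advantages(rewards, min_std=1e-3):
    """Group-relative advantages; returns zeros when the group's reward spread is tiny."""
    mu, sd = mean(rewards), pstdev(rewards)
    if sd < min_std:  # ASSUMED gate: the group carries no usable ranking signal
        return [0.0] * len(rewards)
    return [(r - mu) / sd for r in rewards]

def policy_loss(step_logprobs, advantages):
    """Score-function surrogate: all denoising steps of a rollout share its advantage."""
    total = sum(adv * sum(lps) for lps, adv in zip(step_logprobs, advantages))
    return -total / len(advantages)

# Hypothetical group of four rollouts from the same scene.
rewards = [1.0, 2.0, 3.0, 2.0]
adv = variance_gated_advantages(rewards)
flat = variance_gated_advantages([0.5, 0.5, 0.5, 0.5])
print(adv)   # positive for the best rollout, negative for the worst
print(flat)  # all zeros: the gate suppresses the update for this group
```

The gate matters because dividing by a near-zero group standard deviation would turn reward noise into very large advantages, which is one plausible source of the instability the method aims to avoid.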

Performance Results

On the Waymo Open Motion Dataset (WOMD) closed-loop benchmark, Multi-ORFT improved on its own pre-trained planner:

  • Reduced collision rate from 2.04% to 1.89%.
  • Decreased off-road rate from 1.68% to 1.36%.
  • Increased average speed from 8.36 to 8.61 m/s.
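
To put the three figures on a common scale, the short script below converts each before/after pair into a relative change. The numbers come from the list above; the helper name `relative_change` is just illustrative.

```python
def relative_change(before, after):
    """Percentage change from the pre-trained planner to the post-trained one."""
    return (after - before) / before * 100.0

results = {
    "collision rate (%)": (2.04, 1.89),
    "off-road rate (%)": (1.68, 1.36),
    "average speed (m/s)": (8.36, 8.61),
}

for name, (before, after) in results.items():
    print(f"{name}: {before} -> {after} ({relative_change(before, after):+.2f}%)")
```

This prints relative changes of -7.35% for collisions, -19.05% for off-road events, and +2.99% for average speed, so the largest proportional gain is in road adherence.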

Moreover, Multi-ORFT outperformed several strong open-source baselines, including:

  • SMART-large
  • SMART-tiny-CLSFT
  • VBD

Conclusion

The results suggest that coupling scene-consistent denoising with stable online diffusion-policy optimization improves the reliability of closed-loop cooperative driving. Relative to its pre-trained planner, Multi-ORFT lowers collision and off-road rates while raising average speed, so the safety gains do not come at the cost of traffic efficiency.


Lazarus Omolua
https://richlyai.com/blog