Multi-ORFT: Stable Online Reinforcement for Multi-Agent Driving

Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving

arXiv:2604.11734v2

Abstract: Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed-loop objectives; meanwhile, stable online post-training in reactive multi-agent environments remains difficult.

We present Multi-ORFT, which couples scene-conditioned diffusion pre-training with stable online reinforcement post-training. In pre-training, the planner uses inter-agent self-attention, cross-attention, and AdaLN-Zero-based scene conditioning to improve scene consistency and road adherence of joint trajectories.
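The summary does not spell out the pre-training objective. As a minimal sketch, assuming a standard DDPM-style setup with a linear noise schedule and a noise-prediction loss (both assumptions, not confirmed by the paper), diffusion pre-training on a joint multi-agent trajectory works like this: all agents' trajectories are noised together at one shared step k, and a scene-conditioned denoiser is trained to predict the injected noise. The function names are hypothetical, and the denoiser itself is replaced by a placeholder output.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_k = prod_{i<=k} (1 - beta_i) for a linear beta schedule (assumed)."""
    out, prod = [], 1.0
    for k in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * k / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def noise_joint_trajectory(x0, k, abar, rng):
    """Sample x_k ~ q(x_k | x_0) for all agents at the same diffusion step k.

    x0 is a list of agents, each a flattened list of trajectory coordinates.
    Returns the noised trajectories and the Gaussian noise that was added.
    """
    a = abar[k]
    eps = [[rng.gauss(0.0, 1.0) for _ in agent] for agent in x0]
    xk = [
        [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(agent, agent_eps)]
        for agent, agent_eps in zip(x0, eps)
    ]
    return xk, eps

def denoising_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise over every agent and coordinate."""
    sq = [(p - e) ** 2 for pa, ea in zip(eps_pred, eps) for p, e in zip(pa, ea)]
    return sum(sq) / len(sq)

# Usage: two agents, each with a flattened (x, y) trajectory of two waypoints.
rng = random.Random(0)
abar = alpha_bars(num_steps=100)
x0 = [[0.0, 0.0, 1.0, 0.1], [5.0, 2.0, 5.5, 2.0]]
xk, eps = noise_joint_trajectory(x0, k=50, abar=abar, rng=rng)
# A real denoiser would map (xk, k, scene) to eps_pred; a zero predictor stands in here.
eps_pred = [[0.0] * len(agent) for agent in xk]
loss = denoising_loss(eps_pred, eps)
```

Noising every agent at the same step is what makes the target a joint trajectory rather than a set of independent ones; the scene conditioning enters only through the denoiser's inputs.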

Key Features of Multi-ORFT

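Inter-Agent Self-Attention: Lets each agent's tokens attend to those of the other agents, so that jointly generated trajectories stay mutually consistent.
Cross-Attention: Lets agent tokens attend to scene context, conditioning each trajectory on its surroundings and improving road adherence.
AdaLN-Zero Scene Conditioning: Modulates the denoiser's layer normalization with scene-dependent scale, shift, and gate terms (zero-initialized, as in AdaLN-Zero), keeping generated trajectories consistent with the driving environment.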
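To make the three components concrete, here is a minimal pure-Python sketch of one denoiser block under simplifying assumptions that are not from the paper: single-head attention with no learned projections, one shared gate for the whole block, and hand-set `shift`, `scale`, and `gate` vectors standing in for the output of the scene-conditioning regressor. The names `adaln_zero_block` and `attention` are hypothetical. What it shows is the AdaLN-Zero property: with a zero gate the block is an identity, so scene conditioning is learned gradually from a stable starting point.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize one token vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def attention(queries, keys):
    """Single-head scaled dot-product attention.

    Keys double as values and there are no learned projections (a simplification).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj / z * k[i] for wj, k in zip(w, keys)) for i in range(d)])
    return out

def adaln_zero_block(agents, scene, shift, scale, gate):
    """Hypothetical denoiser block: AdaLN modulation, inter-agent self-attention,
    cross-attention to scene tokens, then a gated residual connection.

    shift, scale, and gate would be regressed from the scene and diffusion-step
    embedding; AdaLN-Zero zero-initializes that regressor, so at initialization
    gate == 0 and the block returns its input unchanged.
    """
    modulated = [
        [n * (1.0 + s) + b for n, s, b in zip(layer_norm(t), scale, shift)] for t in agents
    ]
    mixed = attention(modulated, modulated)  # inter-agent self-attention
    ctx = attention(mixed, scene)            # agents attend to scene tokens
    return [[x + g * c for x, g, c in zip(t, gate, ct)] for t, ct in zip(agents, ctx)]

# Usage: two agent tokens and two scene tokens, all 3-dimensional.
agents = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
scene = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
zeros = [0.0, 0.0, 0.0]
at_init = adaln_zero_block(agents, scene, shift=zeros, scale=zeros, gate=zeros)
trained = adaln_zero_block(agents, scene, shift=zeros, scale=zeros, gate=[0.5, 0.5, 0.5])
```

A real implementation would use separate gates for each sublayer and learned query, key, and value projections; the single gate here only keeps the identity-at-initialization behavior easy to see.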

In post-training, we formulate a two-level Markov Decision Process (MDP) that exposes step-wise reverse-kernel likelihoods for online optimization. We combine dense trajectory-level rewards with variance-gated group-relative policy optimization (VG-GRPO) to stabilize training.
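The post-training objective can be sketched in a few functions. The exact kernel and gating rule are not given in this summary, so the following is a minimal sketch under stated assumptions: each inner-level action is one denoising step with a Gaussian reverse kernel of fixed standard deviation, each rollout's trajectory-level reward is shared by all of its denoising steps, and "variance gating" is interpreted (hypothetically) as zeroing the advantages of a group whose reward spread falls below a threshold `min_std`. The clipped surrogate is the standard PPO/GRPO form; all function names are illustrative.

```python
import math

def gaussian_reverse_logprob(x_prev, mean, sigma):
    """log N(x_prev; mean, sigma^2 I): likelihood of one denoising step under an assumed Gaussian reverse kernel."""
    d = len(x_prev)
    sq = sum((x - m) ** 2 for x, m in zip(x_prev, mean))
    return -0.5 * sq / sigma ** 2 - d * math.log(sigma) - 0.5 * d * math.log(2.0 * math.pi)

def variance_gated_advantages(rewards, min_std=1e-3, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / group std.

    Assumed gate (not confirmed by the paper): if the group's reward std is below
    min_std, the rollouts are nearly indistinguishable, so return zeros instead
    of amplifying noise.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    if std < min_std:
        return [0.0] * n
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_step_objective(logp_new, logp_old, advantage, clip=0.2):
    """PPO-style clipped surrogate for one denoising step (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped * advantage)

def group_objective(step_logps_new, step_logps_old, rewards, clip=0.2):
    """Average clipped objective over every denoising step of every rollout in one group.

    step_logps_new[i][k] / step_logps_old[i][k]: log-likelihood of rollout i's
    step k under the current policy and under the policy that sampled it.
    """
    advs = variance_gated_advantages(rewards)
    terms = [
        clipped_step_objective(ln, lo, adv, clip)
        for new, old, adv in zip(step_logps_new, step_logps_old, advs)
        for ln, lo in zip(new, old)
    ]
    return sum(terms) / len(terms)

# Usage: a group of three rollouts from the same scene, two denoising steps each.
old = [[-1.0, -2.0], [-1.5, -1.0], [-0.5, -0.7]]
new = [[-0.9, -2.1], [-1.5, -1.0], [-0.6, -0.6]]
print(group_objective(new, old, rewards=[0.2, 0.5, 0.8]))
```

Normalizing within a group of rollouts from the same scene removes the need for a learned value function; the gate then keeps near-identical groups from producing large, noise-driven advantages.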

Performance Results

On the Waymo Open Motion Dataset (WOMD) closed-loop benchmark, Multi-ORFT improved on the pre-trained planner as follows:

Collision rate: reduced from 2.04% to 1.89%.
Off-road rate: reduced from 1.68% to 1.36%.
Average speed: increased from 8.36 to 8.61 m/s.
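Because the safety metrics are already small percentages, absolute and relative changes tell different stories. The short script below recomputes both from the reported numbers: the collision rate falls by 0.15 percentage points (about 7.4% relative), the off-road rate by 0.32 points (about 19% relative), and average speed rises by 0.25 m/s (about 3%).

```python
# Pre-trained planner vs. Multi-ORFT, using the values reported in the results list.
results = {
    "collision rate (%)": (2.04, 1.89),
    "off-road rate (%)": (1.68, 1.36),
    "average speed (m/s)": (8.36, 8.61),
}

def changes(before, after):
    """Return (absolute change, relative change in percent of the 'before' value)."""
    delta = after - before
    return delta, 100.0 * delta / before

for name, (before, after) in results.items():
    delta, rel = changes(before, after)
    print(f"{name}: {delta:+.2f} absolute, {rel:+.2f}% relative")
```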

Moreover, Multi-ORFT outperformed several strong open-source baselines, including:

SMART-large
SMART-tiny-CLSFT
VBD

Conclusion

The results show that coupling scene-consistent denoising with stable online diffusion-policy optimization improves the reliability of closed-loop cooperative driving. Multi-ORFT improves both safety metrics and traffic efficiency, making it a promising advance for autonomous driving and multi-agent systems.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
