PiJEPA: Advanced Policy-Guided Planning for Visual Navigation


Policy-Guided World Model Planning for Language-Conditioned Visual Navigation

Source: arXiv:2603.25981v1 (announce type: cross)

Abstract: Navigating to a visually specified goal given natural language instructions remains a fundamental challenge in embodied AI. Existing approaches either rely on reactive policies that struggle with long-horizon planning, or employ world models that suffer from poor action initialization in high-dimensional spaces. We present PiJEPA, a two-stage framework that combines the strengths of learned navigation policies with latent world model planning for instruction-conditioned visual navigation.

In the first stage, we finetune an Octo-based generalist policy, augmented with a frozen pretrained vision encoder (DINOv2 or V-JEPA-2), on the CAST navigation dataset to produce an informed action distribution conditioned on the current observation and language instruction. In the second stage, we use this policy-derived distribution to warm-start Model Predictive Path Integral (MPPI) planning over a separately trained JEPA world model, which predicts future latent states in the embedding space of the same frozen encoder. By initializing the MPPI sampling distribution from the policy prior rather than from an uninformed Gaussian, our planner converges faster to high-quality action sequences that reach the goal.
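The warm-start idea can be sketched with a toy example. The code below is a minimal illustration under stated assumptions, not the authors' implementation: `toy_world_model` is a hypothetical additive latent dynamics standing in for the JEPA predictor, the cost is assumed to be the squared distance between the final predicted latent and the goal embedding, the "policy prior" is simulated as a mean that points roughly toward the goal, and all hyperparameters are arbitrary. It uses the common MPPI form in which only the sampling mean is updated by exponentially weighted averaging.

```python
import numpy as np


def toy_world_model(z, a):
    """Hypothetical stand-in for a learned latent predictor: z_next = z + a."""
    return z + a


def rollout_cost(z0, actions, z_goal, step_fn):
    """Roll out K action sequences of shape (K, H, A) in latent space.

    Returns the squared distance between each final latent and the goal, shape (K,).
    """
    z = np.broadcast_to(z0, (actions.shape[0], z0.shape[0])).copy()
    for t in range(actions.shape[1]):
        z = step_fn(z, actions[:, t])
    return np.sum((z - z_goal) ** 2, axis=-1)


def mppi_plan(z0, z_goal, init_mean, std, step_fn,
              n_samples=256, n_iters=2, temperature=0.5, seed=0):
    """Refine an action-sequence mean of shape (H, A) with MPPI-style updates."""
    rng = np.random.default_rng(seed)
    mean = init_mean.copy()
    for _ in range(n_iters):
        # Sample action sequences around the current mean.
        noise = rng.standard_normal((n_samples,) + mean.shape)
        actions = mean + std * noise
        costs = rollout_cost(z0, actions, z_goal, step_fn)
        # Exponential weighting; subtracting the minimum keeps exp() stable.
        weights = np.exp(-(costs - costs.min()) / temperature)
        weights /= weights.sum()
        # New mean is the cost-weighted average of the sampled sequences.
        mean = np.einsum("k,kha->ha", weights, actions)
    return mean


H, A = 5, 2                      # planning horizon and action dimension (assumed)
z0 = np.zeros(2)                 # current latent state (toy)
z_goal = np.array([4.0, 3.0])    # goal embedding (toy)
std = np.full((H, A), 0.2)       # fixed sampling noise

# Simulated policy prior: covers 80% of the way to the goal.
policy_mean = np.tile(0.8 * (z_goal - z0) / H, (H, 1))
# Uninformed baseline: zero-mean Gaussian.
uninformed_mean = np.zeros((H, A))

warm = mppi_plan(z0, z_goal, policy_mean, std, toy_world_model)
cold = mppi_plan(z0, z_goal, uninformed_mean, std, toy_world_model)

warm_cost = float(rollout_cost(z0, warm[None], z_goal, toy_world_model)[0])
cold_cost = float(rollout_cost(z0, cold[None], z_goal, toy_world_model)[0])
print(f"warm-start cost: {warm_cost:.3f}, uninformed cost: {cold_cost:.3f}")
```

With the same sample and iteration budget, the warm-started planner in this toy setup ends closer to the goal because its samples already fall near low-cost regions, while the zero-mean planner can only move about as far per iteration as its best samples reach. In a real system, `toy_world_model` would be replaced by the learned predictor and `policy_mean` by the policy's predicted action distribution.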

We systematically study the effect of the vision encoder backbone, comparing DINOv2 and V-JEPA-2, across both the policy and world model components. Experiments on real-world navigation tasks demonstrate that PiJEPA significantly outperforms both standalone policy execution and uninformed world model planning, achieving improved goal-reaching accuracy and instruction-following fidelity.

Key Features of PiJEPA

  • Two-Stage Framework: Combines learned navigation policies with latent world model planning.
  • Policy Fine-Tuning: Fine-tunes an Octo-based generalist policy, augmented with a frozen pretrained vision encoder (DINOv2 or V-JEPA-2), on the CAST navigation dataset.
  • Warm-Starting MPPI Planning: Initializes the Model Predictive Path Integral (MPPI) sampling distribution from the policy prior instead of an uninformed Gaussian, so the planner converges faster.
  • Encoder Backbone Comparison: Compares DINOv2 and V-JEPA-2 as the frozen encoder in both the policy and the world model.
  • Real-World Navigation Tasks: Reports better goal-reaching accuracy and instruction-following fidelity than either standalone policy execution or uninformed world model planning.

Conclusion

PiJEPA combines a learned navigation policy with latent world model planning for visual navigation guided by natural language instructions. According to the abstract, warm-starting MPPI from the policy prior addresses the poor action initialization of world model planners and the weak long-horizon planning of reactive policies, and it improves goal-reaching accuracy and instruction-following fidelity over both baselines on real-world navigation tasks.


Lazarus Omolua
https://richlyai.com/blog