Perception-Grounded Policy Optimization for Vision-Language Models

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

arXiv:2604.01840v2

Abstract

While Reinforcement Learning from Verifiable Rewards (RLVR) has advanced reasoning in Large Vision-Language Models (LVLMs), prevailing frameworks suffer from a foundational methodological flaw: by distributing identical advantages across all generated tokens, these methods inherently dilute the learning signals essential for optimizing the critical, visually-grounded steps of multimodal reasoning.

Introduction

The integration of vision and language has opened new avenues for research. Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in reasoning tasks that require understanding both visual and textual information, and RLVR has pushed those capabilities further. However, current RLVR frameworks assign the same advantage to every generated token, whether or not that token depends on the image.

The Challenge

The primary challenge with existing frameworks lies in their uniform credit assignment: every token in a generated response receives the same advantage. This dilutes the learning signal for the visually grounded reasoning steps that are crucial for multimodal understanding, and treats them no differently from tokens driven by linguistic priors.

Introducing Token Visual Dependency

To address this issue, we present the concept of Token Visual Dependency. For each generated token, this metric quantifies the causal information gain from visual inputs, using the Kullback-Leibler (KL) divergence between the model's predictive distribution conditioned on the image and its predictive distribution given the text alone. Our findings reveal that this dependency is sparse, concentrated in a small share of tokens, and that the tokens it singles out are semantically significant.
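To make the metric concrete, here is a minimal sketch of a per-token KL score. It assumes that, for every generated position, we already have two logit vectors over the vocabulary: one from a forward pass with the image and one from a pass with the image withheld. The function names, the toy logits, and the KL direction (image-conditioned distribution first) are illustrative assumptions, not details confirmed by the paper.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def token_visual_dependency(logits_with_image, logits_text_only):
    """Per-token KL between image-conditioned and text-only predictions.

    Both arguments are lists with one logit vector per generated token.
    (Hypothetical helper; the KL direction is an assumption.)
    """
    return [kl_divergence(softmax(lv), softmax(lt))
            for lv, lt in zip(logits_with_image, logits_text_only)]

# Toy example with a 3-word vocabulary and 3 generated tokens.
logits_with_image = [
    [2.0, 0.5, 0.1],  # token 0: the image changes nothing
    [4.0, 0.0, 0.0],  # token 1: the image flips the prediction
    [0.3, 0.2, 0.1],  # token 2: the image changes nothing
]
logits_text_only = [
    [2.0, 0.5, 0.1],
    [0.0, 0.0, 4.0],
    [0.3, 0.2, 0.1],
]
tvd = token_visual_dependency(logits_with_image, logits_text_only)
```

In this toy case only the middle token has a non-zero score, which mirrors the sparsity described above: most tokens are predicted equally well from the text alone.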

Perception-Grounded Policy Optimization (PGPO)

Building on the insights gained from Token Visual Dependency, we introduce a novel framework called Perception-Grounded Policy Optimization (PGPO). This fine-grained credit assignment mechanism dynamically adjusts the advantages at the token level. By employing a threshold-gated, mass-conserving approach, PGPO enhances learning signals for tokens that are visually dependent while mitigating gradient noise originating from linguistic priors.
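The exact reweighting rule is not given in this summary, so the following is only one possible reading of "threshold-gated, mass-conserving". It assumes a single sequence-level advantage and a list of per-token visual dependency scores; tokens whose score exceeds a threshold receive a larger raw weight, and all weights are then rescaled so that the total advantage across the sequence is unchanged. The function name and the `threshold` and `boost` parameters are hypothetical.

```python
def reweight_advantages(advantage, tvd_scores, threshold=0.1, boost=1.0):
    """Spread one sequence-level advantage over tokens (illustrative sketch).

    advantage:  scalar advantage shared by the whole response
    tvd_scores: per-token visual dependency scores (non-negative)
    threshold:  gate; only tokens scoring above it are up-weighted (assumed)
    boost:      how strongly a gated token's score raises its weight (assumed)
    """
    n = len(tvd_scores)
    if n == 0:
        return []
    # Threshold gate: visually dependent tokens get a raw weight above 1.
    raw = [1.0 + boost * s if s > threshold else 1.0 for s in tvd_scores]
    # Mass conservation: rescale so the weights average to 1, which keeps
    # the sum of token advantages equal to n * advantage.
    scale = n / sum(raw)
    return [advantage * w * scale for w in raw]

# One visually dependent token among three.
token_advs = reweight_advantages(1.0, [0.0, 3.0, 0.0])
# → [0.5, 2.0, 0.5]

# No token passes the gate: the result falls back to the uniform baseline.
uniform_advs = reweight_advantages(-2.0, [0.01, 0.02])
# → [-2.0, -2.0]
```

Under this sketch, raising the weight of gated tokens automatically lowers the weight of the rest, so the up-weighting of visually dependent tokens and the damping of prior-driven tokens come from the same normalization step.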

Experimental Validation

We conducted extensive experiments using the Qwen2.5-VL series across seven challenging multimodal reasoning benchmarks. The results demonstrate that PGPO significantly boosts model performance, achieving an average increase of 18.7%. Both theoretical and empirical analyses confirm that PGPO effectively reduces gradient variance, prevents training collapse, and serves as a robust regularizer for perception-grounded multimodal reasoning.

Conclusion

Perception-Grounded Policy Optimization replaces uniform, sequence-level credit assignment with token-level advantages weighted by visual dependency. By concentrating the learning signal on visually grounded tokens, it delivers more stable training and stronger performance on multimodal reasoning tasks.

Code Availability

The code for PGPO will be made available on GitHub.


Lazarus Omolua
https://richlyai.com/blog
