Stabilizing MLLM Self-Evolution with Softened Retracing

Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling

Source: arXiv:2604.03647v2

Abstract

In the unsupervised self-evolution of Multimodal Large Language Models (MLLMs), the quality of feedback signals during post-training is pivotal for stable and effective learning. However, existing self-evolution methods predominantly rely on majority voting, selecting the most frequent output as the pseudo-golden answer. That answer may reflect the model’s intrinsic biases rather than the objective correctness of the reasoning paths, which can degrade training. To counteract this degradation, we propose Continuous Softened Retracing reSampling (CSRS) for MLLM self-evolution.

Introduction

Multimodal Large Language Models (MLLMs) integrate several types of input data, such as images and text, and are used in a growing range of applications. For these models to evolve effectively without supervision, the feedback they receive during post-training must be reliable.

Challenges in Current Methods

Current self-evolution methods rely primarily on majority voting: the most frequent output among the sampled responses is treated as the pseudo-golden answer. This approach can lead to several issues:

  • Intrinsic Biases: The reliance on majority voting may reinforce existing biases within the model.
  • Uncertainty in Correctness: Selecting outputs based on frequency does not guarantee the correctness of reasoning paths.
  • Limited Exploration: Traditional methods may restrict the model’s ability to explore diverse reasoning paths.
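
The majority-voting baseline described above can be sketched as follows. This is a minimal illustration and not code from the paper: the function name `majority_vote_rewards` and the sample answers are hypothetical, and it assumes each sampled response has already been reduced to a final-answer string.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Pick the most frequent answer as the pseudo-label and score each
    sample with a binary reward: 1.0 if it matches, 0.0 otherwise."""
    counts = Counter(answers)
    pseudo_label, _ = counts.most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


# Hypothetical final answers sampled for a single question.
sampled = ["12", "12", "12", "15", "12", "18"]
label, rewards = majority_vote_rewards(sampled)
print(label, rewards)
```

Nothing in this procedure checks whether "12" is actually correct. If the model is biased toward a wrong answer, that answer wins the vote and every dissenting path, including a correct one, receives zero reward.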

Proposed Solution: Continuous Softened Retracing reSampling (CSRS)

To address these challenges, we introduce Continuous Softened Retracing reSampling (CSRS) for MLLM self-evolution. Our approach consists of two main components:

  • Retracing Re-inference Mechanism (RRM): The model re-runs inference from anchor points, which expands its exploration of long-tail reasoning paths. By revisiting these anchor points, it can surface more nuanced and less frequent reasoning patterns.
  • Softened Frequency Reward (SFR): In place of binary rewards, SFR utilizes continuous signals that calibrate rewards based on the frequency of answers across sampled reasoning sets, ensuring a more nuanced feedback mechanism.
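
To make the two components concrete, here is a toy sketch under stated assumptions. The summary above does not give the exact formulas, so this is one plausible reading rather than the authors' implementation: retracing is modeled as keeping a prefix of a reasoning path and regenerating the remainder, and the softened reward is modeled as each answer's relative frequency in the sampled set. All names (`retrace`, `softened_frequency_rewards`, `toy_continue`, `final_answer`) and the example paths are hypothetical.

```python
from collections import Counter


def retrace(path, anchor, continue_fn):
    """Keep the steps before index `anchor` and regenerate the rest.
    `continue_fn` stands in for model decoding from the kept prefix."""
    prefix = path[:anchor]
    return prefix + continue_fn(prefix)


def softened_frequency_rewards(answers):
    """Assumed reward: the relative frequency of each answer in the
    sampled set, a continuous value in (0, 1] instead of 0 or 1."""
    counts = Counter(answers)
    total = len(answers)
    return [counts[a] / total for a in answers]


def final_answer(path):
    """Extract the answer from the last step, e.g. 'answer=12' -> '12'."""
    return path[-1].split("=", 1)[1]


def toy_continue(prefix):
    # Hypothetical stand-in for the model: always takes an alternative
    # next step and ends at a different answer.
    return [f"alt-step-{len(prefix) + 1}", "answer=15"]


original = ["step-1", "step-2", "step-3", "answer=12"]
# Re-infer from two assumed anchor points (after step 1 and after step 2).
retraced = [retrace(original, k, toy_continue) for k in (1, 2)]

# Pool three copies of the original path with the two retraced paths.
paths = [original, original, original] + retraced
answers = [final_answer(p) for p in paths]
rewards = softened_frequency_rewards(answers)
print(answers)
print(rewards)
```

Under this reading, the minority answer "15" still earns a nonzero reward, so less frequent paths reached through retracing continue to contribute to the training signal instead of being zeroed out by a binary vote.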

Incorporating Visual Semantic Perturbation

CSRS also incorporates Visual Semantic Perturbation (VSP) so that the model prioritizes mathematical logic over superficial visual cues. Steering the model toward the underlying logic of a problem, rather than its surface appearance, is intended to improve reasoning performance.

Experimental Results

Our experiments show that the CSRS framework significantly improves the reasoning performance of Qwen2.5-VL-7B on several benchmarks, including MathVision. The approach achieves state-of-the-art (SOTA) results in unsupervised self-evolution, particularly on geometric tasks.

Conclusion

In conclusion, Continuous Softened Retracing reSampling is a promising advance in the self-evolution of Multimodal Large Language Models. By pairing retracing re-inference with softened, frequency-based rewards, CSRS improves the quality of reasoning paths and achieves state-of-the-art results on complex reasoning tasks. Our code is available at GitHub.


Lazarus Omolua (https://richlyai.com/blog)