Decoding by Perturbation: Reducing MLLM Hallucinations

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation

Source: arXiv:2604.12424v1 (cross-listed announcement)

Abstract

Multimodal Large Language Models (MLLMs) frequently hallucinate at inference time, largely because language priors overshadow visual evidence. Existing training-free mitigation methods either perturb the visual input beyond the natural image distribution, which degrades the visual representation, or apply intrusive manipulations that undermine the model’s generative fluency. The paper takes a different view: multimodal hallucinations arise from the hypersensitivity of visual grounding to textual phrasing during the decoding phase.

Introduction

MLLMs combine visual and textual inputs, which has opened new avenues for both natural language processing and computer vision. Hallucination, where the model generates content that is not supported by the input image or text, remains a significant hurdle. Existing training-free methods tend either to over-perturb the visual input or to introduce artifacts that do not occur in natural data, and both side effects can degrade the model’s overall performance.

Proposed Framework: Decoding by Perturbation (DeP)

To address these problems, the paper introduces Decoding by Perturbation (DeP), a training-free framework that mitigates prior-induced hallucinations through controlled textual interventions. The approach rests on the observation that hallucinations depend on how sensitive visual grounding is to the specific phrasing of the text input.
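One way to make "sensitivity to phrasing" concrete is to compare the model's next-token distribution for the original prompt against the distributions obtained from rephrased prompts. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, Jensen-Shannon divergence is a stand-in measure chosen here (the article does not say which statistic DeP uses), and the probability lists are made-up toy values rather than model outputs.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; it is positive wherever p or q is positive.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def phrasing_sensitivity(base_probs, perturbed_probs):
    """Mean JS divergence between the base distribution and each perturbed one.

    ASSUMPTION: this score is an illustrative proxy, not the paper's metric.
    """
    if not perturbed_probs:
        raise ValueError("need at least one perturbed distribution")
    total = sum(js_divergence(base_probs, q) for q in perturbed_probs)
    return total / len(perturbed_probs)


# Toy next-token distributions over three candidate tokens (invented values).
base = [0.7, 0.2, 0.1]
rephrased = [[0.4, 0.5, 0.1], [0.3, 0.6, 0.1]]
print(phrasing_sensitivity(base, rephrased))
```

A score near zero would mean the prediction barely moves when the wording changes; a larger score flags a decoding step whose output depends heavily on phrasing.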

Key Features of DeP

  • Dynamic Probing: DeP uses a dynamic probe that applies multi-level textual perturbations to elicit latent language priors. The perturbations act on the text, not on the visual input.
  • Attention Variance: DeP uses attention variance to strengthen stable regions of evidence and suppress noise in the feature space.
  • Interpretable Prior Drift Direction: DeP builds a prior drift direction from logit statistics and uses it to counteract probability biases that stem from textual co-occurrences.
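The last two features can be pictured with a small numerical sketch. Everything below is an assumption made for illustration: the function names, the exponential variance damping, the `temperature` and `alpha` parameters, and the subtraction rule are hypothetical choices, not the formulas from the paper. The inputs are plain Python lists standing in for attention maps over image tokens and for next-token logits collected under several perturbed prompts.

```python
import math
from statistics import mean, pvariance


def attention_stability_weights(attn_maps, temperature=0.01):
    """Weight image tokens by mean attention, damped by variance across prompts.

    attn_maps: one list of per-image-token attention values per perturbed prompt.
    ASSUMPTION: exp(-variance / temperature) is an illustrative damping rule.
    """
    per_token = list(zip(*attn_maps))  # transpose: one tuple per image token
    raw = [mean(v) * math.exp(-pvariance(v) / temperature) for v in per_token]
    total = sum(raw)
    if total == 0:
        raise ValueError("all attention weights are zero")
    return [r / total for r in raw]  # normalize so the weights sum to 1


def prior_drift_direction(base_logits, perturbed_logits):
    """Average change in each token's logit when the prompt is perturbed."""
    columns = zip(*perturbed_logits)  # one tuple of logits per vocabulary token
    return [mean(col) - b for b, col in zip(base_logits, columns)]


def counteract_drift(base_logits, drift, alpha=1.0):
    """Shift the base logits against the drift direction (hypothetical rule)."""
    return [b - alpha * d for b, d in zip(base_logits, drift)]


# Toy example: two perturbed prompts, two image tokens, two vocabulary tokens.
weights = attention_stability_weights([[0.5, 0.5], [0.5, 0.1]])
drift = prior_drift_direction([2.0, 1.0], [[3.0, 1.0], [3.0, 1.0]])
print(weights)
print(counteract_drift([2.0, 1.0], drift))
```

In this toy case the first image token receives the same attention under both prompts, so it keeps a higher weight than the second, whose attention fluctuates. The first vocabulary token gains logit mass under perturbation, which the sketch treats as a prior-driven bias and subtracts.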

Experimental Results

The paper evaluates DeP on multiple benchmarks and reports that it significantly reduces hallucinations while producing coherent, contextually relevant outputs. DeP counteracts erroneous textual biases without sacrificing generative fluency, which is the trade-off that earlier training-free methods struggled with.

Conclusion

In summary, Decoding by Perturbation is a promising approach to hallucinations in MLLMs. By targeting the interplay between textual phrasing and visual grounding, it balances generative fluency against accuracy in multimodal outputs. Future research may refine the perturbation techniques and test them on other multimodal tasks.

Lazarus Omolua