URMF: Robust Multimodal Fusion for Sarcasm Detection

URMF: Uncertainty-aware Robust Multimodal Fusion for Multimodal Sarcasm Detection

Source: arXiv:2604.06728v1 (announce type: cross)

Abstract

Multimodal sarcasm detection (MSD) aims to identify sarcastic intent from semantic incongruity between text and image. Although recent methods have improved MSD through cross-modal interaction and incongruity reasoning, they often assume that all modalities are equally reliable. In real-world social media, however, textual content may be ambiguous and visual content may be weakly relevant or even irrelevant, causing deterministic fusion to introduce noisy evidence and weaken robust reasoning.

Introduction

To address these challenges, the authors propose Uncertainty-aware Robust Multimodal Fusion (URMF), a unified framework that explicitly models modality reliability during interaction and fusion. Rather than treating text and image as equally trustworthy, URMF estimates how uncertain each one is for a given post and lets that estimate shape how much each contributes to the final prediction.

Key Features of URMF

  • Multi-Head Cross-Attention: URMF first employs multi-head cross-attention to inject visual evidence into textual representations, so that text tokens are conditioned on the image content.
  • Incongruity-Aware Reasoning: The framework then applies multi-head self-attention in the fused semantic space to strengthen reasoning about incongruity between what the text says and what the image shows.
  • Aleatoric Uncertainty Modeling: URMF performs unified unimodal aleatoric uncertainty modeling over text, image, and interaction-aware latent representations. Each of these representations is parameterized as a learnable Gaussian posterior, so the model can express how noisy a given input is.
  • Dynamic Modality Regulation: The estimated uncertainty is used to dynamically regulate modality contributions during fusion, suppressing unreliable modalities and yielding a more robust joint representation.
  • Joint Training Objective: A joint training objective integrates task supervision, modality prior regularization, cross-modal distribution alignment, and uncertainty-driven self-sampling contrastive learning.
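
To make the features above more concrete, here is a minimal NumPy sketch of the general idea: text attends over image patches, each representation gets a Gaussian mean and log-variance, and the fusion weights shrink as the variance grows. This is an illustration under stated assumptions, not the authors' implementation. The function names, the identity projections in the attention step, the softmax over negative mean log-variance, and the standard-normal prior in the KL term are all hypothetical choices made for this example; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, image, num_heads=2):
    """Text tokens (queries) attend over image patches (keys and values).

    text: (T, d), image: (P, d). Learned Q/K/V projections are omitted
    (identity) to keep the sketch short; a real model would learn them.
    """
    T, d = text.shape
    P = image.shape[0]
    dh = d // num_heads
    q = text.reshape(T, num_heads, dh).transpose(1, 0, 2)    # (H, T, dh)
    k = image.reshape(P, num_heads, dh).transpose(1, 0, 2)   # (H, P, dh)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (H, T, P)
    out = (attn @ k).transpose(1, 0, 2).reshape(T, d)        # (T, d)
    return text + out  # residual: visual evidence injected into text

def gaussian_head(h, w_mu, w_logvar):
    """Map a pooled feature to the mean and log-variance of a diagonal Gaussian."""
    return h @ w_mu, h @ w_logvar

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)); one assumed form of prior regularization."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def uncertainty_weights(logvars):
    """Assumed rule: softmax over negative mean log-variance per modality."""
    u = np.array([lv.mean() for lv in logvars])
    return softmax(-u)

def fuse(mus, logvars):
    """Weight each modality mean by its reliability and sum."""
    w = uncertainty_weights(logvars)
    return sum(wi * mu for wi, mu in zip(w, mus)), w

# Toy usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))    # 5 text tokens
image = rng.normal(size=(7, d))   # 7 image patches
inter = cross_attention(text, image)

pooled = [text.mean(axis=0), image.mean(axis=0), inter.mean(axis=0)]
mus, logvars = [], []
for h in pooled:
    w_mu = rng.normal(size=(d, d)) * 0.1
    w_lv = rng.normal(size=(d, d)) * 0.1
    mu, lv = gaussian_head(h, w_mu, w_lv)
    mus.append(mu)
    logvars.append(lv)

joint, weights = fuse(mus, logvars)
prior_loss = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, logvars))
```

In this sketch a modality whose predicted variance is high receives a smaller fusion weight, which is one simple way to realize "suppressing unreliable modalities". The cross-modal alignment and contrastive terms of the training objective are left out.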

Experimental Results

Experiments on publicly available MSD benchmarks show that URMF consistently outperforms strong unimodal, multimodal, and MLLM-based baselines, with gains reported in both accuracy and robustness.

Conclusion

URMF targets a specific weakness of earlier multimodal sarcasm detectors: the assumption that text and image are always equally reliable. By estimating uncertainty for each modality and using it to regulate fusion, the framework improves detection accuracy and holds up better on noisy real-world social media content, where captions can be ambiguous and images only loosely related to the text.


Lazarus Omolua
https://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
