Edge Reliability Challenges in Compressed Vision-Language Models

Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption

The rapid compression of large vision-language models (VLMs) for edge deployment raises an underexplored question: do compact models fail differently, not merely more often? A recent arXiv preprint (arXiv:2603.26769v1) addresses this question by comparing how two VLMs behave under visual corruption.

The study contrasts Qwen2.5-VL-7B, a 7-billion-parameter VLM quantized to 4-bit NF4, with SmolVLM2-500M, a much smaller 500-million-parameter model run in FP16. The evaluation uses 4,000 samples drawn from VQAv2 and COCO Captions.

Methodology

The researchers employed a three-category error taxonomy to diagnose the failure modes of these models. The categories include:

  • Object Blindness: The model fails to recognize objects present in the visual input.
  • Semantic Drift: The model fails to maintain the intended meaning of the input text in relation to the visual content.
  • Prior Bias: The model’s responses are influenced by preconceived notions rather than the actual content.
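Once each error has been assigned one of these three labels, comparing models comes down to counting. The sketch below is one minimal way to tally a per-model failure profile; the names `FailureMode`, `failure_profile`, and `dominant_mode` are illustrative and do not come from the paper's released pipeline.

```python
from collections import Counter
from enum import Enum


class FailureMode(Enum):
    """The three categories of the error taxonomy described above."""
    OBJECT_BLINDNESS = "object_blindness"
    SEMANTIC_DRIFT = "semantic_drift"
    PRIOR_BIAS = "prior_bias"


def failure_profile(labels):
    """Return each failure mode's share of the labelled errors (0.0 to 1.0)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {mode: 0.0 for mode in FailureMode}
    return {mode: counts[mode] / total for mode in FailureMode}


def dominant_mode(labels):
    """Return the most frequent failure mode, or None when there are no errors."""
    counts = Counter(labels)
    return counts.most_common(1)[0][0] if counts else None
```

A profile where one mode holds most of the mass corresponds to a "predominant" failure mode, while a profile split between two modes corresponds to a "mixed" one.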

Findings

Using a text-only GPT-4o judge, the study identified Semantic Drift as the predominant failure mode for Qwen2.5-VL-7B on both VQAv2 and COCO Captions. SmolVLM2-500M, by contrast, showed a mixed profile of Object Blindness and Semantic Drift on COCO. For both models, Prior Bias appeared on VQAv2 but was absent on COCO.

The researchers also assessed confidence calibration via Expected Calibration Error (ECE), using the geometric mean of token probabilities as the confidence score. In addition, they probed compositional reasoning with structured negation probes across four templates and ran a blur robustness experiment.
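For readers unfamiliar with the calibration metric, here is a minimal sketch of how a geometric-mean confidence score and a binned ECE can be computed. It assumes ten equal-width bins and strictly positive token probabilities; the article does not state the binning the authors used, and the function names are illustrative.

```python
import math


def sequence_confidence(token_probs):
    """Geometric mean of per-token probabilities, computed in log space.

    Assumes every probability is strictly positive.
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one sample is required")

    # Assumption: equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

The geometric mean is a common choice for generated text because it normalises for answer length: a long answer is not penalised simply for containing more tokens.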

Results

The compact model, SmolVLM2-500M, displayed a qualitatively distinct failure signature. Its negation collapse was 12.5 percentage points larger than that of Qwen2.5-VL-7B (-33.2pp vs. -20.8pp), a difference driven largely by the COCO trials. On VQAv2, the gap did not reach statistical significance (4.5pp, p=0.19).
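To make the percentage-point figures concrete, the sketch below shows one way such a collapse could be computed. It assumes negation collapse means accuracy on negated prompts minus accuracy on the matching base prompts; the article does not spell out the paper's exact definition, and the data in the usage note are toy values, not results from the study.

```python
def accuracy(outcomes):
    """Fraction of True values in a list of per-trial correctness flags."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def negation_collapse_pp(base_correct, negated_correct):
    """Accuracy change, in percentage points, from base prompts to negated prompts.

    Assumed definition: a negative value means the model got worse under negation.
    """
    return 100.0 * (accuracy(negated_correct) - accuracy(base_correct))


def collapse_gap_pp(collapse_a, collapse_b):
    """How many percentage points larger (more negative) collapse_a is than collapse_b."""
    return collapse_b - collapse_a
```

For example, a model that is correct on 4 of 4 base prompts but only 1 of 4 negated prompts has a collapse of -75pp. If a second model collapses by -25pp, the gap between them is 50pp.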

The false_yn template showed the starkest contrast between the two models. SmolVLM2-500M answered “Yes” on 100% of COCO trials, incorrectly claiming that a depicted object was absent, whereas Qwen2.5-VL-7B did so on only 14% of the same trials.
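A probe of this kind can be scored with very little code. The sketch below is illustrative only: the article names the template false_yn but does not quote its wording, so `FALSE_YN_TEMPLATE` is a hypothetical phrasing in which "Yes" asserts that the object is absent, and the helper names are likewise assumptions.

```python
import re

# Hypothetical wording: the source names the template "false_yn" but does not quote it.
# With this phrasing, "Yes" claims the object is absent, so it is wrong when the object is depicted.
FALSE_YN_TEMPLATE = "Is there no {obj} in this image? Answer Yes or No."


def build_false_yn_probe(obj):
    """Fill the hypothetical template with an object that IS present in the image."""
    return FALSE_YN_TEMPLATE.format(obj=obj)


def is_yes(response):
    """True when the first word of the response is 'yes' (case-insensitive)."""
    match = re.match(r"\s*([A-Za-z]+)", response)
    return bool(match) and match.group(1).lower() == "yes"


def yes_rate(responses):
    """Share of responses that begin with 'Yes'; on this probe, the error rate."""
    if not responses:
        raise ValueError("responses must be non-empty")
    return sum(1 for r in responses if is_yes(r)) / len(responses)
```

Because every probed object is actually depicted, a yes rate of 1.0 under this setup would mean the model denied the object's presence on every trial.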

Conclusion

This study highlights the importance of understanding the distinct failure modes of compressed vision-language models, particularly as they transition to edge deployment. The findings suggest that smaller models may not only fail more frequently but do so in qualitatively different ways. The researchers have also released a fully reproducible pipeline aimed at facilitating systematic safety auditing of compressed VLMs prior to their deployment in real-world applications.


Lazarus Omolua, https://richlyai.com/blog