Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
Summary: arXiv:2604.03114v1 Announce Type: cross
Visual Language Models (VLMs), trained on extensive web-scale datasets, often retain sensitive and copyrighted visual concepts. This retention poses significant challenges when it becomes essential to remove such concepts from deployment. Traditional training-based unlearning methods exhibit a crucial structural flaw: the process of fine-tuning on a narrow forget set can degrade the model’s general capabilities prior to the actual unlearning, thus making it difficult to attribute subsequent performance drops directly to the unlearning procedure itself.
To address this issue, researchers have explored training-free approaches that aim to suppress unwanted concepts through prompts or system instructions. However, until now, there has been a lack of rigorous benchmarks to evaluate these methods in visual tasks. In this context, we introduce VLM-UnBench, the inaugural benchmark designed specifically for training-free visual concept unlearning in VLMs.
Introducing VLM-UnBench
VLM-UnBench provides a structured framework covering:
- Four Forgetting Levels: Different degrees of concept suppression.
- Seven Source Datasets: A diverse range of datasets to assess generalizability.
- Eleven Concept Axes: Various dimensions of visual concepts to evaluate.
The benchmark employs a three-level probe taxonomy combined with five distinct evaluation conditions. This structure is designed to distinguish genuine concept forgetting from mere compliance with instructions. Through comprehensive testing across eight evaluation settings and thirteen VLM configurations, the findings reveal several pivotal insights:
Key Findings
- Forget Accuracy: Realistic unlearning prompts generally maintain forget accuracy close to the baseline established without any instructions.
- Oracle Conditions: Meaningful reductions in retention occur only under oracle conditions, where the target concept is explicitly disclosed to the model.
- Resistance to Suppression: Object and scene concepts demonstrate significant resistance to suppression efforts.
- Model Robustness: Instruction-tuned models exhibit the ability to maintain performance levels even in the presence of explicit forget instructions.
These findings highlight a critical gap between the level of suppression achievable through prompt-based methods and the complete erasure of visual concepts. The results underscore the challenges faced in effectively implementing unlearning processes in VLMs, revealing the limitations of current methodologies in achieving true unlearning.
Conclusion
As the demand for ethical AI continues to grow, understanding and improving the mechanisms of unlearning in VLMs becomes increasingly vital. The introduction of the VLM-UnBench benchmark represents a significant step forward in this research area, providing a foundation for future advancements in training-free visual concept unlearning. Enhanced methodologies will be essential to ensure that VLMs can responsibly manage sensitive and copyrighted content while maintaining their operational efficacy.
