Empirical Characterization of Rationale Stability Under Controlled Perturbations for Explainable Pattern Recognition
arXiv:2604.04456v1
Abstract
Reliable pattern recognition systems should exhibit consistent behavior across similar inputs, and their explanations should remain stable. However, most Explainable AI evaluations remain instance-centric and do not explicitly quantify whether attribution patterns are consistent across samples that share the same class or represent small variations of the same input.
Introduction
Explainable AI (XAI) methods are only useful for pattern recognition systems if the explanations they produce are reliable and consistent. This article discusses a novel metric that assesses the consistency of model explanations across inputs that share a label, with a specific focus on label-preserving perturbations.
Proposed Metric and Methodology
We introduce a metric that quantifies the stability of model explanations. We implement it with a pre-trained BERT model on the SST-2 sentiment analysis dataset, and conduct additional robustness tests with the RoBERTa and DistilBERT models and the IMDB dataset. SHAP (SHapley Additive exPlanations) is used to compute feature importance across test samples.
Key Components of the Methodology:
- Cosine Similarity: The proposed metric quantifies the cosine similarity of SHAP values for inputs sharing the same label.
- Detection of Inconsistencies: The metric aims to identify inconsistent behaviors, such as biased reliance on specific features or failure to maintain consistent reasoning for similar predictions.
- Experimental Evaluation: A series of experiments evaluates how effectively the metric identifies misaligned predictions and inconsistencies in model explanations.
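The cosine-similarity component can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository under Availability for that). It assumes that each explanation is stored as a sparse mapping from token to SHAP value, so that explanations of different sentences can be compared over a shared vocabulary, and that consistency for a label is summarized as the mean pairwise cosine similarity. The function names `cosine_similarity` and `within_label_consistency` and the example attribution values are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two sparse attribution vectors (token -> value)."""
    dot = sum(value * b.get(token, 0.0) for token, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: an all-zero attribution is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def within_label_consistency(attributions, labels):
    """Mean pairwise cosine similarity among attributions that share a label.

    Labels with fewer than two samples are skipped because no pair exists.
    """
    by_label = {}
    for attribution, label in zip(attributions, labels):
        by_label.setdefault(label, []).append(attribution)

    scores = {}
    for label, group in by_label.items():
        pairs = list(combinations(group, 2))
        if pairs:
            total = sum(cosine_similarity(a, b) for a, b in pairs)
            scores[label] = total / len(pairs)
    return scores


# Hypothetical SHAP-style attributions, made up for illustration only.
example_attributions = [
    {"great": 0.8, "movie": 0.1},
    {"great": 0.6, "acting": 0.2},
    {"boring": -0.7, "movie": 0.05},
    {"boring": -0.5, "plot": -0.3},
]
example_labels = ["positive", "positive", "negative", "negative"]

consistency = within_label_consistency(example_attributions, example_labels)
for label, score in consistency.items():
    print(f"{label}: {score:.3f}")
```

Under these assumptions, a score near 1 for a label suggests that the model relies on similar features across samples of that label, while a low score flags samples whose explanations diverge despite identical predictions. Mapping attributions to a shared token vocabulary is one possible alignment choice; other alignments (for example, position-based comparison of an input and its perturbed variant) would change the scores.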
Results and Comparison
The experiments reveal that the proposed metric effectively identifies when a model’s behavior deviates from its intended objectives. Comparisons against standard fidelity metrics demonstrate that this new metric provides a more nuanced perspective on model behavior, offering insights that traditional methods may overlook.
Significance of the Findings
This framework enhances the understanding of model behavior by enabling a more robust verification of rationale stability. The ability to quantify whether models rely on consistent attribution patterns for similar inputs is crucial for building trustworthy AI systems.
Conclusion
The findings underscore the importance of consistency in explainable AI, particularly in applications involving pattern recognition. A metric that assesses consistency under controlled perturbations supports more effective evaluation of model behavior and contributes to the development of reliable AI systems.
Availability
The code for the proposed metric and methodology is publicly available at the following repository: https://github.com/anmspro/ESS-XAI-Stability.
