Feature Attribution Stability Suite: Evaluating Post-Hoc Methods

Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?

Source: arXiv:2604.02532v1 (cross-listed)

Abstract: Post-hoc feature attribution methods are widely deployed in safety-critical vision systems, yet their stability under realistic input perturbations remains poorly characterized. Existing metrics evaluate explanations primarily under additive noise, collapse stability to a single scalar, and fail to condition on prediction preservation, conflating explanation fragility with model sensitivity.

We introduce the Feature Attribution Stability Suite (FASS), a benchmark that enforces prediction-invariance filtering, decomposes stability into three complementary metrics (structural similarity, rank correlation, and top-k Jaccard overlap), and evaluates across geometric, photometric, and compression perturbations. Evaluating four attribution methods (Integrated Gradients, GradientSHAP, Grad-CAM, LIME) across four architectures and three datasets (ImageNet-1K, MS COCO, and CIFAR-10), FASS shows that stability estimates depend critically on perturbation family and prediction-invariance filtering. Geometric perturbations expose substantially greater attribution instability than photometric changes, and without conditioning on prediction preservation, up to 99% of evaluated pairs involve changed predictions. Under this controlled evaluation, we observe consistent method-level trends, with Grad-CAM achieving the highest stability across datasets.

Key Features of the Feature Attribution Stability Suite

Prediction-Invariance Filtering: Stability is measured only on input pairs for which the model's prediction is preserved under perturbation, so that explanation fragility is not conflated with model sensitivity.
Decomposed Stability Metrics: FASS breaks down stability into three distinct metrics:
- Structural Similarity: Measures how closely the spatial structure of an attribution map is preserved under perturbation.
- Rank Correlation: Assesses whether the ordering of features by attribution score is preserved under perturbation.
- Top-k Jaccard Overlap: Measures the overlap between the sets of the k highest-attributed features before and after perturbation.
Diverse Perturbation Types: The benchmark evaluates attributions across various perturbations, including:
- Geometric perturbations (e.g., rotations, translations)
- Photometric perturbations (e.g., brightness, contrast changes)
- Compression perturbations (e.g., JPEG compression)
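As a rough illustration of two of the three metrics listed above, the sketch below computes a rank correlation and a top-k Jaccard overlap between two flattened attribution maps. It is not the FASS implementation: the function names are hypothetical, the choice of Spearman correlation with average ranks for ties is an assumption (the summary only says "rank correlation"), and ranking features by absolute attribution for the top-k set is likewise an assumption. Structural similarity is omitted because it needs windowed image statistics that do not fit a short example.

```python
from typing import List, Sequence


def _average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rank_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors (assumed metric)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("attribution maps must be non-empty and equally sized")
    ra, rb = _average_ranks(a), _average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        # A constant map has no ordering; returning 0.0 is a convention of this sketch.
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def topk_jaccard(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Jaccard overlap of the k features with the largest absolute attribution."""
    if k <= 0:
        raise ValueError("k must be positive")

    def top(values: Sequence[float]) -> set:
        idx = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
        return set(idx[:k])

    top_a, top_b = top(a), top(b)
    return len(top_a & top_b) / len(top_a | top_b)


# Toy flattened attribution maps for an original and a perturbed input.
original = [0.1, 0.5, 0.2, 0.9]
perturbed = [0.2, 0.6, 0.1, 0.8]
rho = spearman_rank_correlation(original, perturbed)
overlap = topk_jaccard(original, perturbed, k=2)
```

Reporting the two numbers separately, rather than averaging them, follows the point made in the abstract that stability should not be collapsed to a single scalar: here the top-2 sets coincide even though the full rankings differ.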

Findings from the Evaluation

The evaluation of attribution methods using FASS produced three main findings:

Geometric perturbations led to a notably higher instability in attributions compared to photometric changes.
Without conditioning on prediction preservation, up to 99% of the evaluated pairs involved changed predictions.
Under the prediction-preserving evaluation, Grad-CAM achieved the highest stability of the four methods across the evaluated datasets.
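The role of prediction-invariance filtering in these findings can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's code: `prediction_invariant_pairs`, `perturb`, and `predict` are hypothetical names, and the toy threshold "model" stands in for a real classifier. The function keeps only the pairs whose predicted label survives the perturbation and reports the fraction that was discarded.

```python
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

X = TypeVar("X")


def prediction_invariant_pairs(
    inputs: Iterable[X],
    perturb: Callable[[X], X],
    predict: Callable[[X], Hashable],
) -> Tuple[List[Tuple[X, X]], float]:
    """Keep (original, perturbed) pairs with an unchanged predicted label.

    Returns the kept pairs and the fraction of pairs whose prediction changed.
    """
    kept: List[Tuple[X, X]] = []
    total = 0
    for x in inputs:
        total += 1
        x_perturbed = perturb(x)
        if predict(x) == predict(x_perturbed):
            kept.append((x, x_perturbed))
    changed_fraction = (total - len(kept)) / total if total else 0.0
    return kept, changed_fraction


# Toy stand-ins (assumptions for illustration): a threshold classifier on
# scalars and a perturbation that shifts each input by a constant.
toy_predict = lambda x: int(x > 0)
toy_perturb = lambda x: x - 1.0

pairs, changed = prediction_invariant_pairs(
    [0.5, 2.0, -3.0, 1.5], toy_perturb, toy_predict
)
# Only the first input flips its label (0.5 -> -0.5), so three pairs are kept.
```

Attribution stability metrics would then be computed only on the kept pairs, so that a drop in similarity reflects a change in the explanation rather than a change in the model's decision.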

In conclusion, the Feature Attribution Stability Suite provides a benchmark for evaluating the stability of post-hoc feature attribution methods, and its results indicate that stability estimates depend strongly on the perturbation family and on whether predictions are held fixed.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
