GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees
Source: arXiv:2604.12757v1 (announce type: cross)
Abstract: Adversarial robustness is essential for deploying neural networks in safety-critical applications, yet standard evaluation methods either require expensive adversarial attacks or report only a single aggregate score that obscures how robustness is distributed across classes. We introduce the GF-Score (GREAT-Fairness Score), a framework that decomposes the certified GREAT Score into per-class robustness profiles and quantifies their disparity through four metrics grounded in welfare economics: the Robustness Disparity Index (RDI), the Normalized Robustness Gini Coefficient (NRGC), Worst-Case Class Robustness (WCR), and a Fairness-Penalized GREAT Score (FP-GREAT). The framework further eliminates the original method’s dependence on adversarial attacks through a self-calibration procedure that tunes the temperature parameter using only clean accuracy correlations. Evaluating 22 models from RobustBench across CIFAR-10 and ImageNet, we find that the decomposition is exact, that per-class scores reveal consistent vulnerability patterns (e.g., “cat” is the weakest class in 76% of CIFAR-10 models), and that more robust models tend to exhibit greater class-level disparity. These results establish a practical, attack-free auditing pipeline for diagnosing where certified robustness guarantees fail to protect all classes equally. We release our code on GitHub.
Introduction
Neural networks are increasingly deployed in safety-critical applications, which makes careful evaluation of their adversarial robustness necessary. Standard evaluation methods either rely on expensive adversarial attacks or report a single aggregate score that hides how robustness varies from class to class. The GF-Score (GREAT-Fairness Score) addresses both shortcomings by decomposing the certified GREAT Score into per-class robustness profiles and quantifying the disparity among them.
Key Features of the GF-Score Framework
- Per-Class Robustness Profiles: The GF-Score provides a detailed breakdown of robustness across various classes, allowing researchers and practitioners to identify specific vulnerabilities.
- Robustness Disparity Metrics: The framework includes four distinct metrics to quantify disparity in robustness:
  - Robustness Disparity Index (RDI)
  - Normalized Robustness Gini Coefficient (NRGC)
  - Worst-Case Class Robustness (WCR)
  - Fairness-Penalized GREAT Score (FP-GREAT)
- Attack-Free Evaluation: GF-Score removes the original GREAT Score's dependence on adversarial attacks through a self-calibration procedure that tunes the temperature parameter using only clean accuracy correlations.
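The per-class profiles and disparity metrics listed above can be illustrated with a minimal sketch. It assumes per-sample robustness scores are already available as a plain list and that the aggregate score is their mean. The function names are hypothetical, the data is made up, and the formulas are textbook stand-ins (a standard Gini coefficient with an n/(n-1) normalization, and a minimum over classes). The paper's exact definitions of NRGC and WCR may differ, and RDI and FP-GREAT are not sketched here because this summary does not define them.

```python
from collections import defaultdict


def per_class_means(scores, labels):
    """Average per-sample robustness scores within each class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for score, label in zip(scores, labels):
        sums[label] += score
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    return means, dict(counts)


def gini(values):
    """Standard Gini coefficient of non-negative values (0 means all equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def normalized_gini(values):
    """Gini rescaled by n / (n - 1) so the maximum is 1 (assumed normalization)."""
    n = len(values)
    return gini(values) * n / (n - 1) if n > 1 else 0.0


def worst_class(class_means):
    """Return (label, score) of the class with the lowest mean score."""
    label = min(class_means, key=class_means.get)
    return label, class_means[label]


# Toy, made-up per-sample scores and labels (not results from the paper).
scores = [0.2, 0.4, 0.6, 0.8, 0.5, 0.5]
labels = ["cat", "cat", "dog", "dog", "ship", "ship"]

class_means, class_counts = per_class_means(scores, labels)
aggregate = sum(scores) / len(scores)
# The class-frequency-weighted mean of per-class means reproduces the aggregate.
recomposed = sum(class_means[c] * class_counts[c] for c in class_means) / len(scores)

print(class_means)
print(aggregate, recomposed)
print(normalized_gini(list(class_means.values())))
print(worst_class(class_means))
```

Under the mean-aggregation assumption, `aggregate` and `recomposed` agree up to floating-point rounding, which is one simple sense in which a per-class decomposition can be exact.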
Findings and Implications
An evaluation of 22 RobustBench models across CIFAR-10 and ImageNet yields three main findings:
- The decomposition of the GREAT Score into per-class scores is exact.
- Per-class robustness scores expose consistent vulnerability patterns; for instance, “cat” is the weakest class in 76% of the CIFAR-10 models examined.
- More robust models tend to exhibit greater class-level disparity, which suggests that gains in aggregate robustness are not shared evenly across classes and that training and evaluation should account for this imbalance.
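The kind of cross-model analysis behind the last two findings can be sketched as follows. This is an illustration on made-up numbers, not the paper's procedure: the model names and scores are invented, the helper names are hypothetical, the aggregate is taken as an unweighted mean over classes, and disparity is measured by a simple max-minus-min spread rather than any of the paper's four metrics.

```python
from collections import Counter


def weakest_class_share(profiles):
    """Fraction of models for which each class has the lowest score.

    profiles maps model name -> {class label: per-class score}.
    """
    weakest = Counter(min(p, key=p.get) for p in profiles.values())
    n = len(profiles)
    return {label: count / n for label, count in weakest.items()}


def ranks(values):
    """1-based ranks in ascending order; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for distinct values: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Toy, made-up per-class profiles for four hypothetical models.
profiles = {
    "model_a": {"cat": 0.20, "dog": 0.30, "ship": 0.40},
    "model_b": {"cat": 0.25, "dog": 0.45, "ship": 0.65},
    "model_c": {"cat": 0.50, "dog": 0.45, "ship": 0.70},
    "model_d": {"cat": 0.30, "dog": 0.70, "ship": 0.95},
}

shares = weakest_class_share(profiles)
aggregates = [sum(p.values()) / len(p) for p in profiles.values()]
spreads = [max(p.values()) - min(p.values()) for p in profiles.values()]
rho = spearman(aggregates, spreads)

print(shares)
print(rho)
```

A positive `rho` on real per-class profiles would correspond to the reported tendency for more robust models to show greater class-level disparity; the toy numbers here only demonstrate the mechanics.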
Conclusion
The GF-Score extends certified robustness evaluation from a single aggregate number to a per-class audit with explicit disparity metrics. Because the pipeline is attack-free, practitioners can use it to diagnose where certified robustness guarantees fail to protect all classes equally. The authors release their code on GitHub.
