VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning
Uncertainty quantification (UQ) is critical when deploying deep learning models, particularly in safety-critical applications. However, there is no consensus on which UQ method performs best across data modalities and distribution shifts. The paper “VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning” benchmarks ten widely used UQ baselines against a simplified version of VOLTA and reports that the simplified version matches or outperforms most of them.
Key Findings
The study benchmarks ten established UQ methods, including:
- MC Dropout
- SWAG
- Ensemble Methods
- Temperature Scaling
- Energy-Based OOD Detection
- Mahalanobis Distance
- Hyperbolic Classifiers
- ENN
- Taylor Sensus
- Split Conformal Prediction
These methods were evaluated against a streamlined variant of VOLTA that incorporates a deep encoder, learnable prototypes, cross-entropy loss, and post hoc temperature scaling.
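To make the four components concrete, here is a minimal sketch of a prototype-based classification head with post hoc temperature scaling. It is not the paper's implementation. The encoder is omitted and embeddings are assumed to be given. The function names (`prototype_logits`, `fit_temperature`), the use of negative squared Euclidean distance as the logit, and the grid search over temperatures are all assumptions made for illustration. In a real model the encoder and prototypes would be trained jointly with cross-entropy loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_logits(embedding, prototypes):
    """One logit per class: negative squared Euclidean distance to that
    class's prototype (the distance choice is an assumption of this sketch)."""
    return [
        -sum((e - p) ** 2 for e, p in zip(embedding, proto))
        for proto in prototypes
    ]

def nll(logits_list, labels, temperature):
    """Mean cross-entropy of the temperature-scaled predictions."""
    total = 0.0
    for logits, y in zip(logits_list, labels):
        total -= math.log(softmax(logits, temperature)[y])
    return total / len(labels)

def fit_temperature(logits_list, labels, grid=None):
    """Post hoc temperature scaling: pick the temperature that minimizes
    NLL on held-out data. Grid search is used here for simplicity."""
    grid = grid or [0.05 * k for k in range(1, 201)]  # roughly 0.05 to 10.0
    return min(grid, key=lambda t: nll(logits_list, labels, t))
```

Because dividing logits by a positive temperature never changes their ranking, this step adjusts confidence without changing the predicted class, so accuracy is unaffected.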
Performance Metrics
The evaluation of UQ methods covered multiple datasets, including:
- CIFAR-10 (in-distribution)
- CIFAR-100
- SVHN
- Uniform Noise (out-of-distribution)
- CIFAR-10-C (corruptions)
- Tiny ImageNet features (tabular)
Notably, VOLTA achieved competitive or superior accuracy (up to 0.864 on CIFAR-10) along with a significantly lower expected calibration error (ECE): 0.010, compared to 0.044 to 0.102 for the baseline methods. VOLTA also showed strong out-of-distribution (OOD) detection, with an area under the receiver operating characteristic curve (AUROC) of 0.802.
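For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. It is an illustration, not the paper's evaluation code. The function names, the use of ten equal-width confidence bins for ECE, and the convention that in-distribution samples should receive higher scores than OOD samples are assumptions of this sketch.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: the sample-weighted mean of
    |accuracy - average confidence| over the non-empty bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece

def auroc(in_scores, ood_scores):
    """AUROC as the probability that a random in-distribution sample
    scores higher than a random OOD sample (ties count as one half)."""
    wins = 0.0
    for s_in in in_scores:
        for s_out in ood_scores:
            if s_in > s_out:
                wins += 1.0
            elif s_in == s_out:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))
```

An ECE of 0 means that stated confidence matches observed accuracy in every bin. An AUROC of 0.5 corresponds to chance-level separation of in-distribution and OOD inputs, and 1.0 to perfect separation. The pairwise AUROC loop is quadratic in the number of samples, which is fine for illustration but slow for large evaluation sets.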
Statistical Validation
Statistical testing over three random seeds indicated that VOLTA matches or outperforms most of the baseline methods. Ablation studies further confirmed the importance of adaptive temperature and deep encoders for performance.
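This summary does not state which statistical test was used. As one plausible illustration of comparing two methods across a small number of seeds, the sketch below computes the per-method mean and sample standard deviation along with Welch's t statistic. The choice of Welch's test and the function names are assumptions, not details taken from the paper.

```python
import math

def mean_std(values):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two
    sets of per-seed scores. Assumes at least two scores per method and
    that the scores are not all identical within both methods."""
    mean_a, std_a = mean_std(a)
    mean_b, std_b = mean_std(b)
    var_a = std_a ** 2 / len(a)
    var_b = std_b ** 2 / len(b)
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df
```

With only three seeds per method, the degrees of freedom are very low, so differences need to be large relative to the seed-to-seed spread before such a test can distinguish two methods.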
Conclusion
The results from this study establish VOLTA as a lightweight, deterministic, and well-calibrated alternative to more complex UQ approaches. By showing that a streamlined variant trained with plain cross-entropy remains competitive, the study suggests that auxiliary losses may be less beneficial for calibration than previously thought, which points toward simpler and more efficient models for critical applications.
