Uncertainty Makes It Stable: Curiosity-Driven Quantized Mixture-of-Experts
arXiv:2511.11743v3
Abstract
Deploying deep neural networks on resource-constrained devices faces two critical challenges: maintaining accuracy under aggressive quantization while ensuring predictable inference latency. We present a curiosity-driven quantized Mixture-of-Experts framework that addresses both through Bayesian epistemic uncertainty-based routing across heterogeneous experts (BitNet ternary, 1-16 bit BitLinear, post-training quantization).
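To make the "ternary" expert concrete, here is a minimal sketch of ternary weight quantization. It assumes the absmean scheme associated with BitNet b1.58 (scale by the mean absolute weight, round, then clip to {-1, 0, +1}); the summary above does not say which variant the authors use, and the function names and sample weights are illustrative only.

```python
def ternary_quantize(weights, eps=1e-8):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling, i.e. scale = mean(|w|). This is a sketch,
    not the paper's implementation.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes."""
    return [c * scale for c in codes]


# Illustrative weights, not taken from the paper.
weights = [0.9, -0.05, 0.4, -1.2, 0.02]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)
print(codes)
```

Small weights collapse to 0 and large ones saturate at plus or minus the scale, which is why such an expert is cheap but lossy.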
Key Findings
We evaluated the framework on the following audio classification benchmarks:
- ESC-50
- Quinn
- UrbanSound8K
Our 4-bit quantization retains 99.9 percent of the full-precision F1 score (0.858 vs. 0.859) while achieving 4x compression and 31 percent energy savings relative to 8-bit systems. Neither the 4-bit nor the 8-bit configuration differs significantly from full precision (p > 0.05).
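The headline ratios can be checked with a few lines of arithmetic. The sketch below assumes the full-precision reference stores 16 bits per weight, which is what a 4x compression figure for 4-bit weights implies; the summary does not state the reference width explicitly, and the helper names are illustrative.

```python
def f1_retention(quantized_f1, reference_f1):
    """Fraction of the reference F1 score kept after quantization."""
    return quantized_f1 / reference_f1


def compression_ratio(reference_bits, quantized_bits):
    """Ratio of bits per weight before and after quantization."""
    return reference_bits / quantized_bits


retention = f1_retention(0.858, 0.859)  # F1 values quoted in the text
ratio = compression_ratio(16, 4)        # assumption: 16-bit reference weights

print(f"F1 retention: {retention:.1%}")
print(f"Compression:  {ratio:.0f}x")
```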
Curiosity-Driven Routing
Curiosity-driven routing improves accuracy and stability at the same time. On the Quinn dataset, the F1 score rises from 0.802 to 0.809 while cross-fold variance drops by 85 percent (p < 0.001, Levene's test). The trend holds across datasets, with variance reductions ranging from 50 to 94 percent.
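For readers unfamiliar with Levene's test, the following sketch computes the relative variance reduction and the Levene W statistic for two sets of per-fold scores. The fold scores are invented for illustration and are not the paper's data; the sketch uses the original mean-centered form of the test and stops at the statistic rather than the p-value.

```python
from statistics import mean, variance


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    centers = [mean(g) for g in groups]
    # Z_ij = |Y_ij - mean_i|: spread of each observation around its group center.
    z = [[abs(y - c) for y in g] for g, c in zip(groups, centers)]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical per-fold F1 scores (5-fold cross-validation), NOT from the paper.
baseline_folds = [0.70, 0.80, 0.75, 0.85, 0.90]
routed_folds = [0.79, 0.80, 0.81, 0.80, 0.80]

reduction = 1 - variance(routed_folds) / variance(baseline_folds)
w = levene_w(baseline_folds, routed_folds)
print(f"variance reduction: {reduction:.1%}, Levene W: {w:.2f}")
```

To obtain a p-value, W is compared against an F distribution with (k - 1, N - k) degrees of freedom. SciPy's `scipy.stats.levene` performs the full test, although it centers on the median by default (the Brown-Forsythe variant), so its statistic can differ from the mean-centered one above.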
The routing is self-organizing: the high-precision 8-bit expert automatically receives the most uncertain samples, whose confidence is 20 percent lower (p < 0.001), while the lightweight experts handle easier inputs. Datasets whose baseline variance is already low show no artificial stability gain, which indicates that the mechanism targets genuine epistemic uncertainty rather than overfitting its routing decisions.
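One way to picture uncertainty-based routing is sketched below. It estimates epistemic uncertainty as the mutual information between the prediction and the model parameters, computed from several stochastic forward passes, and sends a sample to the high-precision expert when that value exceeds a threshold. This is a simplification under stated assumptions: the summary does not specify the paper's estimator, the routing it describes is learned rather than a hard threshold, and the expert names, threshold, and probabilities here are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def epistemic_uncertainty(mc_probs):
    """Mutual information: entropy of the mean prediction minus mean entropy.

    mc_probs holds one class-probability vector per stochastic forward pass.
    """
    n = len(mc_probs)
    num_classes = len(mc_probs[0])
    mean_probs = [sum(p[c] for p in mc_probs) / n for c in range(num_classes)]
    total = entropy(mean_probs)                       # total predictive uncertainty
    expected = sum(entropy(p) for p in mc_probs) / n  # average per-pass (aleatoric) part
    return total - expected


def route(mc_probs, threshold=0.1):
    """Hard-threshold router (assumption; expert names are hypothetical)."""
    if epistemic_uncertainty(mc_probs) > threshold:
        return "expert_8bit"
    return "expert_ternary"


# Passes that agree -> low epistemic uncertainty -> lightweight expert.
agree = [[0.90, 0.10], [0.88, 0.12], [0.92, 0.08]]
# Passes that disagree -> high epistemic uncertainty -> 8-bit expert.
disagree = [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]]

print(route(agree), route(disagree))
```

Separating the two entropy terms matters: a sample can be noisy (high per-pass entropy) without the model being unsure about its own parameters, and only the latter should trigger the expensive expert.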
Interpretable and Precision-Aware Framework
With only 1.2 million parameters, the framework provides interpretable, precision-aware routing suited to safety-sensitive edge deployments, where both accuracy and predictability matter.
Conclusion
The curiosity-driven quantized Mixture-of-Experts framework addresses both challenges of deploying deep neural networks on resource-constrained devices: accuracy under aggressive quantization and predictable behavior. Routing by Bayesian epistemic uncertainty improves F1 while reducing cross-fold variance, and 4-bit quantization lowers energy use, which makes the method a candidate for real-world edge applications.
