Curiosity-Driven Quantized Mixture-of-Experts for Stable AI

Uncertainty Makes It Stable: Curiosity-Driven Quantized Mixture-of-Experts

Source: arXiv:2511.11743v3

Abstract

Deploying deep neural networks on resource-constrained devices poses two critical challenges: maintaining accuracy under aggressive quantization and ensuring predictable inference latency. We present a curiosity-driven quantized Mixture-of-Experts framework that addresses both by routing inputs across heterogeneous experts (BitNet ternary, 1-16 bit BitLinear, post-training quantization) according to Bayesian epistemic uncertainty.
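To make the idea of heterogeneous precision experts more concrete, here is a minimal sketch of two weight quantizers of the kind such experts might use. It assumes the absmean ternary scheme commonly associated with BitNet and a plain symmetric uniform quantizer for the k-bit case; the function names and the sample weights are illustrative and do not come from the paper.

```python
def quantize_ternary(weights, eps=1e-8):
    """Absmean ternary quantization: scale by mean |w|, round, clip to {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Python's round() rounds halves to the nearest even integer.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def quantize_symmetric(weights, bits, eps=1e-8):
    """Symmetric uniform quantization to signed integer codes of width `bits`."""
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed uniform grid")
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax + eps
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def max_abs_error(weights, codes, scale):
    """Largest reconstruction error after mapping the codes back to floats."""
    return max(abs(w - c * scale) for w, c in zip(weights, codes))


# Made-up weights, used only to exercise the functions.
weights = [0.42, -0.13, 0.88, -0.91, 0.05, 0.30]

ternary_codes, ternary_scale = quantize_ternary(weights)
int4_codes, int4_scale = quantize_symmetric(weights, bits=4)
int8_codes, int8_scale = quantize_symmetric(weights, bits=8)

print("ternary codes:", ternary_codes)
print("4-bit max error:", max_abs_error(weights, int4_codes, int4_scale))
print("8-bit max error:", max_abs_error(weights, int8_codes, int8_scale))
```

The wider the code, the finer the grid and the smaller the reconstruction error, which is the trade-off a router can exploit when it decides which inputs deserve the more expensive expert.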

Key Findings

The framework was evaluated on the following audio classification benchmarks:

ESC-50
Quinn
UrbanSound8K

Notably, our 4-bit quantization retains 99.9 percent of the full-precision F1 score (0.858 vs 0.859) while achieving 4x compression and 31 percent energy savings relative to 8-bit systems. Both the 4-bit and 8-bit configurations are statistically indistinguishable from full precision (p > 0.05).
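The headline figures can be checked with a few lines of arithmetic. The sketch below uses the two F1 scores quoted above; the 16-bit baseline width is an assumption made here so that the 4x compression ratio can be reproduced, since the text only says "full precision".

```python
FULL_PRECISION_F1 = 0.859   # from the text
FOUR_BIT_F1 = 0.858         # from the text
ASSUMED_BASELINE_BITS = 16  # assumption: chosen because 16 / 4 gives the stated 4x
QUANTIZED_BITS = 4

retention = FOUR_BIT_F1 / FULL_PRECISION_F1
compression = ASSUMED_BASELINE_BITS / QUANTIZED_BITS

print(f"F1 retention: {retention:.1%}")       # prints "F1 retention: 99.9%"
print(f"compression: {compression:.0f}x")     # prints "compression: 4x"
```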

Curiosity-Driven Routing

Crucially, our curiosity-driven routing mechanism improves accuracy and stability at the same time. For instance, on the Quinn dataset, the F1 score improves from 0.802 to 0.809, while cross-fold variance drops by 85 percent (p < 0.001, Levene's test). This trend is consistent across datasets, with variance reductions ranging from 50 to 94 percent.
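For readers unfamiliar with how a cross-fold variance reduction and Levene's test are computed, the following is a minimal sketch. The per-fold F1 scores are made up for illustration and are not the paper's data. The test statistic uses median centering (the Brown-Forsythe variant), which is an assumption, since the text does not say which centering was used. Only the W statistic is computed here; obtaining a p-value requires an F distribution, for example via scipy.stats.levene.

```python
from statistics import mean, median, variance


def variance_reduction(baseline_scores, routed_scores):
    """Fractional drop in sample variance from baseline to routed scores."""
    return 1.0 - variance(routed_scores) / variance(baseline_scores)


def levene_w(*groups):
    """Levene's W statistic with median centering (Brown-Forsythe variant)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group's median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    group_means = [mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    between = sum(
        len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, group_means)
    )
    within = sum(
        (v - m) ** 2 for z, m in zip(deviations, group_means) for v in z
    )
    return (n_total - k) / (k - 1) * between / within


# Hypothetical per-fold F1 scores, for illustration only.
baseline_folds = [0.78, 0.83, 0.80, 0.76, 0.84]
routed_folds = [0.805, 0.812, 0.808, 0.806, 0.814]

print(f"variance reduction: {variance_reduction(baseline_folds, routed_folds):.1%}")
print(f"Levene W: {levene_w(baseline_folds, routed_folds):.2f}")
```

A large W indicates that the two sets of fold scores differ in spread, which is the kind of evidence the stability claim above rests on.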

The routing mechanism is self-organizing: the high-precision 8-bit expert automatically receives the most uncertain samples, whose confidence is 20 percent lower (p < 0.001), while lightweight experts handle easier inputs. Datasets whose baseline variance is already low show no artificial stability gain, which indicates that the mechanism targets genuine epistemic uncertainty rather than overfitting its routing decisions.
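One way to picture uncertainty-based routing is sketched below. It estimates epistemic uncertainty as the mutual information between predictions and model parameters, computed from several stochastic forward passes, and sends a sample to the 8-bit expert when that value exceeds a threshold. Both the fixed threshold rule and the expert labels are simplifying assumptions made here. The framework described above is self-organizing, so this is a stand-in for intuition, not the authors' router.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def epistemic_uncertainty(mc_probs):
    """Mutual information: entropy of the mean prediction minus mean per-pass entropy."""
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    expected_entropy = sum(entropy(p) for p in mc_probs) / n_passes
    return entropy(mean_probs) - expected_entropy


def route(mc_probs, threshold=0.05):
    """Hypothetical rule: uncertain samples go to the 8-bit expert."""
    return "int8" if epistemic_uncertainty(mc_probs) > threshold else "ternary"


# Made-up class probabilities from three stochastic passes per sample.
confident = [[0.97, 0.02, 0.01], [0.96, 0.03, 0.01], [0.98, 0.01, 0.01]]
disputed = [[0.90, 0.05, 0.05], [0.10, 0.85, 0.05], [0.30, 0.30, 0.40]]

print("confident sample ->", route(confident))
print("disputed sample  ->", route(disputed))
```

Mutual information stays near zero when the passes agree, even if each pass is individually unsure, so it isolates disagreement about the model rather than noise in the input. That distinction is what "epistemic" refers to in the text.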

Interpretable and Precision-Aware Framework

With a compact architecture of 1.2 million parameters, our framework delivers interpretable and precision-aware routing that is particularly suitable for safety-sensitive edge deployments. In these contexts, both accuracy and predictability are paramount.

Conclusion

In summary, the curiosity-driven quantized Mixture-of-Experts framework is a promising approach to deploying deep neural networks on resource-constrained devices. By using Bayesian epistemic uncertainty for routing, the method improves accuracy while keeping inference stable and energy-efficient, which makes it a strong candidate for real-world applications.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
