Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms
arXiv:2603.24343v2
Abstract
Audio deepfake detection has achieved remarkable performance with deep learning architectures such as ResNet, and has improved further with the introduction of large models (LMs) such as Wav2Vec. The success of large language models (LLMs) demonstrates the benefits of scaling model parameters, but it also exposes a bottleneck: performance gains are constrained by parameter count. Simply stacking additional layers, as current LLMs do, is computationally expensive and requires full retraining.
Introduction
Deepfake technology has progressed significantly, increasing the demand for effective audio deepfake detection methods. Traditional approaches often rely on large-scale models to improve accuracy, but they are limited by computational cost and the need for retraining. This article discusses two biologically inspired algorithms, dropin and plasticity, that aim to improve detection efficiency.
Neuronal Inspiration
Inspired by the neuronal plasticity observed in mammalian brains, we propose two novel algorithms: dropin and, as a further step, plasticity. Both dynamically adjust the number of neurons in specific layers, so that the parameter count of a model can be modulated flexibly without stacking new layers. This approach seeks to overcome the constraints faced by existing LLMs, particularly in performance scalability and computational efficiency.
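To make the idea of adding neurons to an existing layer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names `forward` and `drop_in`, the tiny two-layer network, and the initialization scheme (small random incoming weights, zero outgoing weights) are all assumptions chosen for illustration. With that initialization the widened network produces the same output as the original until the new neurons are trained.

```python
import math
import random


def forward(x, w1, b1, w2, b2):
    """Two-layer MLP: hidden = tanh(w1 @ x + b1), output = w2 @ hidden + b2."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def drop_in(w1, b1, w2, n_new, rng):
    """Return copies of (w1, b1, w2) with n_new extra hidden neurons.

    Assumption (hypothetical scheme, not taken from the paper): new neurons
    get small random incoming weights and zero outgoing weights, so the
    output of the network is unchanged right after they are added.
    """
    n_in = len(w1[0])
    new_w1 = [row[:] for row in w1] + [
        [rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_new)
    ]
    new_b1 = b1[:] + [0.0] * n_new
    # Each output unit gets one zero weight per newly added hidden neuron.
    new_w2 = [row[:] + [0.0] * n_new for row in w2]
    return new_w1, new_b1, new_w2


rng = random.Random(0)
w1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]  # 3 inputs, 4 hidden
b1 = [0.0] * 4
w2 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]  # 4 hidden, 2 outputs
b2 = [0.0] * 2
x = [0.5, -1.0, 2.0]

before = forward(x, w1, b1, w2, b2)
w1_big, b1_big, w2_big = drop_in(w1, b1, w2, n_new=2, rng=rng)
after = forward(x, w1_big, b1_big, w2_big, b2)
```

Because the original weights are copied unchanged, one possible follow-up is to update only the rows and columns that belong to the new neurons, which avoids retraining the whole model. A practical version would use a tensor library rather than nested lists.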
Methodology
We evaluated the proposed algorithms on multiple architectures, including:
- ResNet
- Gated Recurrent Neural Networks (GRNNs)
- Wav2Vec
These models were tested on widely used datasets, including ASVSpoof2019 LA, ASVSpoof2019 PA, and FakeorReal. Our focus was to measure how effectively the dropin and plasticity approaches reduce the Equal Error Rate (EER), the error rate at the decision threshold where the false acceptance rate equals the false rejection rate.
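For readers unfamiliar with the metric, the following sketch shows one simple way to estimate the EER from two lists of detector scores. The function name `equal_error_rate`, the score convention (higher means more likely bona fide), and the toy scores are assumptions for illustration; evaluation toolkits typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Convention assumed here: a higher score means "more likely bona fide",
    and a trial is accepted when score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for this example.
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
eer = equal_error_rate(bonafide, spoof)  # 0.25 for these toy scores
```

At a threshold of 0.6, one of four bona fide trials is rejected and one of four spoofed trials is accepted, so both error rates equal 0.25.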
Results
The experimental results demonstrated consistent improvements in computational efficiency with the application of the dropin algorithm. We observed a maximum reduction in EER of around:
- 39% with the dropin approach
- 66% with the combined dropin and plasticity approach
These results underline the potential of our proposed methods in enhancing deepfake audio detection capabilities.
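If the percentages above are read as relative reductions, which is the usual reading when an error rate is said to fall by a percentage, the arithmetic can be sketched as follows. The function name `relative_reduction` and the EER values are hypothetical and are not results from the paper.

```python
def relative_reduction(baseline_eer, new_eer):
    """Relative reduction of new_eer with respect to baseline_eer, in percent."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical values, chosen only to illustrate the arithmetic.
baseline = 5.00  # baseline EER in percent (not from the paper)
improved = 3.05  # EER after a change, in percent (not from the paper)
reduction = relative_reduction(baseline, improved)
print(round(reduction, 1))  # prints 39.0
```

Under this reading, a 39% reduction means the EER falls to 61% of its baseline value, not that it drops by 39 percentage points.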
Conclusion
Our study presents a significant advancement in deepfake audio detection through the introduction of neuron-level dropin and neuroplasticity mechanisms. By allowing for adaptive changes in model architecture, these methods not only improve performance but also reduce computational costs associated with deep learning models. The findings highlight a promising direction for future research in both audio detection and the broader field of artificial intelligence.
The code and supplementary material are available at the following GitHub link.
