DDSP-QbE++ Enhances Speech Quality for Anonymisation

DDSP-QbE++: Improving Speech Quality for Speech Anonymisation for Atypical Speech

Voice conversion has advanced quickly in recent years, helped in part by Differentiable Digital Signal Processing (DDSP) pipelines. A new study, arXiv:2604.09246v1, proposes two changes to the existing DDSP-QbE framework that aim to improve synthesis quality for atypical speech.

DDSP-QbE uses subtractive synthesis: a periodic excitation signal is shaped by a learned spectral envelope to reconstruct the target voice. The excitation is generated by phase accumulation, which produces a sawtooth-like waveform. The abrupt discontinuity at each phase wrap carries energy above the Nyquist frequency, and that energy folds back into the audible band as aliasing. Listeners perceive the result as buzziness and spectral distortion, particularly at higher fundamental frequencies.
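To make the aliasing problem concrete, here is a minimal sketch of a phase-accumulated sawtooth in plain Python. It is not the DDSP-QbE implementation; the function names (`naive_sawtooth`, `folded_frequency`, `tone_magnitude`), the 16 kHz sample rate, and the 3 kHz fundamental are assumptions chosen for illustration. The fundamental is deliberately set far above speech pitch so the fold-back is easy to see.

```python
import math


def naive_sawtooth(f0_hz, sample_rate, num_samples):
    """Sawtooth in [-1, 1) produced by wrapping a phase accumulator."""
    phase = 0.0
    step = f0_hz / sample_rate  # phase increment per sample, in cycles
    out = []
    for _ in range(num_samples):
        out.append(2.0 * phase - 1.0)
        phase += step
        if phase >= 1.0:
            phase -= 1.0  # hard wrap: this jump is what causes aliasing
    return out


def folded_frequency(freq_hz, sample_rate):
    """Frequency at which a partial lands after sampling (folding at Nyquist)."""
    f = freq_hz % sample_rate
    return sample_rate - f if f > sample_rate / 2 else f


def tone_magnitude(signal, freq_hz, sample_rate):
    """Amplitude of one frequency component (a single-bin DFT)."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)


sr = 16000
f0 = 3000.0
# Harmonics 1..5 of 3 kHz sit at 3, 6, 9, 12 and 15 kHz; Nyquist is 8 kHz.
landed = [folded_frequency(k * 3000, sr) for k in range(1, 6)]
# landed == [3000, 6000, 7000, 4000, 1000]

saw = naive_sawtooth(f0, sr, 1600)
alias_at_1khz = tone_magnitude(saw, 1000.0, sr)  # clearly non-zero
```

In this sketch, 1 kHz is not a harmonic of 3 kHz, yet the sampled waveform contains a component there because the fifth harmonic (15 kHz) folds back across Nyquist. Components like this are inharmonic relative to the pitch, which is consistent with the buzziness described above.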

Proposed Improvements

The study introduces two modifications to the excitation stage of the DDSP-QbE subtractive synthesizer:

  • Explicit Voicing Detection:

    The first modification adds explicit voicing detection, which gates the harmonic excitation so that the periodic component is suppressed in unvoiced regions of speech. Filtered noise is used in those regions instead, which avoids the aliased harmonic content that would otherwise degrade the synthesized speech.

  • Polynomial Band-Limited Step (PolyBLEP) Correction:

    The second modification applies a PolyBLEP correction to the phase-accumulated oscillator. The correction replaces the hard discontinuity at each phase wrap with a smooth polynomial residual, suppressing the alias-generating components without oversampling or spectral truncation. The result is a cleaner harmonic roll-off and a significant reduction in high-frequency artefacts.
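The two modifications can be sketched together in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: `poly_blep` uses the widely used two-sample PolyBLEP residual for a sawtooth, while `gated_excitation`, its `antialias` flag, the `noise_gain` value, and the one-pole low-pass standing in for the "filtered noise" branch are all hypothetical choices made here. The voicing signal is assumed to be supplied per sample with values in [0, 1]; how the paper detects voicing is not covered.

```python
import math
import random


def poly_blep(phase, step):
    """Two-sample polynomial residual around a phase wrap (phase in [0, 1))."""
    if phase < step:                 # sample just after the wrap
        x = phase / step
        return 2.0 * x - x * x - 1.0
    if phase > 1.0 - step:           # sample just before the wrap
        x = (phase - 1.0) / step
        return x * x + 2.0 * x + 1.0
    return 0.0                       # away from the wrap: no correction


def gated_excitation(f0_hz, voicing, sample_rate,
                     antialias=True, noise_gain=0.1, seed=0):
    """Per-sample excitation: sawtooth where voiced, low-passed noise elsewhere.

    f0_hz and voicing are equal-length sequences; voicing lies in [0, 1].
    """
    rng = random.Random(seed)
    phase = 0.0
    smoothed_noise = 0.0
    out = []
    for f0, v in zip(f0_hz, voicing):
        step = f0 / sample_rate
        saw = 2.0 * phase - 1.0
        if antialias:
            saw -= poly_blep(phase, step)   # smooth the jump at the wrap
        # One-pole low-pass on white noise (assumed stand-in for filtered noise).
        smoothed_noise = 0.5 * smoothed_noise + 0.5 * rng.uniform(-1.0, 1.0)
        # The voicing gate blends the two branches; v = 0 leaves only noise.
        out.append(v * saw + (1.0 - v) * noise_gain * smoothed_noise)
        phase = (phase + step) % 1.0
    return out


# Usage: 50 ms voiced at 220 Hz followed by 50 ms unvoiced, at 16 kHz.
sr = 16000
f0 = [220.0] * 800 + [0.0] * 800
voicing = [1.0] * 800 + [0.0] * 800
excitation = gated_excitation(f0, voicing, sr)
```

Every operation here is a fixed arithmetic expression of the phase, so the sketch adds no learnable parameters, which matches the lightweight design the study describes. Setting `antialias=False` gives the uncorrected sawtooth for a side-by-side comparison.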

Results and Impact

Together, the two modifications substantially improve the perceptual naturalness of the generated speech, as measured by Mean Opinion Score (MOS) evaluations. The output is cleaner, with fewer high-frequency artefacts.

DDSP-QbE++ is designed to be lightweight and differentiable, so it integrates into the existing DDSP-QbE training pipeline without additional learnable parameters. This keeps the implementation simple and leaves the rest of the training setup unchanged.

In short, DDSP-QbE++ addresses the main source of artefacts in the original DDSP-QbE excitation stage. For voice conversion applications such as speech anonymisation of atypical speech, the changes have the potential to improve the quality and usability of synthesized speech.


Lazarus Omolua