Sound demos for "CleanUNet 2: A Hybrid Speech Denoising Model in Time and Time-Frequency Domain"

Authors: Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

We present audio samples for the causal CleanUNet 2 model proposed in "CleanUNet 2: A Hybrid Speech Denoising Model in Time and Time-Frequency Domain". We compare CleanUNet 2 with state-of-the-art models including FAIR-denoiser, FullSubNet, and CleanUNet. Spectrogram-based methods such as FullSubNet exhibit noise leakage under high noise levels, which is caused by the inaccurate phase extracted from noisy speech. Waveform-based methods leak less noise even under high noise levels because they denoise the waveform directly. However, prior waveform-based methods such as FAIR-denoiser produce less natural sound. The proposed CleanUNet 2 is a hybrid denoiser that operates in both the time and time-frequency domains, and it has small noise leakage while retaining more natural sound.
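The phase argument above can be illustrated with a toy experiment. The sketch below is not the paper's evaluation and uses none of the compared models: it assumes a synthetic broadband signal in place of speech, white noise, and a single full-length FFT in place of an STFT. All function names are hypothetical. It pairs the true clean magnitude with the phase of the noisy signal, inverts the result, and measures how much error remains.

```python
import numpy as np

def oracle_magnitude_noisy_phase(clean, noisy):
    """Combine the clean magnitude with the noisy phase, then invert."""
    clean_spec = np.fft.rfft(clean)
    noisy_spec = np.fft.rfft(noisy)
    hybrid_spec = np.abs(clean_spec) * np.exp(1j * np.angle(noisy_spec))
    return np.fft.irfft(hybrid_spec, n=len(clean))

def residual_db(clean, estimate):
    """Residual energy relative to the clean signal, in dB (lower is better)."""
    err = np.sum((clean - estimate) ** 2)
    ref = np.sum(clean ** 2)
    return 10.0 * np.log10(err / ref + 1e-12)

def phase_leakage_demo(snr_db, seed=0, n=16000):
    """Residual left by a perfect magnitude estimate that reuses noisy phase."""
    rng = np.random.default_rng(seed)
    # Assumption: smoothed white noise stands in for a broadband speech signal.
    clean = np.convolve(rng.standard_normal(n), np.ones(8) / 8, mode="same")
    noise = rng.standard_normal(n)
    # Scale the noise so that the mixture has the requested SNR.
    scale = np.sqrt(np.sum(clean ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db / 10)))
    noisy = clean + scale * noise
    estimate = oracle_magnitude_noisy_phase(clean, noisy)
    return residual_db(clean, estimate)

if __name__ == "__main__":
    for snr in (20.0, 10.0, 0.0):
        print(f"input SNR {snr:5.1f} dB -> residual {phase_leakage_demo(snr):6.1f} dB")
```

Even with a perfect magnitude estimate, the residual grows as the input SNR drops, because the noisy phase drifts further from the clean phase. A real spectrogram-based denoiser works on short overlapping frames and estimates the magnitude imperfectly, so this sketch only isolates the phase contribution.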

Speech Denoising on the DNS (2020) Dataset



Noise type: dog barking

Noisy CleanUNet 2 (ours) CleanUNet FAIR-denoiser FullSubNet Clean (reference)


Noise type: baby

Noisy CleanUNet 2 (ours) CleanUNet FAIR-denoiser FullSubNet Clean (reference)


Noise type: wind noise

Noisy CleanUNet 2 (ours) CleanUNet FAIR-denoiser FullSubNet Clean (reference)


Noise type: shrill noise

Noisy CleanUNet 2 (ours) CleanUNet FAIR-denoiser FullSubNet Clean (reference)


Noise type: bird

Noisy CleanUNet 2 (ours) CleanUNet FAIR-denoiser FullSubNet Clean (reference)


Noise type: air conditioner

Noisy CleanUNet 2 (ours) CleanUNet FAIR-denoiser FullSubNet Clean (reference)


Noise type: bus

Noisy CleanUNet 2 (ours) CleanUNet FAIR-denoiser FullSubNet Clean (reference)