Deep Residual Network-Based Augmented Kalman Filter for Speech Enhancement

Deep Learning with Augmented Kalman Filter for Single-Channel Speech Enhancement

2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020

The existing augmented Kalman filter (AKF) suffers from poor LPC estimates in real-world noise conditions, which degrades speech enhancement performance. In this paper, a deep learning technique is used to improve the LPC estimates for the AKF so that it can enhance speech in various noise conditions. Specifically, a deep residual network is used to estimate the noise PSD for computing the noise LPCs. A whitening filter is also constructed from the noise LPCs to pre-whiten the noisy speech signal prior to estimating the speech LPCs. It is shown that the improved speech and noise LPCs enable the AKF to minimize the residual noise as well as the distortion in the enhanced speech. Experimental results show that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than the benchmark methods in various noise conditions for a wide range of SNR levels.
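The abstract does not give implementation details, so the following is only a minimal sketch of the two signal-processing steps it names: turning a noise PSD estimate into noise LPCs, and pre-whitening the noisy speech with them. It assumes the PSD is one-sided on a uniform FFT grid and uses the standard autocorrelation (Yule-Walker) method; the function names `lpc_from_psd` and `whiten` are hypothetical, and the PSD would come from the network rather than being computed here.

```python
import numpy as np

def lpc_from_psd(psd, order):
    """Noise LPCs from a one-sided PSD of length n_fft // 2 + 1 (assumed layout)."""
    # Wiener-Khinchin: the autocorrelation is the inverse DFT of the PSD.
    r = np.fft.irfft(psd)[: order + 1]
    # Build the Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[idx]
    # Yule-Walker normal equations: R a = -[r[1], ..., r[order]].
    a = np.linalg.solve(R, -r[1 : order + 1])
    # Return A(z) = 1 + a_1 z^-1 + ... + a_p z^-p as [1, a_1, ..., a_p].
    return np.concatenate(([1.0], a))

def whiten(noisy, a):
    """Pre-whiten a noisy frame with the FIR filter A(z) built from the noise LPCs."""
    return np.convolve(noisy, a)[: len(noisy)]

# Example: PSD of AR(1) noise v[n] = 0.9 v[n-1] + w[n] with unit-variance w.
n_fft = 512
omega = 2.0 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
noise_psd = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * omega)) ** 2
noise_lpc = lpc_from_psd(noise_psd, order=2)
```

Here the recovered coefficients approximate the generating filter, so filtering with A(z) flattens the noise component before the speech LPCs are estimated.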

Causal Convolutional Neural Network-Based Kalman Filter for Speech Enhancement

2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), 2020

Speech enhancement using the Kalman filter (KF) suffers from inaccurate estimates of the noise variance and the linear prediction coefficients (LPCs) in real-life noise conditions. This degrades speech enhancement performance. In this paper, a causal convolutional neural network (CCNN) model is used to more accurately estimate the noise variance and LPC parameters of the KF for speech enhancement in real-life noise conditions. Specifically, a CCNN model gives an instantaneous estimate of the noise waveform for each noisy speech frame to compute the noise variance. Each noisy speech frame is pre-whitened by a whitening filter, which is constructed with the coefficients computed from the estimated noise. The LPC parameters are computed from the pre-whitened speech. The improved noise variance and LPCs enable the KF to minimize residual noise as well as distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the enhanced speech produced b...

A Deep Learning-Based Kalman Filter for Speech Enhancement

Interspeech 2020, 2020

The existing Kalman filter (KF) suffers from poor estimates of the noise variance and the linear prediction coefficients (LPCs) in real-world noise conditions. This results in degraded speech enhancement performance. In this paper, a deep learning approach is used to more accurately estimate the noise variance and LPCs, enabling the KF to enhance speech in various noise conditions. Specifically, a deep learning approach to MMSE-based noise power spectral density (PSD) estimation, called DeepMMSE, is used. The estimated noise PSD is used to compute the noise variance. We also construct a whitening filter with its coefficients computed from the estimated noise PSD. It is then applied to the noisy speech, yielding pre-whitened speech for computing the LPCs. The improved noise variance and LPC estimates enable the KF to minimise the residual noise and distortion in the enhanced speech. Experimental results show that the proposed method exhibits higher quality and intelligibility in the enhanced speech than the benchmark methods in various noise conditions for a wide range of SNR levels.
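To make the role of the two estimated quantities concrete, here is a minimal sketch of a textbook Kalman filter for speech modeled as an autoregressive process in additive white noise. It is not taken from the paper: the function name `kalman_enhance`, the state layout, and the initial covariance are assumptions for illustration. The speech LPCs `a` and excitation variance `sigma_w2` define the state model, and the noise variance `sigma_v2` defines the observation model.

```python
import numpy as np

def kalman_enhance(noisy, a, sigma_w2, sigma_v2):
    """Filter one frame. a = [1, a_1, ..., a_p] with s[n] = -sum_i a_i s[n-i] + w[n]."""
    p = len(a) - 1
    # Companion-form transition matrix for the state [s[n-p+1], ..., s[n]].
    Phi = np.zeros((p, p))
    Phi[:-1, 1:] = np.eye(p - 1)
    Phi[-1, :] = -a[:0:-1]            # last row: [-a_p, ..., -a_1]
    d = np.zeros(p)
    d[-1] = 1.0                       # excitation drives the newest sample
    c = d.copy()                      # observation picks the newest sample
    x = np.zeros(p)
    P = np.eye(p)                     # assumed initial error covariance
    out = np.empty(len(noisy))
    for n, y in enumerate(noisy):
        # Prediction step.
        x = Phi @ x
        P = Phi @ P @ Phi.T + sigma_w2 * np.outer(d, d)
        # Update step with Kalman gain k.
        k = P @ c / (c @ P @ c + sigma_v2)
        x = x + k * (y - c @ x)
        P = P - np.outer(k, c) @ P
        out[n] = x[-1]                # filtered estimate of s[n]
    return out
```

With this structure, an overestimated `sigma_v2` pushes the filter toward the model prediction (more distortion), and an underestimated one passes more of the observation through (more residual noise), which is the trade-off the abstract refers to.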

Causal Convolutional Encoder Decoder-Based Augmented Kalman Filter for Speech Enhancement

2020 14th International Conference on Signal Processing and Communication Systems (ICSPCS), 2020

Speech enhancement using the augmented Kalman filter (AKF) suffers from biased estimates of the linear prediction coefficients (LPCs) of the speech and noise signals in noisy conditions. The existing AKF was designed specifically to enhance speech corrupted by colored noise. In this paper, a causal convolutional encoder-decoder (CCED)-based method is used to improve the LPC estimates of the AKF for speech enhancement. Specifically, a CCED network is used to estimate the instantaneous noise spectrum for computing the LPCs of the noise on a framewise basis. Each noise-corrupted speech frame is pre-whitened by a whitening filter, which is constructed with the noise LPCs. The speech LPCs are computed from the pre-whitened speech. The improved speech and noise LPCs enable the AKF to minimize the residual noise as well as the distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than the benchmark methods in various noise conditions for a wide range of SNR levels. Index Terms: speech enhancement, augmented Kalman filter, convolutional neural network, LPC, whitening filter.

DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement

2021

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as wel...
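The last stage described above (power spectrum, inverse Fourier transform, Levinson-Durbin) can be sketched as follows. This is an illustration under stated assumptions rather than the DeepLPC code: the spectrum is taken to be one-sided on a uniform FFT grid, and the names `levinson_durbin` and `lpc_from_power_spectrum` are hypothetical.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for LPCs [1, a_1, ..., a_p] and error variance."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # acc = r[i] + sum_{j=1}^{i-1} a_j r[i-j]
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error variance
    return a, err

def lpc_from_power_spectrum(ps, order):
    """LPCs and prediction error variance from a one-sided power spectrum (assumed layout)."""
    r = np.fft.irfft(ps)[: order + 1]       # autocorrelation via the inverse DFT
    return levinson_durbin(r, order)

# Example: LPC power spectrum of A(z) = 1 - 0.9 z^-1 with unit error variance.
n_fft = 512
omega = 2.0 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
ps = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * omega)) ** 2
lpc, err_var = lpc_from_power_spectrum(ps, order=2)
```

In this example the recursion recovers the generating coefficients and error variance up to numerical precision, since the input spectrum is itself an all-pole spectrum.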

Speech Enhancement Using Deep Neural Network

2016

Speech is the main medium of human interaction. In everyday life, understanding speech in noisy environments is still one of the major challenges for users. The quality and intelligibility of speech signals are generally corrupted by surrounding background noise during communication, so corrupted speech signals must be enhanced to improve quality and intelligibility. In the field of speech processing, considerable effort has gone into developing speech enhancement techniques that enhance the speech signal by reducing the amount of noise. Speech enhancement deals with improving the quality and intelligibility of speech that has been degraded by surrounding background noise. In various everyday environments, the goal of speech enhancement methods is to improve the quality and intelligibility of speech, especially at low signal-to-noise ratios (SNRs). Regarding intelligibility, different machine learning methods that aim to estimate an ideal binary mask ...

Robust DNN-Based Speech Enhancement with Limited Training Data

2018

In conventional speech enhancement, statistical models for speech and noise are used to derive clean speech estimators. The parameters of the models are estimated blindly from the noisy observation using carefully designed algorithms. These algorithms generalize well to unseen acoustic conditions, but are unable to reduce highly non-stationary noise types. This shortcoming motivated the usage of machine-learning-based (ML-based) algorithms, in particular deep neural networks (DNNs). But if only limited training data are available, the noise reduction performance in unseen acoustic conditions suffers. In this paper, motivated by conventional speech enhancement, we propose to use the a priori and a posteriori signal-to-noise ratios (SNRs) for DNN-based speech enhancement systems. Instrumental measures show that the proposed features increase the robustness in unknown noise types even if only limited training data are available.
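For readers unfamiliar with the two SNR features, a small sketch of their usual per-bin definitions follows. The a posteriori SNR is the noisy periodogram divided by the noise PSD, and the a priori SNR is the clean speech PSD divided by the noise PSD. Since the clean speech PSD is unknown in practice, the sketch uses the simple maximum-likelihood estimate max(gamma - 1, 0) as a stand-in; the abstract does not say which estimator the authors use, and the function names are hypothetical.

```python
import numpy as np

def a_posteriori_snr(noisy_power, noise_psd, floor=1e-12):
    """gamma = |Y|^2 / lambda_d per time-frequency bin (floor avoids division by zero)."""
    return noisy_power / np.maximum(noise_psd, floor)

def a_priori_snr_ml(gamma):
    """Maximum-likelihood estimate of xi = lambda_s / lambda_d, assumed here for illustration."""
    return np.maximum(gamma - 1.0, 0.0)

# Example with three frequency bins and a flat unit noise PSD.
noisy_power = np.array([4.0, 1.0, 0.5])
noise_psd = np.ones(3)
gamma = a_posteriori_snr(noisy_power, noise_psd)
xi = a_priori_snr_ml(gamma)
```

Because both quantities are normalized by the noise PSD, they depend less on absolute signal level than raw spectra do, which is one plausible reason such features transfer better to unseen noise.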

Speech Enhancement Using Deep Learning Methods: A Review

Jurnal Elektronika dan Telekomunikasi

Speech enhancement, which aims to recover the clean speech from a corrupted signal, plays an important role in digital speech signal processing. Approaches to speech enhancement vary according to the type of degradation and noise in the speech signal. Thus, the research topic remains challenging in practice, specifically when dealing with highly non-stationary noise and reverberation. Recent advances in deep learning technologies have provided great support for progress in the speech enhancement research field. Deep learning has been known to outperform the statistical models used in conventional speech enhancement. Hence, it deserves a dedicated survey. In this review, we describe the advantages and disadvantages of recent deep learning approaches. We also discuss challenges and trends in this field. From the reviewed works, we conclude that the trend of the deep learning architecture has shifted from the standard deep neural network (DNN) to convolutional neural network ...

On Training Targets for Supervised LPC Estimation to Augmented Kalman Filter-based Speech Enhancement

2021

The performance of speech coding, speech recognition, and speech enhancement largely depends upon the accuracy of the linear prediction coefficients (LPCs) of clean speech and noise in practice. Formulation of speech and noise LPC estimation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised technique, typically a deep neural network (DNN), is trained to learn a mapping from noisy speech features to clean speech and noise LPCs. Training targets for DNN-based clean speech and noise LPC estimation fall into four categories: line spectrum frequency (LSF), LPC power spectrum (LPC-PS), power spectrum (PS), and magnitude spectrum (MS). The choice of an appropriate training target as well as the DNN method can have a significant impact on LPC estimation in practice. Motivated by this, we perform a comprehensive study on the training targets using two state-of-the-art DNN methods: residual network and temporal convolutional network (ResNet-TCN) and ...
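As an illustration of one of the four training targets listed above, the LPC power spectrum (LPC-PS) of a set of LPCs is the all-pole spectrum sigma^2 / |A(e^{jw})|^2. The sketch below computes it on a uniform FFT grid; the function name `lpc_power_spectrum` and the choice of `n_fft` are assumptions, and any scaling or compression applied to the target in the paper is not reproduced here.

```python
import numpy as np

def lpc_power_spectrum(a, sigma2, n_fft=512):
    """One-sided LPC-PS for a = [1, a_1, ..., a_p] and prediction error variance sigma2."""
    # A(e^{jw}) on the FFT grid is the zero-padded DFT of the coefficient vector.
    A = np.fft.rfft(a, n=n_fft)
    return sigma2 / np.abs(A) ** 2

# Example: first-order model A(z) = 1 - 0.9 z^-1 with unit error variance.
lpc_ps = lpc_power_spectrum(np.array([1.0, -0.9]), sigma2=1.0)
```

The example spectrum peaks at zero frequency, where |A| is smallest, and is lowest at the Nyquist frequency.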