A Deep Learning-Based Kalman Filter for Speech Enhancement

Deep Learning with Augmented Kalman Filter for Single-Channel Speech Enhancement

2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020

The existing augmented Kalman filter (AKF) suffers from poor LPC estimates in real-world noise conditions, which degrades the speech enhancement performance. In this paper, a deep learning technique exploits the LPC estimates for the AKF to enhance speech in various noise conditions. Specifically, a deep residual network is used to estimate the noise PSD for computing noise LPCs. A whitening filter is also implemented with the noise LPCs to pre-whiten the noisy speech signal prior to estimating the speech LPCs. It is shown that the improved speech and noise LPCs enable the AKF to minimize the residual noise as well as distortion in the enhanced speech. Experimental results show that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than the benchmark methods in various noise conditions for a wide range of SNR levels.
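The pre-whitening step described above can be sketched as a short FIR filter. This is only an illustration, not the authors' implementation: the function name `whiten`, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_q z^-q for the noise LPCs, and the toy first-order noise model are all assumptions made here.

```python
def whiten(y, noise_lpc):
    """Pre-whiten a frame y with the FIR filter A(z) = 1 + sum_k a_k z^-k.

    noise_lpc holds [a_1, ..., a_q] (assumed sign convention); samples
    before the start of the frame are treated as zero.
    """
    out = []
    for n in range(len(y)):
        acc = y[n]
        for k, a_k in enumerate(noise_lpc):
            if n - 1 - k >= 0:
                acc += a_k * y[n - 1 - k]
        out.append(acc)
    return out


# Toy check: colour a known excitation with v[n] = 0.8 * v[n-1] + e[n],
# then whiten it with the matching coefficient a_1 = -0.8.
excitation = [1.0, -0.5, 0.25, 2.0, -1.0]
coloured, prev = [], 0.0
for e in excitation:
    prev = 0.8 * prev + e
    coloured.append(prev)
recovered = whiten(coloured, [-0.8])
```

When the noise LPCs match the noise model, the filter undoes the colouring, so `recovered` agrees with `excitation` up to rounding. In the setting of the abstract the same filter would be applied to the noisy speech frame before the speech LPCs are estimated.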

DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement

2021

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as wel...
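The path from a power spectrum to LPCs and a prediction error variance can be sketched as follows. This is a minimal illustration under stated assumptions, not the DeepLPC code: the network output is replaced by a synthetic first-order all-pole spectrum, the function names are hypothetical, and the LPC sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p is chosen here.

```python
import cmath
import math


def autocorr_from_power_spectrum(power, order):
    """Inverse DFT of a two-sided power spectrum, returning lags r[0..order]."""
    n = len(power)
    return [
        sum(power[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)).real / n
        for m in range(order + 1)
    ]


def levinson_durbin(r):
    """Solve the Toeplitz normal equations for order len(r) - 1.

    Returns (a, err): a = [a_1, ..., a_p] and the prediction error variance.
    """
    a = []
    err = r[0]
    for i in range(1, len(r)):
        acc = r[i] + sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = -acc / err                                   # reflection coefficient
        a = [a[j] + k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


# Synthetic stand-in for an estimated LPC power spectrum:
# an AR(1) process s[n] = 0.9 * s[n-1] + w[n] with unit excitation variance.
n_fft = 256
spectrum = [
    1.0 / abs(1.0 - 0.9 * cmath.exp(-2j * math.pi * k / n_fft)) ** 2
    for k in range(n_fft)
]
r = autocorr_from_power_spectrum(spectrum, order=2)
lpc, err_var = levinson_durbin(r)
```

For this synthetic spectrum the recursion should return coefficients close to [-0.9, 0] and an error variance close to 1, which are the parameters used to build it. In the framework described above, the same two steps would be run once on the speech spectrum and once on the noise spectrum.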

Speech Enhancement Using Deep Learning Methods: A Review

Jurnal Elektronika dan Telekomunikasi

Speech enhancement, which aims to recover clean speech from a corrupted signal, plays an important role in digital speech signal processing. According to the type of degradation and noise in the speech signal, approaches to speech enhancement vary. Thus, the research topic remains challenging in practice, specifically when dealing with highly non-stationary noise and reverberation. Recent advances in deep learning technologies have provided great support for progress in the speech enhancement research field. Deep learning is known to outperform the statistical models used in conventional speech enhancement. Hence, it deserves a dedicated survey. In this review, we described the advantages and disadvantages of recent deep learning approaches. We also discussed challenges and trends of this field. From the reviewed works, we concluded that the trend of the deep learning architecture has shifted from the standard deep neural network (DNN) to convolutional neural network ...

Speech Enhancement Using Deep Neural Network

2016

Speech is the main source of human interaction. In everyday life, understanding speech in noisy environments is still one of the major challenges for users. The quality and intelligibility of speech signals are generally corrupted by the surrounding background noise during communication. To improve quality and intelligibility, corrupted speech signals must therefore be enhanced. In the field of speech processing, various efforts have been made to develop speech enhancement techniques that enhance the speech signal by reducing the amount of noise. Speech enhancement deals with improving the quality and intelligibility of speech that is degraded in the presence of surrounding background noise. In various everyday environments, the goal of speech enhancement methods is to improve the quality and intelligibility of speech, especially at low signal-to-noise ratios (SNRs). Regarding intelligibility, different machine learning methods that aim to estimate an ideal binary mask ...

Deep Residual Network-Based Augmented Kalman Filter for Speech Enhancement

2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020

Speech enhancement using the augmented Kalman filter (AKF) suffers from inaccurate estimates of its key parameters, the linear prediction coefficients (LPCs) of the speech and noise signals, in noisy conditions. The existing AKF particularly enhances speech in colored noise conditions. In this paper, a deep residual network (ResNet)-based method utilizes the LPC estimates of the AKF for speech enhancement in various noise conditions. Specifically, a ResNet20 (constructed with 20 layers) gives an estimate of the noise waveform for each noisy speech frame to compute the noise LPC parameters. Each noisy speech frame is pre-whitened by a whitening filter, which is constructed with the corresponding noise LPCs. The speech LPC parameters are computed from the pre-whitened speech. The improved speech and noise LPC parameters enable the AKF to minimize residual noise as well as distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the proposed method exhibi...

Causal Convolutional Neural Network-Based Kalman Filter for Speech Enhancement

2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), 2020

Speech enhancement using the Kalman filter (KF) suffers from inaccurate estimates of the noise variance and the linear prediction coefficients (LPCs) in real-life noise conditions. This degrades speech enhancement performance. In this paper, a causal convolutional neural network (CCNN) model is used to more accurately estimate the noise variance and LPC parameters of the KF for speech enhancement in real-life noise conditions. Specifically, a CCNN model gives an instantaneous estimate of the noise waveform for each noisy speech frame to compute the noise variance. Each noisy speech frame is pre-whitened by a whitening filter, which is constructed with the coefficients computed from the estimated noise. The LPC parameters are computed from the pre-whitened speech. The improved noise variance and LPCs enable the KF to minimize residual noise as well as distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the enhanced speech produced b...
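To show where the noise variance and the LPCs enter the KF, here is a minimal sketch of the textbook Kalman recursion for an autoregressive speech model in additive white noise. It is not the paper's implementation: the function name `kalman_enhance`, the LPC sign convention (s[n] = -sum_k a_k s[n-k] + w[n]), the zero initial state, and the synthetic test signal are assumptions made for illustration, and the parameters are taken as known rather than estimated by a network.

```python
import random


def kalman_enhance(y, a, var_w, var_v):
    """Estimate s[n] from y[n] = s[n] + v[n], with v white of variance var_v.

    a = [a_1, ..., a_p] are the speech LPCs and var_w the excitation variance.
    """
    p = len(a)
    x = [0.0] * p                                  # state [s[n-p+1], ..., s[n]]
    P = [[0.0] * p for _ in range(p)]              # state error covariance
    A = [[0.0] * p for _ in range(p)]              # companion-form transition
    for i in range(p - 1):
        A[i][i + 1] = 1.0
    for j in range(p):
        A[p - 1][j] = -a[p - 1 - j]
    out = []
    for obs in y:
        # Prediction step: x_pred = A x, P_pred = A P A^T + var_w * d d^T
        x_pred = [sum(A[i][j] * x[j] for j in range(p)) for i in range(p)]
        AP = [[sum(A[i][k] * P[k][j] for k in range(p)) for j in range(p)]
              for i in range(p)]
        P_pred = [[sum(AP[i][k] * A[j][k] for k in range(p)) for j in range(p)]
                  for i in range(p)]
        P_pred[p - 1][p - 1] += var_w
        # Update step with measurement vector c = [0, ..., 0, 1]
        innovation = obs - x_pred[p - 1]
        innovation_var = P_pred[p - 1][p - 1] + var_v
        gain = [P_pred[i][p - 1] / innovation_var for i in range(p)]
        x = [x_pred[i] + gain[i] * innovation for i in range(p)]
        P = [[P_pred[i][j] - gain[i] * P_pred[p - 1][j] for j in range(p)]
             for i in range(p)]
        out.append(x[p - 1])
    return out


def mse(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


# Synthetic AR(1) "speech" in unit-variance white noise, true parameters known.
random.seed(0)
clean, prev = [], 0.0
for _ in range(2000):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    clean.append(prev)
noisy = [s + random.gauss(0.0, 1.0) for s in clean]
enhanced = kalman_enhance(noisy, a=[-0.9], var_w=1.0, var_v=1.0)
```

With the true parameters supplied, `mse(enhanced, clean)` should come out below `mse(noisy, clean)`. Feeding the filter a wrong `var_v` or wrong LPCs moves the gain away from its optimal value, which is the sensitivity that motivates the improved estimates described in the abstract.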

An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement

Electronics

Recent speech enhancement research has shown that deep learning techniques are very effective in removing background noise. Many deep neural networks are being proposed, showing promising results for improving overall speech perception. The Deep Multilayer Perceptron, Convolutional Neural Networks, and the Denoising Autoencoder are well-established architectures for speech enhancement; however, choosing between different deep learning models has been mainly empirical. Consequently, a comparative analysis is needed between these three architecture types in order to show the factors affecting their performance. In this paper, this analysis is presented by comparing seven deep learning models that belong to these three categories. The comparison includes evaluating the performance in terms of the overall quality of the output speech using five objective evaluation metrics and a subjective evaluation with 23 listeners; the ability to deal with challenging noise conditions; generalizatio...

Applications of deep learning to speech enhancement

2019

Deep neural networks (DNNs) have been successfully employed in a broad range of applications, having achieved state-of-the-art results in tasks such as acoustic modeling for automatic speech recognition and image classification. However, their application to speech enhancement problems such as denoising, dereverberation, and source separation is more recent work and therefore in more preliminary stages. In this work, we explore DNN-based speech enhancement from three different and complementary points of view. First, we propose a model to perform speech dereverberation by estimating its spectral magnitude from the reverberant counterpart. Our models are capable of extracting features that take into account both short and long-term dependencies in the signal through a convolutional encoder and a recurrent neural network for extracting long-term information. Our model outperforms a recently proposed model that uses different context information depending on the reverberation time, wit...

Speech Intelligibility Based Enhancement System Using Modified Deep Neural Network and Adaptive Multi-band Spectral Subtraction

Wireless Personal Communications, 2019

In adverse environments, existing speech enhancement algorithms do not always produce satisfactory results. At very low signal-to-noise ratios, the processing is complicated and may introduce signal distortion and degrade intelligibility. To overcome the complexity of existing speech enhancement algorithms, a hybrid concept for enhancing speech quality and intelligibility is proposed in this research. The primary objective of the research work is to increase the intelligibility of a speech enhancement system that has been trained for a particular speech signal using a modified deep neural network (DNN) and adaptive multi-band spectral subtraction (AdMBSS). In this work, AdMBSS is used to enhance the intelligibility of the speech signal using an additional phase information calculation, and finally, a hybrid of DNN and Nelder-Mead optimization is utilized to improve the signal quality. Experimental results show that the proposed framework achieves improved performance in signal-to-noise ratio, perceptual evaluation of signal quality, and minimum mean square error. Finally, performance is also evaluated on further noise types such as bus, train, babble, airport, station, and exhibition noise.
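The abstract does not spell out the AdMBSS rule, so the following is only a sketch of generic multi-band power spectral subtraction, the family of methods it builds on. The function name, the band layout, the per-band over-subtraction factors, and the spectral floor are all hypothetical choices for illustration; the adaptive and phase-related parts of the paper are not represented.

```python
def multiband_spectral_subtraction(noisy_power, noise_power, bands, alphas, floor=0.002):
    """Subtract a scaled noise power estimate from each frequency band.

    bands  : list of (start, stop) bin ranges, stop exclusive
    alphas : one over-subtraction factor per band
    floor  : fraction of the noisy power kept as a lower bound per bin
    """
    enhanced = list(noisy_power)
    for (start, stop), alpha in zip(bands, alphas):
        for k in range(start, stop):
            diff = noisy_power[k] - alpha * noise_power[k]
            enhanced[k] = max(diff, floor * noisy_power[k])
    return enhanced


# Two bands: a high-SNR band with mild subtraction, a low-SNR band with stronger subtraction.
noisy = [4.0, 4.0, 1.0, 1.0]
noise = [1.0, 1.0, 1.0, 1.0]
result = multiband_spectral_subtraction(
    noisy, noise, bands=[(0, 2), (2, 4)], alphas=[1.0, 2.0], floor=0.1
)
```

Here the first band keeps most of its power, while the second band, where the scaled noise estimate exceeds the noisy power, is clamped to the spectral floor rather than going negative.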

Related papers