Causal Convolutional Encoder Decoder-Based Augmented Kalman Filter for Speech Enhancement

Causal Convolutional Neural Network-Based Kalman Filter for Speech Enhancement

2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), 2020

Speech enhancement using the Kalman filter (KF) suffers from inaccurate estimates of the noise variance and the linear prediction coefficients (LPCs) in real-life noise conditions, which degrades speech enhancement performance. In this paper, a causal convolutional neural network (CCNN) model is used to more accurately estimate the noise variance and LPC parameters of the KF for speech enhancement in real-life noise conditions. Specifically, a CCNN model gives an instantaneous estimate of the noise waveform for each noisy speech frame to compute the noise variance. Each noisy speech frame is pre-whitened by a whitening filter, which is constructed with the coefficients computed from the estimated noise. The LPC parameters are computed from the pre-whitened speech. The improved noise variance and LPCs enable the KF to minimize residual noise as well as distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the enhanced speech produced b...
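
The parameter-estimation steps described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the CCNN itself is not modelled (its noise-waveform estimate is simply passed in as an array), and the function names `autocorr`, `lpc`, `whiten`, and `kf_parameters`, the model orders, and the sign convention x[n] = -sum_k a_k x[n-k] + w[n] are all assumptions made for the example.

```python
import numpy as np

def autocorr(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of a 1-D frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

def lpc(x, order):
    """LPCs a[1..order] and prediction error variance (autocorrelation method).

    Assumed convention: x[n] = -sum_k a_k x[n-k] + w[n].
    """
    r = autocorr(x, order)
    # Toeplitz autocorrelation matrix of the normal equations R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    err_var = r[0] + np.dot(a, r[1:])
    return a, err_var

def whiten(y, noise_lpc):
    """Apply the FIR whitening filter H(z) = 1 + sum_k b_k z^-k to a frame."""
    h = np.concatenate(([1.0], noise_lpc))
    return np.convolve(y, h)[:len(y)]

def kf_parameters(noisy_frame, noise_estimate, speech_order=10, noise_order=10):
    """Noise variance, speech LPCs, and excitation variance for one frame.

    noise_estimate stands in for the network's noise-waveform estimate;
    both model orders are illustrative assumptions.
    """
    noise_var = np.var(noise_estimate)
    noise_lpc, _ = lpc(noise_estimate, noise_order)
    pre_whitened = whiten(noisy_frame, noise_lpc)
    speech_lpc, exc_var = lpc(pre_whitened, speech_order)
    return noise_var, speech_lpc, exc_var
```

In this sketch the whitening filter is the inverse of an all-pole model fitted to the estimated noise, so applying it to the noisy frame flattens the noise spectrum before the speech LPCs are computed.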

Deep Residual Network-Based Augmented Kalman Filter for Speech Enhancement

2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020

Speech enhancement using the augmented Kalman filter (AKF) suffers from inaccurate estimates of its key parameters, the linear prediction coefficients (LPCs) of the speech and noise signals, in noisy conditions. The existing AKF primarily enhances speech in colored noise conditions. In this paper, a deep residual network (ResNet)-based method utilizes the LPC estimates of the AKF for speech enhancement in various noise conditions. Specifically, a ResNet20 (constructed with 20 layers) gives an estimate of the noise waveform for each noisy speech frame to compute the noise LPC parameters. Each noisy speech frame is pre-whitened by a whitening filter, which is constructed with the corresponding noise LPCs. The speech LPC parameters are computed from the pre-whitened speech. The improved speech and noise LPC parameters enable the AKF to minimize residual noise as well as distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the proposed method exhibi...

Deep Learning with Augmented Kalman Filter for Single-Channel Speech Enhancement

2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020

The existing augmented Kalman filter (AKF) suffers from poor LPC estimates in real-world noise conditions, which degrades the speech enhancement performance. In this paper, a deep learning technique exploits the LPC estimates for the AKF to enhance speech in various noise conditions. Specifically, a deep residual network is used to estimate the noise PSD for computing noise LPCs. A whitening filter is also implemented with the noise LPCs to pre-whiten the noisy speech signal prior to estimating the speech LPCs. It is shown that the improved speech and noise LPCs enable the AKF to minimize the residual noise as well as distortion in the enhanced speech. Experimental results show that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than the benchmark methods in various noise conditions for a wide range of SNR levels.

A Deep Learning-Based Kalman Filter for Speech Enhancement

Interspeech 2020, 2020

The existing Kalman filter (KF) suffers from poor estimates of the noise variance and the linear prediction coefficients (LPCs) in real-world noise conditions. This results in a degraded speech enhancement performance. In this paper, a deep learning approach is used to more accurately estimate the noise variance and LPCs, enabling the KF to enhance speech in various noise conditions. Specifically, a deep learning approach to MMSE-based noise power spectral density (PSD) estimation, called DeepMMSE, is used. The estimated noise PSD is used to compute the noise variance. We also construct a whitening filter with its coefficients computed from the estimated noise PSD. It is then applied to the noisy speech, yielding pre-whitened speech for computing the LPCs. The improved noise variance and LPC estimates enable the KF to minimise the residual noise and distortion in the enhanced speech. Experimental results show that the proposed method exhibits higher quality and intelligibility in the enhanced speech than the benchmark methods in various noise conditions for a wide range of SNR levels.

DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement

2021

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as wel...
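
The last step described in this abstract, going from an LPC power spectrum to LPCs and a prediction error variance, can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the network that estimates the power spectra is not modelled, the spectrum is assumed to be one-sided (as produced by a real FFT), and the function names `levinson_durbin` and `lpc_from_power_spectrum` and the convention A(z) = 1 + sum_k a_k z^-k are choices made for the example.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the autocorrelation r[0..order].

    Returns a with a[0] = 1 (A(z) = 1 + sum_k a_k z^-k) and the
    prediction error variance.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_from_power_spectrum(power_spectrum, order):
    """LPCs and error variance from a one-sided power spectrum.

    The inverse real FFT of the power spectrum gives the autocorrelation
    sequence (Wiener-Khinchin relation); its first order+1 lags are then
    passed to the Levinson-Durbin recursion.
    """
    r = np.fft.irfft(np.asarray(power_spectrum, dtype=float))
    return levinson_durbin(r[:order + 1], order)
```

For example, a spectrum built as `sigma2 / np.abs(np.fft.rfft(a_true, 512)) ** 2` from a stable all-pole model `a_true` should return coefficients close to `a_true` and an error variance close to `sigma2`.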

Perceptual Kalman filtering for speech enhancement in colored noise

2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004

A new method for speech enhancement in colored noise is proposed in this paper. A Kalman filter concatenated with a post-filter based on masking properties of human auditory systems is proposed for the problem. A recursive approach to compute the noise covariance matrix is used for estimating the colored noise statistics. In the post-filter, both time domain masking properties and frequency domain masking properties are taken into account. From the calculated masking level, the noisy speech spectrum is adjusted accordingly. Simulation results show that the proposed approach has the best performance compared with other recent methods, evaluated with PESQ scores.

Kalman Filtering with Machine Learning Methods for Speech Enhancement

2021

Speech corrupted by background noise (or noisy speech) can reduce the efficiency of human-to-human and human-to-machine communication. A speech enhancement algorithm (SEA) can be used to suppress the embedded background noise and increase the quality and intelligibility of noisy speech. Many applications, such as speech communication systems, hearing aid devices, and speech recognition systems, typically rely upon speech enhancement algorithms for robustness. This dissertation focuses on single-channel speech enhancement using Kalman filtering with machine learning methods. In Kalman filter (KF)-based speech enhancement, each clean speech frame is represented by an auto-regressive (AR) process, whose parameters comprise the linear prediction coefficients (LPCs) and prediction error variance. The LPC parameters and the additive noise variance are used to form the recursive equations of the KF. In augmented KF (AKF), both the clean speech and additive noise LPC parameters are incorporated...
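
As a rough illustration of how the LPCs, the prediction error variance, and the additive noise variance form the recursive equations of the KF, the following NumPy sketch filters one frame with a standard AR state-space model. It covers only the plain KF with white additive noise, not the AKF, and it is not taken from the dissertation: the function name `kalman_filter_frame`, the initial state and covariance, and the convention s[n] = -sum_k a_k s[n-k] + w[n] are assumptions made for the example.

```python
import numpy as np

def kalman_filter_frame(y, a, exc_var, noise_var):
    """Filter one noisy frame y given speech LPCs a[1..p] (as a = [a_1..a_p]),
    the excitation (prediction error) variance, and the noise variance."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    # Companion-form state transition; state = [s(n-p+1), ..., s(n)].
    Phi = np.zeros((p, p))
    Phi[:-1, 1:] = np.eye(p - 1)
    Phi[-1, :] = -a[::-1]
    d = np.zeros(p)
    d[-1] = 1.0                      # excitation enters the newest sample
    x = np.zeros(p)                  # assumed initial state
    P = exc_var * np.eye(p)          # assumed initial error covariance
    s_hat = np.empty(len(y))
    for n, yn in enumerate(y):
        # Prediction step.
        x_pred = Phi @ x
        P_pred = Phi @ P @ Phi.T + exc_var * np.outer(d, d)
        # Update step; the observation is the newest state element plus noise.
        innovation = yn - x_pred[-1]
        K = P_pred[:, -1] / (P_pred[-1, -1] + noise_var)
        x = x_pred + K * innovation
        P = P_pred - np.outer(K, P_pred[-1, :])
        s_hat[n] = x[-1]
    return s_hat
```

The quality of the output depends directly on `a`, `exc_var`, and `noise_var`, which is why the abstracts in this list concentrate on estimating those parameters accurately.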

Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Signals, 2021

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias in the Kalman filter (KF) gain and degrade speech enhancement performance. The existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in KF gain during speech absence of noisy speech to that of the sensitivity metric during speech presence to achieve better noise reduction. The noise variance and the speech model parameters are adopted...

Speech Enhancement Using Neural Network

2011

This paper describes a neural network speech enhancement system using a Multilayer Perceptron (MLP) network trained with the back propagation algorithm (BPA). Speech enhancement generally refers to mapping noisy speech into cleaner speech. Noisy speech signals are obtained by adding random noise to the clean signals. Speech enhancement is then performed on the noisy signals by using the ADALINE. Here we show that neural nets can be used to significantly boost recognition accuracy, without retraining the speech recognizer.

Performance analysis of neural network, NMF and statistical approaches for speech enhancement

International Journal of Speech Technology, 2020

Bayesian estimators are very useful in speech enhancement and noise reduction. However, traditional estimators process only the amplitudes and leave the phase unprocessed. Among the Bayesian estimators, super-Gaussian-based estimators provide improved noise reduction. Super-Gaussian Bayesian estimators that use processed phase information to estimate the amplitudes improve the results further. In this work, the Complex speech coefficients given Uncertain Phase (CUP) based Bayesian estimators, such as CUP-GG (CUP estimator with speech spectral coefficients assumed as Gamma and noise spectral coefficients as Generalized Gamma) and CUP-NG (speech as Nakagami), are compared under white noise, pink noise, babble noise, and non-stationary factory noise conditions. The statistical estimators are less effective under fully non-stationary conditions such as non-stationary factory noise and babble noise. Non-negative Matrix Factorization (NMF) based algorithms perform better for non-stationary noise. The drawback of NMF is that it requires a priori knowledge about the speech. This drawback can be overcome by combining the advantages of the statistical and NMF approaches. The NR-NMF and WR-NMF speech enhancement methods are developed by providing posteriori regularization based on statistical assumptions about the distribution of the speech and noise DFT coefficients. A speech enhancement method that uses the CUP-GG estimator and NMF with online noise bases update is also considered for comparison. Progress in neural network-based approaches to speech enhancement has further shown that, with large datasets and better training, speech enhancement algorithms yield improved results. In this work, a neural network approach to speech enhancement is implemented and compared with traditional estimators and NMF approaches. To generalize to unseen noise types, the proposed neural network approach uses dropout. The network is trained on features obtained from the a priori SNR and the a posteriori SNR. The objective of this paper is to analyze the performance of speech enhancement methods based on neural networks, NMF, and statistical approaches. The objective performance measures considered for comparison are Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal to Noise Ratio (SNR), and Segmental SNR (Seg SNR).
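
Of the objective measures listed, SNR and segmental SNR are simple enough to sketch directly. The following NumPy example is an illustration under assumptions, not the paper's evaluation code: the function names `snr_db` and `seg_snr_db`, the frame length, the non-overlapping framing, and the clamping of per-frame values to [-10, 35] dB (a common convention) are choices made for the example. The input is assumed to contain at least one full frame.

```python
import numpy as np

def snr_db(clean, enhanced, eps=1e-12):
    """Global SNR in dB between a clean reference and an enhanced signal."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    error = clean - enhanced
    return 10.0 * np.log10((np.sum(clean ** 2) + eps) / (np.sum(error ** 2) + eps))

def seg_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR: mean of per-frame SNRs, each clamped to [lo, hi] dB."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = len(clean) // frame_len
    values = []
    for i in range(n_frames):
        start, stop = i * frame_len, (i + 1) * frame_len
        values.append(snr_db(clean[start:stop], enhanced[start:stop]))
    return float(np.mean(np.clip(values, lo, hi)))
```

Averaging per-frame values in dB weights quiet and loud frames equally, which is why segmental SNR can differ noticeably from the global SNR on speech.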