On linear versus non-linear magnitude-DFT estimators and the influence of super-Gaussian speech priors
Related papers
Speech enhancement using an Improved MMSE estimator with Laplacian prior
2010 5th International Symposium on Telecommunications, 2010
In this paper we present an optimal estimator of the magnitude spectrum for speech enhancement when the clean speech DFT coefficients are modeled by a Laplacian distribution and the noise DFT coefficients by a Gaussian distribution. Chen has already introduced a Minimum Mean Square Error (MMSE) estimator of the magnitude spectrum for this model. However, his estimator, LapMMSE, has no closed form and is computationally expensive. We use his formulation of the MMSE estimator, employ some approximations, and propose a computationally efficient estimator of the magnitude spectrum.
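LapMMSE lacks a closed form because the posterior integrals under a Laplacian prior do not simplify. To illustrate why such estimators are non-linear, the sketch below numerically computes the posterior mean of a single real-valued DFT component with a Laplacian prior in Gaussian noise and compares it with the linear Wiener estimate. Treating one real component in isolation is a simplifying assumption: this is neither Chen's LapMMSE magnitude estimator nor the approximation proposed in the paper, and the function names are hypothetical.

```python
import math


def laplace_posterior_mean(y, sigma_s=1.0, sigma_n=1.0, half_width=40.0, step=0.005):
    """E[S | Y = y] for S ~ Laplace(0, b) with Var[S] = sigma_s**2 and
    Y = S + N, N ~ Gaussian(0, sigma_n**2), by brute-force grid integration.
    Hypothetical helper; one real-valued component only."""
    b = sigma_s / math.sqrt(2.0)  # Laplace scale chosen so that Var[S] = 2 * b**2
    n = int(round(2.0 * half_width / step)) + 1
    grid = [-half_width + i * step for i in range(n)]
    # Unnormalized log posterior = log prior + log likelihood (constants cancel in the ratio)
    log_w = [-abs(s) / b - (y - s) ** 2 / (2.0 * sigma_n ** 2) for s in grid]
    peak = max(log_w)  # subtract the maximum to avoid underflow in exp()
    w = [math.exp(v - peak) for v in log_w]
    return sum(s * wi for s, wi in zip(grid, w)) / sum(w)


def wiener_estimate(y, sigma_s=1.0, sigma_n=1.0):
    """Linear MMSE (Wiener) estimate for a Gaussian prior of the same variance."""
    xi = sigma_s ** 2 / sigma_n ** 2
    return xi / (1.0 + xi) * y


if __name__ == "__main__":
    for y in (0.5, 1.0, 3.0, 10.0):
        g_lap = laplace_posterior_mean(y) / y
        g_wf = wiener_estimate(y) / y
        print(f"y={y:5.1f}  Laplacian gain={g_lap:.3f}  Wiener gain={g_wf:.3f}")
```

The Wiener gain stays at ξ/(1+ξ) for every observation, while the Laplacian-prior gain depends on y and approaches 1 for large |y|, where the estimate behaves roughly like y shifted toward zero by σ_n²/b.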
Performance Analysis of Statistical Approaches and NMF Approaches for Speech Enhancement
International Journal of Image, Graphics and Signal Processing, 2019
Super-Gaussian-based Bayesian estimators play a significant role in noise reduction. However, traditional Bayesian estimators process only the DFT spectral amplitude of the noisy speech and leave the phase unprocessed. Taking phase information into account when deriving Bayesian estimators improves the results. The objective of this paper is twofold. First, super-Gaussian-based Bayesian estimators of the Complex speech coefficients given Uncertain Phase (CUP) are compared under different noise conditions: white, babble, pink, modulated pink, factory, car, street, F16 and M109 noise. Second, a novel speech enhancement method is proposed that combines CUP estimators with different NMF approaches and online basis updating. The statistical estimators are less effective under completely non-stationary conditions, whereas Non-negative Matrix Factorization (NMF) based algorithms perform better for non-stationary noise. The drawback of NMF is that it requires training and/or clean speech and noise signals. This drawback can be overcome by combining the advantages of statistical and NMF approaches.
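The NMF-based methods mentioned here factorize a non-negative magnitude (or power) spectrogram V into non-negative bases W and activations H. As a generic illustration only, the sketch below implements the classic Lee-Seung multiplicative updates for the Euclidean cost ||V - WH||². It is not the paper's CUP/NMF combination or its online basis update; all names are hypothetical, and pure Python is used to keep it dependency-free.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def sq_error(V, W, H):
    """Squared Frobenius reconstruction error ||V - W H||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ~= W H with W, H >= 0."""
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(rank)]
    errors = [sq_error(V, W, H)]
    for _ in range(iters):
        Wt = transpose(W)
        # H <- H * (W^T V) / (W^T W H), element-wise
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        # W <- W * (V H^T) / (W H H^T), element-wise
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        errors.append(sq_error(V, W, H))
    return W, H, errors


if __name__ == "__main__":
    V = [[1.0, 2.0, 0.0, 1.0, 3.0],
         [0.0, 1.0, 1.0, 2.0, 1.0],
         [2.0, 0.0, 1.0, 0.0, 1.0],
         [1.0, 1.0, 1.0, 1.0, 1.0]]
    W, H, errs = nmf(V, rank=2)
    print(f"initial error {errs[0]:.4f}, final error {errs[-1]:.4f}")
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, and the Euclidean error does not increase from one iteration to the next.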
Minimum Mean Square Error estimation of speech spectral amplitude using super-Gaussian priors
6th International Symposium on Telecommunications (IST), 2012
This paper deals with spectral estimation methods for speech enhancement. We first show that the super-Gaussian distribution properly matches the histogram of the speech spectral amplitude. For the selected speech material, the best match is achieved when the super-Gaussian parameters are set to ν = 0, μ = 2.5. We then derive the Minimum Mean Square Error (MMSE) estimator of the speech DFT amplitude when the clean speech spectral amplitudes are modeled by a super-Gaussian probability distribution and the noise DFT coefficients are modeled as Gaussian random variables. Evaluation results, in terms of different objective quality measures, show that the MMSE estimator based on the super-Gaussian distribution (with parameters ν = 0, μ = 2.5) leads to superior speech enhancement results.
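The abstract does not spell out the super-Gaussian density. One parametric form used in the super-Gaussian literature, assumed here rather than taken from the paper, is p(a) = μ^(ν+1)/Γ(ν+1) · a^ν/σ^(ν+1) · exp(-μa/σ) for a ≥ 0, with a scale parameter σ. The sketch evaluates this assumed form for ν = 0, μ = 2.5 (where it reduces to an exponential density) and numerically checks that it integrates to one and has mean (ν+1)σ/μ. Function names are hypothetical.

```python
import math


def super_gaussian_pdf(a, nu, mu, sigma=1.0):
    """Assumed super-Gaussian amplitude prior (hypothetical helper):
    p(a) = mu**(nu+1) / Gamma(nu+1) * a**nu / sigma**(nu+1) * exp(-mu * a / sigma), a >= 0."""
    if a < 0.0:
        return 0.0
    norm = mu ** (nu + 1) / (math.gamma(nu + 1) * sigma ** (nu + 1))
    return norm * a ** nu * math.exp(-mu * a / sigma)


def midpoint_integral(f, upper, n=100_000):
    """Midpoint-rule integral of f over [0, upper]."""
    h = upper / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    nu, mu = 0.0, 2.5
    area = midpoint_integral(lambda a: super_gaussian_pdf(a, nu, mu), upper=40.0)
    mean = midpoint_integral(lambda a: a * super_gaussian_pdf(a, nu, mu), upper=40.0)
    print(f"area={area:.6f}  mean={mean:.6f}  expected mean={(nu + 1) / mu:.6f}")
```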
On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement
IEEE/ACM Transactions on Audio, Speech, and Language Processing
For enhancing noisy signals, machine-learning based single-channel speech enhancement schemes exploit prior knowledge about typical speech spectral structures. To ensure good generalization and to meet requirements in terms of computational complexity and memory consumption, certain methods restrict themselves to learning speech spectral envelopes. We refer to these approaches as machine-learning spectral envelope (MLSE)-based approaches. In this paper we show by means of theoretical and experimental analyses that for MLSE-based approaches, super-Gaussian priors allow for a reduction of noise between speech spectral harmonics, which is not achievable using Gaussian estimators such as the Wiener filter. For the evaluation, we use a deep neural network (DNN)-based phoneme classifier and a low-rank nonnegative matrix factorization (NMF) framework as examples of MLSE-based approaches. A listening experiment and instrumental measures confirm that while super-Gaussian priors yield only moderate improvements for classic enhancement schemes, for MLSE-based approaches super-Gaussian priors clearly make an important difference and significantly outperform Gaussian priors.
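The limitation of Gaussian estimators can be made concrete. Under a Gaussian prior, the MMSE estimate of a DFT coefficient is the Wiener filter, whose gain ξ/(1+ξ) depends only on the a priori SNR ξ. If ξ comes from a spectral-envelope model, neighbouring bins at a harmonic peak and in a valley between harmonics receive the same gain, however different their noisy magnitudes are. The bin values below are made up for illustration and the function names are hypothetical.

```python
def wiener_gain(xi):
    """Wiener gain for a priori SNR xi (speech PSD / noise PSD)."""
    return xi / (1.0 + xi)


def wiener_magnitudes(noisy_mag, speech_psd, noise_psd):
    """Apply per-bin Wiener gains computed from the model PSDs only."""
    return [wiener_gain(s / n) * y for y, s, n in zip(noisy_mag, speech_psd, noise_psd)]


if __name__ == "__main__":
    # Hypothetical bins: peak, valley, peak, valley. The envelope model predicts
    # the same speech PSD in all four, so it cannot see the harmonic structure.
    noisy = [3.0, 0.5, 3.0, 0.5]
    envelope_psd = [4.0, 4.0, 4.0, 4.0]
    noise_psd = [1.0, 1.0, 1.0, 1.0]
    gains = [wiener_gain(s / n) for s, n in zip(envelope_psd, noise_psd)]
    print(gains)  # the same gain, 0.8, in every bin
    print(wiener_magnitudes(noisy, envelope_psd, noise_psd))
```

Super-Gaussian MMSE gains, by contrast, generally depend on the observed magnitude as well, so they can treat such peak and valley bins differently.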
In this paper we derive minimum mean-square error (MMSE) amplitude estimators for DFT-based noise suppression. The optimal estimators are derived under a generalized Gamma prior, which includes as special cases (for different parameter settings) all priors used in noise suppression schemes so far. Deriving the MMSE estimators involves integrating (weighted) Bessel functions. To obtain analytical solutions, the Bessel functions have to be approximated for some parameter settings. In this paper we combine two types of approximation, switching between them with a simple binary decision. Computer simulations show that the resulting estimators are very close to the exact MMSE estimators for all SNR conditions. The presented estimators improve performance compared to the suppression rule proposed by Ephraim and Malah, and their maximum performance equals that of state-of-the-art amplitude estimators.
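For one special case the integrals have an exact closed form. Assuming an amplitude prior p(A) ∝ A^(2ν-1) exp(-νA²/λ_S) (a generalized Gamma density with exponent 2 and E[A²] = λ_S) and complex Gaussian noise of variance λ_N, the MMSE amplitude gain can be written with Kummer's confluent hypergeometric function M(a, b; z) as G = Γ(ν+1/2)/Γ(ν) · sqrt(ξ/(γ(ν+ξ))) · M(ν+1/2, 1; v)/M(ν, 1; v), with v = ξγ/(ν+ξ), ξ = λ_S/λ_N and γ = |Y|²/λ_N. This is a sketch under those stated assumptions, not the approximated estimators of the paper, and the helper names are hypothetical. For ν = 1 the prior is Rayleigh and the gain should coincide with the Ephraim-Malah estimator, which the code includes as a check.

```python
import math


def hyp1f1(a, b, z, tol=1e-15, max_terms=5000):
    """Kummer's M(a, b; z) from its power series; adequate for moderate z >= 0."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def bessel_i(order, x, tol=1e-16, max_terms=500):
    """Modified Bessel function I_order(x), integer order >= 0, x >= 0 (power series)."""
    term = (x / 2.0) ** order / math.factorial(order)
    total = term
    for k in range(max_terms):
        term *= (x / 2.0) ** 2 / ((k + 1) * (k + 1 + order))
        total += term
        if term < tol * total:
            break
    return total


def gamma_prior_mmse_gain(xi, gamma_post, nu):
    """MMSE amplitude gain under p(A) proportional to A**(2*nu-1) * exp(-nu * A**2 / lambda_S)."""
    v = xi * gamma_post / (nu + xi)
    ratio = hyp1f1(nu + 0.5, 1.0, v) / hyp1f1(nu, 1.0, v)
    scale = math.exp(math.lgamma(nu + 0.5) - math.lgamma(nu))  # Gamma(nu+1/2) / Gamma(nu)
    return scale * math.sqrt(xi / (gamma_post * (nu + xi))) * ratio


def ephraim_malah_gain(xi, gamma_post):
    """Ephraim-Malah MMSE-STSA gain (Rayleigh amplitude prior), for comparison."""
    v = xi * gamma_post / (1.0 + xi)
    bracket = (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma_post) * math.exp(-v / 2.0) * bracket


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        print(f"nu={nu}: G={gamma_prior_mmse_gain(1.0, 4.0, nu):.4f}")
    print(f"Ephraim-Malah: G={ephraim_malah_gain(1.0, 4.0):.4f}")
```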
Speech enhancement with an adaptive Wiener filter
International Journal of Speech Technology, 2013
This paper proposes an adaptive Wiener filtering method for speech enhancement. The method adapts the filter transfer function from sample to sample based on the speech signal statistics: the local mean and the local variance. It is implemented in the time
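A classic way to adapt a Wiener filter from local statistics, assumed here because the abstract is cut off, is the estimate ŝ[n] = m[n] + max(σ_y²[n] - σ_v², 0)/σ_y²[n] · (y[n] - m[n]), where the local mean m and local variance σ_y² are computed over a sliding window and σ_v² is a known noise variance. The window length and the function name are hypothetical; this is not necessarily the paper's exact method.

```python
def adaptive_wiener(x, noise_var, win=5):
    """Local-statistics Wiener filter over a sliding window (hypothetical helper)."""
    half = win // 2
    n = len(x)
    out = []
    for i in range(n):
        seg = x[max(0, i - half):min(n, i + half + 1)]  # window is truncated at the edges
        mean = sum(seg) / len(seg)
        var = sum((v - mean) ** 2 for v in seg) / len(seg)
        sig_var = max(var - noise_var, 0.0)  # estimated clean-signal variance, floored at 0
        gain = sig_var / var if var > 0.0 else 0.0
        out.append(mean + gain * (x[i] - mean))
    return out


if __name__ == "__main__":
    noisy = [0.0, 1.0, 4.0, 2.0, -1.0, 3.0]
    print(adaptive_wiener(noisy, noise_var=1.0, win=3))
```

Where the local variance is close to the noise variance the output falls back to the local mean; where speech dominates, the gain approaches 1 and the sample passes nearly unchanged.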
A Brief Survey of Speech Enhancement
We present a brief overview of the speech enhancement problem for wide-band noise sources that are uncorrelated with the speech signal. Our main focus is on the spectral subtraction approach and some of its derivatives in the form of linear and non-linear minimum mean square error estimators. For the linear case, we review the signal subspace approach, and for the non-linear case, we review spectral magnitude and phase estimators. Online estimation of the second-order statistics of speech signals using parametric and non-parametric models is also addressed.
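Spectral subtraction in its simplest power-domain form subtracts a noise power estimate λ_N from the noisy power |Y|² in each bin and keeps the noisy phase. The spectral floor used to avoid negative powers is an assumed parameter, and this basic variant is only one of many; it is a sketch, not taken from the survey, and the function names are hypothetical.

```python
import math


def power_spectral_subtraction(noisy_mag, noise_psd, floor=0.01):
    """Per-bin magnitude estimate sqrt(max(|Y|^2 - lambda_N, floor * |Y|^2))."""
    out = []
    for y, lam in zip(noisy_mag, noise_psd):
        power = y * y
        clean_power = max(power - lam, floor * power)  # spectral floor avoids negative power
        out.append(math.sqrt(clean_power))
    return out


def subtraction_gain(noisy_mag, noise_psd, floor=0.01):
    """Equivalent gain |S_hat| / |Y| per bin (0 where |Y| is 0)."""
    est = power_spectral_subtraction(noisy_mag, noise_psd, floor)
    return [e / y if y > 0 else 0.0 for e, y in zip(est, noisy_mag)]


if __name__ == "__main__":
    print(power_spectral_subtraction([2.0, 1.0, 0.5], [1.0, 4.0, 0.1]))
```

The implied gain sqrt(1 - λ_N/|Y|²) depends on the observed magnitude, so even this simple rule is a non-linear function of |Y|.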
On the Importance of Super-Gaussian Speech Priors for Pre-Trained Speech Enhancement
arXiv (Cornell University), 2017
Speech Enhancement Under a Combined Stochastic-Deterministic Model
2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006
Most DFT domain based enhancement methods rely on stochastic models to derive clean speech estimators. In this paper we investigate the use of a deterministic speech model and present an MMSE estimator under a combined stochastic-deterministic speech model. Experimental results show an increase in segmental SNR of 1.18 dB, compared to the use of a stochastic model alone. Furthermore, PESQ evaluations lead to an increase of 0.3 on the MOS scale. Listening tests show a preference for the proposed MMSE estimator under combined stochastic-deterministic speech model.
Speech Communication, 2018
Traditional speech enhancement algorithms are based on amplitude-only processing, in which the speech amplitudes are processed and the phase is left unprocessed. Recently, Short-Time Fourier Transform (STFT) based single-channel speech enhancement algorithms have been developed that take into account prior knowledge of the phase and its uncertainty. The uncertain phase knowledge is obtained from phase reconstruction algorithms. The goal of this paper is twofold. First, a joint Minimum Mean Square Error (MMSE) estimator of the Complex speech coefficients given Uncertain Phase (CUP) is derived, assuming Nakagami or Gamma priors for the speech coefficients and a Generalized Gamma distribution (GGD) for the noise. Estimators of the Amplitudes given Uncertain Phase (AUP) type, which use the uncertain phase only for amplitude estimation and not for phase improvement, are also derived. In addition, novel phase-blind estimators are developed using Nakagami or Gamma speech priors and a Generalized Gamma noise prior. Finally, all estimators that use uncertain prior phase information are compared, and the effect of the initial phase information on the enhancement process is analyzed with the novel estimators. The proposed CUP estimator outperforms existing algorithms in terms of the objective measures Segmental Signal to Noise Ratio (SSNR), Phase Signal to Noise Ratio (PSNR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). Second, a combination of a statistical approach with Non-negative Matrix Factorization (NMF) based speech enhancement, in which the bases are updated online, is discussed. The proposed estimator gain is used with NMF, and its performance is analyzed using PESQ.