A Unified Framework for Designing Optimal STSA Estimators Assuming Maximum Likelihood Phase Equivalence of Speech and Noise

A Brief Survey of Speech Enhancement

We present a brief overview of the speech enhancement problem for wide-band noise sources that are uncorrelated with the speech signal. Our main focus is on the spectral subtraction approach and some of its variants in the form of linear and non-linear minimum mean square error (MMSE) estimators. For the linear case, we review the signal subspace approach, and for the non-linear case, we review spectral magnitude and phase estimators. Online estimation of the second-order statistics of speech signals using parametric and non-parametric models is also addressed.
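The spectral subtraction idea surveyed above can be sketched per frame as follows. This is a minimal illustration, assuming the noise power in each frequency bin has already been estimated; the over-subtraction factor `beta` and the spectral floor `floor` are hypothetical parameters, not values taken from the survey.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power, beta=1.0, floor=0.01):
    """Per-bin amplitude gains for power spectral subtraction on one frame.

    noisy_power: |Y_k|^2 per frequency bin; noise_power: estimated noise power per bin.
    beta (over-subtraction) and floor (spectral floor, as a fraction of |Y_k|^2)
    are hypothetical tuning parameters chosen for illustration.
    """
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        if py <= 0.0:
            gains.append(0.0)  # empty bin: nothing to keep
            continue
        clean_power = max(py - beta * pn, floor * py)  # subtract noise, never go below the floor
        gains.append(math.sqrt(clean_power / py))      # amplitude gain; the noisy phase is reused
    return gains

# Bin 0 has a local SNR of 3; bin 1 is pure noise and is clamped to the floor.
print(spectral_subtraction_gain([4.0, 1.0], [1.0, 1.0]))
```

The floor keeps the estimated clean power positive; without it, bins where the noise estimate exceeds the observed power would produce negative power.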

Generalized maximum a posteriori spectral amplitude estimation for speech enhancement

Speech Communication, 2015

Spectral restoration methods for speech enhancement aim to remove noise components from noisy speech signals by applying a gain function in the spectral domain. The design of this gain function is one of the most important factors in obtaining enhanced speech of good quality. In most studies, the gain function is designed by optimizing a criterion based on assumptions about the noise and speech distributions, such as the minimum mean square error (MMSE), maximum likelihood (ML), and maximum a posteriori (MAP) criteria. The MAP criterion has the advantage of yielding a more reliable gain function by incorporating a suitable prior density. However, several studies have shown that it has a drawback: although a MAP-based estimator effectively reduces noise components when the signal-to-noise ratio (SNR) is low, it introduces large speech distortion when the SNR is high. To solve this problem, we have proposed a generalized maximum a posteriori spectral amplitude (GMAPA) algorithm for designing a gain function for speech enhancement. The GMAPA algorithm dynamically specifies the weight of the prior density of speech spectra according to the SNR of the test speech signals in order to calculate the optimal gain function. When the SNR is high, GMAPA adopts a small weight to prevent overcompensation that may result in speech distortion. Conversely, when the SNR is low, GMAPA uses a large weight to prevent measurement noise from disturbing the restoration. Our previous study demonstrated that the weight of the prior density plays a crucial role in GMAPA performance; there, the weight was determined from the SNR at the utterance level. In this paper, we propose to compute the weight by taking time-frequency correlations into account, which results in a more accurate estimate of the gain function. The proposed algorithm was evaluated with both objective and subjective tests.
The objective test results indicate that GMAPA is promising compared with several well-known algorithms at both high and low SNRs. The subjective listening tests indicate that GMAPA provides significantly higher sound quality than the other speech enhancement algorithms.
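The role of the prior weight can be illustrated with a weighted MAP amplitude gain. This sketch is not the GMAPA paper's derivation: it assumes complex Gaussian noise, a Rayleigh prior on the speech amplitude, and joint maximization over amplitude and phase, and it maximizes log p(Y | A, phase) + alpha * log p(A). Under these assumptions alpha = 0 gives the ML amplitude (gain 1) and alpha = 1 gives the familiar joint MAP amplitude gain. How GMAPA maps the estimated SNR to the weight is not given in the abstract, so `alpha` is simply a caller-supplied parameter here.

```python
import math

def weighted_map_gain(xi, gamma, alpha):
    """Spectral amplitude gain maximising log p(Y | A, phi) + alpha * log p(A).

    Assumptions (illustrative, not taken from the GMAPA paper): complex Gaussian
    noise with variance lambda_d, Rayleigh prior on the speech amplitude A with
    E[A^2] = lambda_s, and joint maximisation over amplitude and phase.
    xi = lambda_s / lambda_d (a priori SNR), gamma = |Y|^2 / lambda_d
    (a posteriori SNR), alpha >= 0 is the weight on the log-prior.
    """
    root = math.sqrt(xi * xi + 2.0 * alpha * (xi + alpha) * xi / gamma)
    return (xi + root) / (2.0 * (xi + alpha))

# alpha = 0 ignores the prior (ML, gain 1); in this example larger alpha suppresses more.
for alpha in (0.0, 0.5, 1.0, 2.0):
    print(alpha, weighted_map_gain(xi=1.0, gamma=2.0, alpha=alpha))
```

This mirrors the behavior described above: a small weight stays close to the observation (little distortion), while a large weight leans on the prior and suppresses more.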

Log-spectral amplitude estimation with Generalized Gamma distributions for speech enhancement

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011

This paper presents a family of log-spectral amplitude (LSA) estimators for speech enhancement. Generalized Gamma distributed (GGD) priors are assumed for speech short-time spectral amplitudes (STSAs), providing the mathematical flexibility to capture the statistical behavior of speech. Although the solutions cannot be obtained in closed form, the estimators are expressed as limits and can be efficiently approximated. When applied to the Noizeus database [1], the proposed estimators provide improvements in segmental signal-to-noise ratio (SSNR) and COSH distance [2] relative to the LSA estimator proposed by Ephraim and Malah.
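For reference, the Ephraim-Malah LSA baseline that the GGD estimators are compared against has the gain xi/(1+xi) * exp(E1(v)/2) with v = xi*gamma/(1+xi), where xi is the a priori SNR, gamma the a posteriori SNR, and E1 the exponential integral. The sketch below evaluates that baseline only (the GGD-based estimators are not reproduced); since the Python standard library has no E1, a small implementation using the power series for x <= 1 and a continued fraction otherwise is included.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp1(x):
    """Exponential integral E1(x) = integral_x^inf e^-t / t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma_E - ln x - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 200):
            term *= -x / k                 # term == (-x)^k / k!
            delta = -term / k
            total += delta
            if abs(delta) < 1e-16 * abs(total):
                break
        return total
    # Continued fraction e^-x / (x + 1 - 1/(x + 3 - 4/(x + 5 - ...))), modified Lentz method
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        an = -float(i * i)
        b += 2.0
        d = 1.0 / (an * d + b)
        c = b + an / c
        step = c * d
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h * math.exp(-x)

def lsa_gain(xi, gamma):
    """Ephraim-Malah LSA gain: xi/(1+xi) * exp(0.5 * E1(v)) with v = xi*gamma/(1+xi)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))

print(lsa_gain(1.0, 2.0), lsa_gain(1.0, 100.0))
```

For large v the exponential term tends to 1 and the gain approaches the Wiener gain xi/(1+xi).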

Overcoming the Statistical Independence Assumption w.r.t. Frequency in Speech Enhancement

2005

In this paper, we show how to overcome the assumption of statistical independence of adjacent frequency bins in noise reduction techniques. We show that, under relaxed assumptions, the problem leads to an a priori SNR estimation problem in which all available noisy speech spectral amplitudes (observations) are exploited. Any state-of-the-art noise power spectral density (PSD) estimator and weighting rule can be used; neither needs to be restated. To obtain an estimator well suited to real-time applications, we model the a priori SNR values as Markov processes with respect to frequency. Building on the formulation by Ephraim and Malah, this leads to a new a priori SNR estimator that yields fewer musical tones.

Speech enhancement using fourth-order cumulants and time-domain optimal filters

1999

A new method for speech enhancement using time-domain optimum filters and fourth-order cumulants (FOC) is proposed, based on newly established properties of the FOC of speech signals. In the exploratory part of the paper, the analytical expression of the FOC of subbanded speech is derived assuming a sinusoidal model and up to two harmonics per band. Important properties of this cumulant are revealed, and actual speech data are used to verify the derivations and the underlying model. In the application part of the work, speech enhancement is formulated as an estimation problem, and the expression for the time-domain causal optimum filters is derived for a p-th order system. The key idea is to use the FOC of the noisy speech to estimate the parameters required for the enhancement filters, namely the second-order statistics of the speech and noise. It is shown that the kurtosis and the diagonal slice of the FOC may be used to estimate parameters such as the SNR, the speech autocorrelation, and the probability of speech presence in a given band. Subjective listening and examination of the spectrograms show that the resulting algorithm is effective on typical noises encountered in mobile telephony. Compared with the TIA-IS127 standard for noise reduction, it results in more overall noise reduction and better speech preservation in Gaussian, street, and fan noise. Its effectiveness diminishes, however, for harmonic and impulsive noise types such as office and car engine noise, where discrimination between speech and noise based on the FOC becomes more difficult.
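Why the FOC helps can be seen from two standard facts: cumulants of independent signals add, and the fourth-order cumulant of Gaussian noise is zero, so in Gaussian noise the FOC of the noisy signal reflects only the speech component. The sketch below computes sample estimates of the zero-lag cumulant, the normalized kurtosis, and the diagonal slice c4(tau, tau, tau). It illustrates these quantities only; the paper's SNR, autocorrelation, and speech-presence estimators built on them are not reproduced.

```python
import math

def _centered(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def fourth_order_cumulant(x):
    """Zero-lag fourth-order cumulant c4 = E[x^4] - 3 (E[x^2])^2 after removing the mean."""
    xc = _centered(x)
    n = len(xc)
    m2 = sum(v * v for v in xc) / n
    m4 = sum(v ** 4 for v in xc) / n
    return m4 - 3.0 * m2 * m2

def normalized_kurtosis(x):
    """c4 / (E[x^2])^2: 0 for a Gaussian signal, -1.5 for a single sinusoid."""
    xc = _centered(x)
    m2 = sum(v * v for v in xc) / len(xc)
    return fourth_order_cumulant(x) / (m2 * m2)

def diagonal_slice(x, tau):
    """Biased estimate of c4(tau, tau, tau) = E[x(n) x(n+tau)^3] - 3 r(tau) r(0), tau >= 0."""
    xc = _centered(x)
    n = len(xc)
    r0 = sum(v * v for v in xc) / n
    r_tau = sum(xc[i] * xc[i + tau] for i in range(n - tau)) / n
    m4 = sum(xc[i] * xc[i + tau] ** 3 for i in range(n - tau)) / n
    return m4 - 3.0 * r_tau * r0

# A sinusoid sampled over whole periods has normalized kurtosis -1.5.
tone = [math.cos(2.0 * math.pi * 4 * n / 64) for n in range(64)]
print(normalized_kurtosis(tone), diagonal_slice(tone, 2))
```

The sinusoid's strongly negative kurtosis is consistent with the sinusoidal model assumed in the abstract, which is what lets speech-dominated bands be distinguished from Gaussian-like noise.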

Reducing over- and under-estimation of the a priori SNR in speech enhancement techniques

Digital Signal Processing, 2014

Most speech enhancement methods based on short-time spectral modification are expressed as a spectral gain that depends on an estimate of the local signal-to-noise ratio (SNR) in each frequency bin. Several studies have analyzed the performance of a priori SNR estimation algorithms in order to improve speech quality and reduce speech distortion. In this paper, we concentrate on the analysis of over- and under-estimation of the a priori SNR in speech enhancement and noise reduction systems. We first show that conventional approaches such as the decision-directed approach proposed by Ephraïm and Malah lead to a biased estimator of the a priori SNR. To reduce this bias, our strategy introduces a correction term into the a priori SNR estimate that depends on the current values of both the available a posteriori SNR and the estimated a priori SNR. The proposed solution yields a bias-compensated a priori SNR estimate, which allows the output speech signal to be estimated very close to the original one in each frequency bin. This refinement of the a priori SNR estimate can be inserted into any type of spectral gain function to improve output speech quality. Objective tests under various environments, in terms of the Normalized Covariance Metric (NCM), the Coherence Speech Intelligibility Index (CSII), the segmental SNR, and the Perceptual Evaluation of Speech Quality (PESQ) measure, show the superiority of the proposed method over competing algorithms.
Authors: M. Djendi, P. Scalart (pascal.scalart@enssat.fr).
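The conventional decision-directed rule that the paper starts from can be sketched for a single frequency bin as below. The bias-correction term proposed in the paper is not reproduced here. The Wiener gain used to form the previous clean-amplitude estimate is an illustrative choice (any spectral gain could be substituted), and the defaults for `alpha` and the -25 dB floor `xi_min` are hypothetical.

```python
def decision_directed_xi(noisy_power, noise_power, alpha=0.98, xi_min=10 ** (-2.5)):
    """Decision-directed a priori SNR estimates for one frequency bin over successive frames.

    xi(l) = alpha * A_hat(l-1)^2 / lambda_d(l-1) + (1 - alpha) * max(gamma(l) - 1, 0),
    where gamma(l) = |Y(l)|^2 / lambda_d(l). A_hat is formed here with a Wiener gain
    xi / (1 + xi), which is an illustrative choice; alpha and xi_min (a -25 dB floor)
    are hypothetical defaults.
    """
    estimates = []
    prev_clean_power = 0.0   # A_hat(l-1)^2; zero before the first frame
    prev_noise = 1.0         # lambda_d(l-1); irrelevant while prev_clean_power is 0
    for py, pn in zip(noisy_power, noise_power):
        gamma = py / pn                                   # a posteriori SNR
        xi = alpha * prev_clean_power / prev_noise + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)                              # lower bound limits musical noise
        gain = xi / (1.0 + xi)
        prev_clean_power = (gain ** 2) * py               # |A_hat(l)|^2 for the next frame
        prev_noise = pn
        estimates.append(xi)
    return estimates

print(decision_directed_xi([101.0] * 5, [1.0] * 5))
```

Because the first term reuses the previous frame's estimate, the rule lags behind sudden SNR changes, which is one source of the over- and under-estimation analyzed in the abstract.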

Speech enhancement based on Rayleigh mixture modeling of speech spectral amplitude distributions

DFT-based speech enhancement algorithms typically rely on a statistical model of the spectral amplitudes of the noise-free speech signal. Recent work has shown that speech spectral amplitude distributions, conditional on the estimated a priori SNR, may differ significantly from the traditional Gaussian model and are better described by super-Gaussian probability density functions. We show that these conditional distributions can be accurately approximated by a mixture of Rayleigh distributions. MMSE amplitude estimators based on Rayleigh mixture models perform at least as well as estimators based on super-Gaussian models. Furthermore, the proposed Rayleigh mixture models allow the derivation of closed-form estimators that minimize other perceptually relevant distortion measures, which may be difficult to obtain with other models.
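Why a Rayleigh mixture keeps the MMSE estimator tractable can be sketched as follows. A Rayleigh amplitude prior corresponds to a zero-mean complex Gaussian speech coefficient with uniform phase, so conditioned on a mixture component the MMSE amplitude estimate is the Ephraim-Malah MMSE-STSA rule, and the mixture estimate is the posterior-weighted average of those per-component estimates. This is a sketch under those assumptions: the paper fits its mixtures conditional on the estimated a priori SNR, which is not modeled here, and the weights and variances in the example are hypothetical. Exponentially scaled Bessel functions are computed from their power series because the standard library lacks them.

```python
import math

def scaled_bessel_i(n, x):
    """exp(-x) * I_n(x) for integer n >= 0 and x >= 0, from the power series summed in log space."""
    if x == 0.0:
        return 1.0 if n == 0 else 0.0
    log_half = math.log(0.5 * x)
    total = 0.0
    k = 0
    while True:
        log_term = (2 * k + n) * log_half - math.lgamma(k + 1) - math.lgamma(k + n + 1) - x
        term = math.exp(log_term)
        total += term
        # terms grow until k ~ x/2 and then decay; stop once they no longer matter
        if k > 0.5 * x and term <= 1e-17 * total:
            return total
        k += 1

def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE-STSA gain for a priori SNR xi > 0 and a posteriori SNR gamma > 0."""
    v = xi * gamma / (1.0 + xi)
    half = 0.5 * v
    # the factor exp(-v/2) is absorbed into the scaled Bessel functions
    bracket = (1.0 + v) * scaled_bessel_i(0, half) + v * scaled_bessel_i(1, half)
    return math.sqrt(math.pi) / 2.0 * math.sqrt(v) / gamma * bracket

def rayleigh_mixture_mmse(r, noise_var, weights, speech_vars):
    """E[A | |Y| = r] (r > 0) when A has a Rayleigh-mixture prior with E[A^2] = speech_vars[i]."""
    gamma = r * r / noise_var
    # log p(i | Y) up to a constant: complex Gaussian evidence with variance s + noise_var
    logs = [math.log(w) - math.log(s + noise_var) - r * r / (s + noise_var)
            for w, s in zip(weights, speech_vars)]
    peak = max(logs)
    post = [math.exp(l - peak) for l in logs]
    norm = sum(post)
    return sum((p / norm) * mmse_stsa_gain(s / noise_var, gamma) * r
               for p, s in zip(post, speech_vars))

print(rayleigh_mixture_mmse(2.0, 1.0, weights=[0.6, 0.4], speech_vars=[0.5, 8.0]))
```

Each component contributes a closed-form estimate, so the mixture estimator stays closed-form as well, in line with the abstract's point about tractability.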

Single Channel Speech Enhancement Using Spectral Subtraction Based on Minimum Statistics

2011

Speech is a fundamental means of human interaction. The quality and intelligibility of speech signals during communication are generally degraded by surrounding noise. Corrupted speech signals therefore need to be enhanced to improve quality and intelligibility. In the field of speech processing, much effort has been devoted to developing speech enhancement techniques that restore the speech signal by reducing the amount of disturbing noise. This thesis focuses on a single-channel speech enhancement technique that performs noise reduction by spectral subtraction based on minimum statistics. Minimum statistics means that the power spectrum of the non-stationary noise signal is estimated by finding the minimum values of a smoothed power spectrum of the noisy speech signal, which circumvents the speech activity detection problem. The performance of the spectral subtraction method is evaluated using single-channel speech data and for a wide range of noise types with various...
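The minimum-statistics idea described above can be sketched for one frequency bin as a recursively smoothed periodogram followed by a sliding-window minimum. This is a deliberately simplified version: the smoothing constant `alpha`, window length `window`, and bias factor `bias` are hypothetical fixed values, whereas the full minimum-statistics method adapts the smoothing and derives the bias compensation analytically. The output would then feed a spectral subtraction rule.

```python
from collections import deque

def minimum_statistics_noise(power_frames, alpha=0.85, window=8, bias=1.0):
    """Track the noise power of one frequency bin as the minimum of a smoothed periodogram.

    power_frames: |Y(l)|^2 for successive frames l of a single bin.
    alpha: first-order recursive smoothing constant; window: number of recent frames
    searched for the minimum; bias: multiplicative correction for the fact that a
    minimum underestimates the mean. All three are hypothetical fixed values.
    """
    history = deque(maxlen=window)   # most recent smoothed values only
    smoothed = None
    estimates = []
    for p in power_frames:
        smoothed = p if smoothed is None else alpha * smoothed + (1.0 - alpha) * p
        history.append(smoothed)
        estimates.append(bias * min(history))
    return estimates

# A speech burst shorter than the window does not leak into the noise estimate.
frames = [1.0] * 20 + [100.0] * 5 + [1.0] * 20
print(minimum_statistics_noise(frames)[18:26])
```

No voice activity detector is needed: as long as speech pauses occur within each window, the minimum falls back to the noise floor.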

On DCT-based MMSE estimation of short time spectral amplitude for single-channel speech enhancement

Elsevier, 2023

This paper proposes Discrete Cosine Transform (DCT) based speech enhancement algorithms. These algorithms use minimum mean square error (MMSE) estimators of the clean short-time spectral amplitude, with Gaussian, Laplace, and Gamma probability density functions (PDFs) as speech priors. The noise process is assumed to be additive and Gaussian. The proposed estimators are closed-form solutions, whereas conventional Discrete Fourier Transform (DFT) based estimators derived under super-Gaussian speech priors have no closed-form solutions. We also examine the estimators with Speech Presence Uncertainty (SPU), which models the presence or absence of speech probabilistically. Compared with alternative approaches, such as the Ephraim and Malah or the Erkelens et al. MMSE-STSA estimators, the proposed methods demonstrate superior performance in terms of segmental SNR (SegSNR), Perceptual Evaluation of Speech Quality (PESQ), the short-time objective intelligibility measure (STOI), and mean subjective preference score, while exhibiting equal or lower complexity.
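Why the DCT domain admits closed forms can be seen in the Gaussian-prior case. DCT coefficients are real, so with a zero-mean Gaussian prior and additive Gaussian noise the posterior of the clean coefficient is Gaussian, and the conditional mean of its magnitude is that of a folded normal. The sketch below is derived under exactly those assumptions and may differ in detail from the paper's estimator; the Laplace and Gamma prior cases and SPU are not covered.

```python
import math

def dct_gaussian_mmse_magnitude(y, xi, noise_var):
    """E[|S| | Y = y] for a real DCT coefficient Y = S + N (xi > 0, noise_var > 0).

    Assumes S ~ N(0, xi * noise_var) and N ~ N(0, noise_var), independent. Then
    S | y ~ N(G y, G noise_var) with G = xi / (1 + xi), and |S| | y is a folded
    normal whose mean is available in closed form.
    """
    g = xi / (1.0 + xi)
    mu = g * y                          # posterior mean (Wiener estimate)
    sigma = math.sqrt(g * noise_var)    # posterior standard deviation
    return (sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu * mu / (2.0 * sigma * sigma))
            + abs(mu) * math.erf(abs(mu) / (sigma * math.sqrt(2.0))))

for y in (0.0, 1.0, 5.0):
    print(y, dct_gaussian_mmse_magnitude(y, xi=1.0, noise_var=1.0))
```

At high observed amplitudes the estimate approaches the Wiener magnitude G|y|; near zero it stays positive because the posterior still has spread.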

Minimum Mean-Square Error Amplitude Estimators for Speech Enhancement Under the Generalized Gamma Distribution

In this paper, we derive minimum mean-square error (MMSE) amplitude estimators for DFT-based noise suppression. The optimal estimators are found under a generalized Gamma distribution, which includes as special cases (for different parameter settings) all priors used in noise suppression schemes so far. Deriving the MMSE estimators involves integrating (weighted) Bessel functions. To obtain analytical solutions, the Bessel functions have to be approximated for some parameter settings. In this paper, we combine two types of approximation by using a simple binary decision between them. Computer simulations show that the resulting estimators are very close to the exact MMSE estimators for all SNR conditions. The presented estimators improve performance compared with the suppression rule proposed by Ephraim and Malah. Furthermore, their maximum performance is the same as that of state-of-the-art amplitude estimators.
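The special-case structure of the generalized Gamma prior can be made concrete with one common parametrization, assumed here because the abstract does not state it: p(a) = gamma * beta^nu / Gamma(nu) * a^(gamma*nu - 1) * exp(-beta * a^gamma) for a > 0. With gamma = 2 and nu = 1 it reduces to the Rayleigh prior of Gaussian DFT coefficients, and gamma = 1 gives Gamma-distributed amplitudes. The sketch evaluates the density and picks beta to match a given speech power E[A^2]; the paper's Bessel-function approximations are not reproduced.

```python
import math

def ggd_pdf(a, gamma_shape, nu, beta):
    """Generalized Gamma density gamma*beta^nu/Gamma(nu) * a^(gamma*nu-1) * exp(-beta*a^gamma)."""
    if a <= 0.0:
        return 0.0
    log_p = (math.log(gamma_shape) + nu * math.log(beta) - math.lgamma(nu)
             + (gamma_shape * nu - 1.0) * math.log(a) - beta * a ** gamma_shape)
    return math.exp(log_p)

def ggd_beta_for_power(gamma_shape, nu, speech_var):
    """beta such that E[A^2] = speech_var, using E[A^2] = Gamma(nu + 2/gamma) / (Gamma(nu) beta^(2/gamma))."""
    ratio = math.exp(math.lgamma(nu + 2.0 / gamma_shape) - math.lgamma(nu))
    return (ratio / speech_var) ** (gamma_shape / 2.0)

# Rayleigh special case: gamma = 2, nu = 1, beta = 1 / lambda_s.
lam = 2.0
beta = ggd_beta_for_power(2.0, 1.0, lam)
print(beta, ggd_pdf(1.3, 2.0, 1.0, beta), 2 * 1.3 / lam * math.exp(-1.3 ** 2 / lam))
```

Matching E[A^2] to the estimated speech power lets the shape parameters (gamma, nu) be varied while the prior stays consistent with the a priori SNR.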

A Unified Framework for Designing Optimal STSA Estimators Assuming Maximum Likelihood Phase Equivalence of Speech and Noise (original) (raw)

Related papers