Joint maximum likelihood estimation of late reverberant and speech power spectral density in noisy environments
Related papers
2016
Various dereverberation and noise reduction algorithms require power spectral density estimates of the anechoic speech, reverberation, and noise. In this work, we derive a novel multichannel estimator for the power spectral densities (PSDs) of the reverberation and the speech that is also suitable for noisy environments. The speech and reverberation PSDs are estimated from all the entries of the PSD matrix of the received signals. The Frobenius norm of a general error matrix is minimized to find the best-fitting PSDs. Experimental results show that the proposed estimator provides accurate PSD estimates and outperforms competing estimators. Moreover, when used in a multi-microphone noise reduction and dereverberation algorithm, the estimated reverberation and speech PSDs are shown to yield improved performance measures compared with the competing estimators.
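To make the Frobenius-norm fit concrete, here is a minimal Python/NumPy sketch for a single time-frequency bin. It assumes a common signal model that the abstract does not spell out: a rank-one speech term phi_s d d^H with a known steering vector d, a reverberation term phi_r Gamma with a known (here spherically diffuse) coherence matrix Gamma, and a known noise PSD matrix Phi_v. Under that model the fit is an ordinary linear least-squares problem in two real unknowns. The function names and the array geometry are hypothetical, and the paper's actual error matrix and constraints may differ.

```python
import numpy as np


def diffuse_coherence(mic_pos, freq, c=343.0):
    """Spherically diffuse coherence matrix for a linear array (positions in metres)."""
    dist = np.abs(mic_pos[:, None] - mic_pos[None, :])
    return np.sinc(2.0 * freq * dist / c)  # np.sinc(x) = sin(pi x) / (pi x)


def frobenius_psd_fit(phi_y, phi_v, d, gamma):
    """Least-squares speech and reverberation PSDs at one time-frequency bin (sketch).

    Assumed model: phi_y = phi_s * d d^H + phi_r * gamma + phi_v, with phi_v, d, gamma known.
    Minimizes ||phi_y - phi_v - phi_s d d^H - phi_r gamma||_F^2 over real phi_s, phi_r.
    """
    e = phi_y - phi_v                          # noise-compensated PSD matrix
    basis = [np.outer(d, d.conj()), gamma]     # spatial "shapes" of speech and reverberation
    # Normal equations under the Frobenius inner product <X, Y> = Re tr(X^H Y);
    # np.vdot conjugates and flattens its first argument, giving tr(X^H Y).
    a = np.array([[np.vdot(bi, bj).real for bj in basis] for bi in basis])
    b = np.array([np.vdot(bi, e).real for bi in basis])
    phi_s, phi_r = np.linalg.solve(a, b)
    return phi_s, phi_r


# Synthetic check: 4 mics, 5 cm spacing, source at 60 degrees, 1 kHz bin (all hypothetical).
mic_pos = np.arange(4) * 0.05
freq = 1000.0
d = np.exp(-2j * np.pi * freq * mic_pos * np.cos(np.deg2rad(60.0)) / 343.0)
gamma = diffuse_coherence(mic_pos, freq)
phi_v = 0.1 * np.eye(4)
phi_y = 2.0 * np.outer(d, d.conj()) + 0.5 * gamma + phi_v
print(frobenius_psd_fit(phi_y, phi_v, d, gamma))  # approximately (2.0, 0.5)
```

With an estimated rather than exact PSD matrix, the unconstrained least-squares solution can become negative, so in practice the estimates may need to be clipped at zero.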
Speech dereverberation using backward estimation of the late reverberant spectral variance
2008
In speech communication systems, the received microphone signals are degraded by room reverberation and ambient noise. This degradation can decrease the fidelity and intelligibility of the desired speech. Reverberant speech can be separated into two components: an early speech component and a late reverberant speech component. Practically feasible reverberation suppression algorithms have been developed to suppress the late reverberant speech or, in other words, to estimate the early speech component. The main challenge is to develop an estimator for the so-called late reverberant spectral variance (LRSV). In this contribution, a generalized statistical reverberation model is proposed that can be used to estimate the LRSV. Both novel and existing estimators can be derived from this model. One novel estimator is a so-called backward estimator, which uses an estimate of the early speech component to obtain an estimate of the LRSV. The advantages and possible disadvantages of the estimators are discussed, and experimental results using simulated reverberant speech are presented.
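Once an LRSV estimate is available, the simplest way to obtain the early speech component is to apply a spectral gain. The sketch below uses a power-spectral-subtraction (Wiener-type) gain with a lower bound. It is a generic illustration of LRSV-based suppression, not the backward estimator proposed in the paper, and `late_reverb_suppression_gain` and its `gain_floor` parameter are hypothetical names.

```python
import numpy as np


def late_reverb_suppression_gain(psd_y, lrsv, gain_floor=0.1):
    """Spectral-subtraction-style gain that removes an estimated LRSV (sketch).

    psd_y: spectral variance of the reverberant (observed) STFT coefficients.
    lrsv:  late reverberant spectral variance estimate, same shape.
    Returns a real gain in [gain_floor, 1]; multiplying the STFT by it gives an
    estimate of the early speech component.
    """
    eps = np.finfo(float).eps
    gain = 1.0 - lrsv / np.maximum(psd_y, eps)   # (psd_y - lrsv) / psd_y
    return np.clip(gain, gain_floor, 1.0)        # floor limits musical noise and speech distortion


print(late_reverb_suppression_gain(np.array([1.0, 2.0, 0.5]), np.array([0.5, 0.5, 0.5])))
# -> [0.5, 0.75, 0.1]
```

The floor matters in bins where the LRSV estimate approaches or exceeds the observed variance: without it the gain would drop to zero there.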
Dual-Microphone Speech Dereverberation in a Noisy Environment
2006 IEEE International Symposium on Signal Processing and Information Technology, 2006
Speech signals recorded with a distant microphone usually contain reverberation and noise, which degrade the fidelity and intelligibility of speech as well as the recognition performance of automatic speech recognition systems. In [1], Habets presented a multi-microphone speech dereverberation algorithm to suppress late reverberation in a noise-free environment. In this paper we show how an estimate of the late reverberant energy can be obtained from noisy observations. A more sophisticated speech enhancement technique, based on the Optimally-Modified Log Spectral Amplitude (OM-LSA) estimator, is used to suppress the undesired late reverberant signal and noise. The speech presence probability used in the OM-LSA estimator is extended to improve the decision between speech, late reverberation, and noise. Experiments using simulated and real acoustic impulse responses are presented and show significant reverberation reduction with little speech distortion.
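The OM-LSA gain can be sketched as follows. The sketch assumes that late reverberation and noise are combined into a single interference variance (our assumption about how the two are merged) and that the a priori signal-to-interference ratio and the speech presence probability are supplied by other stages. The LSA part uses the exponential integral E1 from `scipy.special.exp1`; the function and parameter names are hypothetical.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1(x)


def om_lsa_gain(y_power, lrsv, noise_psd, xi, p_speech, g_min=0.1):
    """OM-LSA gain treating late reverberation plus noise as the interference (sketch).

    y_power:   |Y(l, k)|^2 of the observed STFT coefficients
    lrsv:      late reverberant spectral variance estimate
    noise_psd: noise spectral variance estimate
    xi:        a priori signal-to-interference ratio (e.g. from a decision-directed rule)
    p_speech:  speech presence probability in [0, 1]
    """
    interference = lrsv + noise_psd
    gamma = y_power / interference                      # a posteriori SIR
    v = gamma * xi / (1.0 + xi)
    g_lsa = xi / (1.0 + xi) * np.exp(0.5 * exp1(v))     # LSA gain under speech presence
    # Geometric weighting between the LSA gain and the lower bound g_min.
    return g_lsa ** p_speech * g_min ** (1.0 - p_speech)


print(om_lsa_gain(4.0, 0.5, 0.5, 1.0, 0.5))  # lies between g_min and 1
```

When the speech presence probability is zero, the gain collapses to `g_min`, so late reverberation and noise are attenuated by a fixed amount instead of being removed completely.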
Multi-channel PSD estimators for speech dereverberation - A theoretical and experimental comparison
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
In this paper we perform an extensive theoretical and experimental comparison of two recently proposed multi-channel speech dereverberation algorithms. Both are based on the multi-channel Wiener filter, but they use different estimators of the speech and reverberation power spectral densities (PSDs). We first derive closed-form expressions for the mean square error (MSE) of both PSD estimators and then show that one estimator, previously used for speech dereverberation by the authors, always yields a lower MSE. Only for a two-microphone array or for special spatial distributions of the interference do both estimators yield the same MSE. The theoretically derived MSE values are in good agreement with numerical simulation results and with instrumental speech quality measures in a realistic speech dereverberation task for binaural hearing aids.
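Both algorithms compared here build on the multi-channel Wiener filter, which is often written as an MVDR beamformer followed by a single-channel Wiener postfilter; this is where the speech and reverberation PSD estimates enter. The sketch assumes a speech steering vector (relative transfer function) `d` whose reference entry is 1, a known interference PSD matrix `phi_i` (for example reverberation plus noise), and a speech PSD `phi_s`. All names are hypothetical.

```python
import numpy as np


def mvdr_weights(d, phi_i):
    """MVDR beamformer w = phi_i^{-1} d / (d^H phi_i^{-1} d)."""
    x = np.linalg.solve(phi_i, d)
    return x / np.vdot(d, x)              # np.vdot(d, x) = d^H phi_i^{-1} d


def mwf_weights(d, phi_i, phi_s):
    """Multichannel Wiener filter as an MVDR beamformer followed by a Wiener postfilter."""
    x = np.linalg.solve(phi_i, d)
    phi_res = 1.0 / np.vdot(d, x).real    # residual interference PSD at the MVDR output
    postfilter = phi_s / (phi_s + phi_res)
    return postfilter * mvdr_weights(d, phi_i)


rng = np.random.default_rng(0)
m = 4
a = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
phi_i = a @ a.conj().T + m * np.eye(m)    # toy Hermitian positive-definite interference matrix
d = np.concatenate(([1.0], rng.standard_normal(m - 1) + 1j * rng.standard_normal(m - 1)))
print(mwf_weights(d, phi_i, 2.0))
```

By the matrix inversion lemma, the product equals phi_s (phi_s d d^H + phi_i)^{-1} d. Written this way, the speech PSD estimate affects only the postfilter, whereas the reverberation PSD, through `phi_i`, affects both stages.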
Cramér–Rao Bound Analysis of Reverberation Level Estimators for Dereverberation and Noise Reduction
IEEE/ACM Transactions on Audio, Speech, and Language Processing
The reverberation power spectral density (PSD) is often required by dereverberation and noise reduction algorithms. In this work, we compare two maximum likelihood (ML) estimators of the reverberation PSD in a noisy environment. In the first estimator, the direct path is first blocked; the ML criterion for estimating the reverberation PSD is then stated according to the probability density function (p.d.f.) of the blocking matrix (BM) outputs. In the second estimator, the speech component is not blocked. Since the anechoic speech PSD is usually unknown in advance, it is estimated as well. To compare the expected mean square error (MSE) of the two ML estimators of the reverberation PSD, the Cramér-Rao bounds (CRBs) for both estimators are derived. We show that the CRB for the joint reverberation and speech PSD estimator is lower than the CRB for estimating the reverberation PSD from the BM outputs. Experimental results show that the MSEs of the two estimators indeed follow the CRB curves. Experiments with a multi-microphone dereverberation and noise reduction algorithm show the benefits of using the ML estimators in comparison with other baseline estimators.
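For the first (blocking-based) estimator, the noise-free special case admits a closed form, which makes the CRB comparison easy to check numerically. The sketch assumes B^H y ~ CN(0, phi_r B^H Gamma B) with a known coherence matrix Gamma. This drops the noise term that the paper handles. Under this assumption the ML estimate is tr((B^H Gamma B)^{-1} R_z)/(M-1), and the CRB for N frames is phi_r^2/(N(M-1)). The joint speech-and-reverberation estimator is not shown; the array geometry and function names are hypothetical.

```python
import numpy as np


def blocking_matrix(d):
    """M x (M-1) matrix with orthonormal columns satisfying B^H d = 0."""
    _, _, vh = np.linalg.svd(d.conj()[None, :])   # conjugated rows 1..M-1 of vh span null(d^H)
    return vh[1:].conj().T


def bm_ml_reverb_psd(y, d, gamma):
    """ML reverberation PSD from the BM outputs (noise-free special case, sketch).

    y: M x N STFT snapshots at one frequency bin; assumed model
    B^H y ~ CN(0, phi_r * B^H gamma B).
    """
    b = blocking_matrix(d)
    z = b.conj().T @ y
    r_z = z @ z.conj().T / y.shape[1]               # sample covariance of the BM outputs
    c = b.conj().T @ gamma @ b
    return np.trace(np.linalg.solve(c, r_z)).real / b.shape[1]


def crb_bm(phi_r, n_frames, n_mics):
    """CRB for phi_r under the same model: phi_r^2 / (N (M - 1))."""
    return phi_r ** 2 / (n_frames * (n_mics - 1))


def simulate_snapshots(d, gamma, phi_r, n_frames, rng):
    """Synthetic direct-path speech plus spatially coherent reverberation, no noise."""
    m = d.size
    chol = np.linalg.cholesky(gamma)
    w = (rng.standard_normal((m, n_frames)) + 1j * rng.standard_normal((m, n_frames))) / np.sqrt(2)
    s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
    return np.outer(d, s) + np.sqrt(phi_r) * (chol @ w)


rng = np.random.default_rng(0)
mic_pos = np.arange(4) * 0.1                          # 4 mics, 10 cm spacing (hypothetical)
freq = 2000.0
d = np.exp(-2j * np.pi * freq * mic_pos * np.cos(np.deg2rad(45.0)) / 343.0)
gamma = np.sinc(2.0 * freq * np.abs(mic_pos[:, None] - mic_pos[None, :]) / 343.0)
est = np.array([bm_ml_reverb_psd(simulate_snapshots(d, gamma, 1.0, 50, rng), d, gamma)
                for _ in range(2000)])
print(est.var(), crb_bm(1.0, 50, 4))                  # empirical variance vs. CRB (1/150)
```

In this noise-free case the blocked estimator is unbiased and its variance equals the CRB. With noise present, as in the paper, no such simple closed form is generally available.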
Multi-Microphone Speech Dereverberation and Noise Reduction Using Relative Early Transfer Functions
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015
In speech communication systems, the microphone signals are degraded by reverberation and ambient noise. The reverberant speech can be separated into two components, namely, an early speech component that includes the direct path and some early reflections, and a late reverberant component that includes all the late reflections. In this paper, a novel algorithm to simultaneously suppress early reflections, late reverberation, and ambient noise is presented. A multi-microphone minimum mean square error estimator is used to obtain a spatially filtered version of the early speech component. The estimator is constructed as a minimum variance distortionless response (MVDR) beamformer (BF) followed by a postfilter (PF). Three unique design features characterize the proposed method. First, the MVDR BF is implemented in a special structure, named the nonorthogonal generalized sidelobe canceller (NO-GSC). Compared with the more conventional orthogonal GSC structure, the new structure allows for a simpler implementation of the GSC blocks for various MVDR constraints. Second, in contrast to earlier works, relative early transfer functions (RETFs) are used in the MVDR criterion rather than either the entire relative transfer functions (RTFs) or only the direct path of the desired speech signal. An estimator of the RETFs is proposed as well. Third, the late reverberation and noise are processed by both the beamforming stage and the PF stage. Since the relative power of the noise and the late reverberation varies with the frame index, a computationally efficient method for the required matrix inversion is proposed to avoid a costly full inversion in every frame. The algorithm was evaluated and compared with two alternative multichannel algorithms and one single-channel algorithm using simulated data and data recorded in a room with a reverberation time of 0.5 s, for various source-microphone array distances (1-4 m) and several signal-to-noise ratios.
The processed signals were evaluated using two commonly used objective measures, namely perceptual evaluation of speech quality and log-spectral distance. As an additional objective measure, the improvement in word accuracy percentage of an automatic speech recognition system is also demonstrated.
Index Terms: dereverberation, relative transfer function, multichannel Wiener filter, minimum variance distortionless response (MVDR) beamforming, generalized sidelobe canceller.
I. INTRODUCTION
Dereverberation aims at the reduction of reverberation that is caused by a multitude of reflections from walls and other objects and has become a major research subject in
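One way to avoid re-inverting the interference matrix in every frame is to diagonalize its two spatial components jointly, once per frequency bin. The sketch assumes that the interference PSD matrix has the form phi_r(l) Gamma + phi_n(l) Psi, with time-invariant spatial matrices Gamma (late reverberation) and Psi (noise), so that only the two scalars vary with the frame index l. This is a standard generalized-eigenvalue technique and not necessarily the method proposed in the paper; the names are hypothetical.

```python
import numpy as np


def joint_diagonalizer(gamma, psi):
    """Return V, lam with V^H psi V = I and V^H gamma V = diag(lam).

    Computed once per frequency bin; gamma is the reverberation coherence and
    psi the (assumed time-invariant) spatial structure of the noise.
    """
    chol = np.linalg.cholesky(psi)                       # psi = L L^H
    chol_inv = np.linalg.inv(chol)
    lam, u = np.linalg.eigh(chol_inv @ gamma @ chol_inv.conj().T)
    return chol_inv.conj().T @ u, lam


def apply_inverse(v, lam, vhd, phi_r, phi_n):
    """Return (phi_r * gamma + phi_n * psi)^{-1} d from precomputed V, lam and vhd = V^H d.

    Per frame this is a diagonal scaling plus one matrix-vector product, O(M^2) work.
    """
    return v @ (vhd / (phi_r * lam + phi_n))


rng = np.random.default_rng(1)
m = 5
a = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
gamma = a @ a.conj().T + np.eye(m)                       # toy positive-definite matrices
b = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
psi = b @ b.conj().T + np.eye(m)
d = rng.standard_normal(m) + 1j * rng.standard_normal(m) # stands in for an RETF vector
v, lam = joint_diagonalizer(gamma, psi)
vhd = v.conj().T @ d
print(apply_inverse(v, lam, vhd, 0.7, 0.2))
```

In use, `joint_diagonalizer` and `vhd` are computed once per bin. Each frame then calls `apply_inverse` with the current PSD estimates and normalizes the result x as x / (d^H x) to obtain MVDR weights.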
Maximum likelihood PSD estimation for speech enhancement in reverberant and noisy conditions
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016
We propose a novel Power Spectral Density (PSD) estimator for multi-microphone systems operating in reverberant and noisy conditions. The estimator is derived using the maximum likelihood approach and is based on a blocked and pre-whitened additive signal model. The intended application of the estimator is in speech enhancement algorithms, such as the Multi-channel Wiener Filter (MWF) and the Minimum Variance Distortionless Response (MVDR) beamformer. We evaluate these two algorithms in a speech dereverberation task and compare the performance obtained using the proposed and a competing PSD estimator. Instrumental performance measures indicate an advantage of the proposed estimator over the competing one. In a speech intelligibility test, all algorithms significantly improved the word intelligibility score. While the results suggest a minor advantage of using the proposed PSD estimator, the difference between algorithms was found to be statistically significant only in some of the experimental conditions.
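A blocked and pre-whitened ML criterion of this kind can be sketched under the following assumptions: the blocked signals have covariance phi_r Gamma_b + Phi_vb, where the blocked coherence matrix Gamma_b and the blocked noise PSD matrix Phi_vb are known. Pre-whitening with a Cholesky factor of Phi_vb, followed by a rotation into the eigenbasis of the whitened coherence, makes the model covariance diagonal. The negative log-likelihood then becomes a sum of scalar terms in phi_r. Here it is minimized by a log-spaced grid search, which is a simple choice of ours; the paper may solve the problem differently, and the names are hypothetical.

```python
import numpy as np


def ml_reverb_psd_noisy(r_z, gamma_b, phi_vb, grid=None):
    """ML reverberation PSD from blocked outputs with known noise (sketch).

    Assumed blocked model: cov(z) = phi_r * gamma_b + phi_vb, where r_z is the
    sample covariance of the blocked STFT snapshots at one frequency bin.
    """
    if grid is None:
        grid = np.logspace(-4, 4, 4001)                          # candidate phi_r values
    chol_inv = np.linalg.inv(np.linalg.cholesky(phi_vb))         # pre-whitening transform
    lam, u = np.linalg.eigh(chol_inv @ gamma_b @ chol_inv.conj().T)
    t = u.conj().T @ chol_inv                                    # whiten, then rotate
    g = np.real(np.diag(t @ r_z @ t.conj().T))                   # per-component powers
    denom = grid[:, None] * lam[None, :] + 1.0                   # diagonal of phi_r * Lambda + I
    neg_log_lik = np.sum(np.log(denom) + g[None, :] / denom, axis=1)
    return grid[np.argmin(neg_log_lik)]


rng = np.random.default_rng(2)
k = 3                                                            # M - 1 blocked channels
a = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
gamma_b = a @ a.conj().T + 0.5 * np.eye(k)
b = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
phi_vb = 0.1 * (b @ b.conj().T + np.eye(k))
r_true = 0.8 * gamma_b + phi_vb                                  # exact covariance, phi_r = 0.8
print(ml_reverb_psd_noisy(r_true, gamma_b, phi_vb))              # close to 0.8
```

Because each scalar term has its derivative proportional to (phi_r - phi_0) when the sample covariance matches the model, the criterion has a single minimum in that case. The grid resolution (about 0.5 % per step here) then bounds the estimation error.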
A Multichannel Maximum Likelihood Approach to Dereverberation
Reverberation can severely degrade the intelligibility of speech. Blind dereverberation aims at restoring the original signal by attenuating the reverberation without prior knowledge of either the surrounding acoustic environment or the source. In this paper, single-channel and multi-channel dereverberation structures are compared, and the advantages of the multi-channel approach are discussed. We propose an adaptive multi-channel blind dereverberation algorithm based on a maximum likelihood approach that exploits results related to the multiple-input/output inverse theorem (MINT). The performance of the algorithm is illustrated using an eight-channel linear microphone array placed in a real room. Simulation results show that the algorithm can achieve very good dereverberation when the channels are time-aligned.
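The MINT result states that, for channels without common zeros, FIR inverse filters g_m exist such that sum_m h_m * g_m is a pure (possibly delayed) impulse. The sketch below computes such filters by least squares from known impulse responses. It illustrates only the classical, non-blind MINT idea, not the adaptive ML algorithm of the paper, and the function names are hypothetical.

```python
import numpy as np


def convolution_matrix(h, n_cols):
    """(len(h) + n_cols - 1) x n_cols matrix C with C @ g == np.convolve(h, g)."""
    c = np.zeros((len(h) + n_cols - 1, n_cols))
    for j in range(n_cols):
        c[j:j + len(h), j] = h
    return c


def mint_inverse_filters(impulse_responses, filt_len, delay=0):
    """Least-squares MINT: find g_m such that sum_m h_m * g_m = delta[n - delay]."""
    h_stack = np.hstack([convolution_matrix(h, filt_len) for h in impulse_responses])
    target = np.zeros(h_stack.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(h_stack, target, rcond=None)
    return np.split(g, len(impulse_responses))


rng = np.random.default_rng(0)
h1, h2 = rng.standard_normal(8), rng.standard_normal(8)   # toy 8-tap "room" responses
g1, g2 = mint_inverse_filters([h1, h2], filt_len=7)
equalized = np.convolve(h1, g1) + np.convolve(h2, g2)
print(np.round(equalized, 6))                              # a unit impulse at index 0
```

For two channels of length L, inverse filters of length L-1 make the stacked convolution matrix square. Exact inversion is then possible even though no single FIR filter can invert either channel on its own.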
In reverberant environments, reverberant components of speech degrade the performance of automatic speech recognition (ASR), and many dereverberation methods have been proposed to improve it. Some researchers have proposed dereverberation methods with a low computational load based on a statistical model of reverberation. Lebart et al. proposed a dereverberation method [3] using Polack's statistical model, whose parameter is the reverberation time (RT). This method is effective and its computational load is relatively low; however, its performance is unstable because it estimates the RT only from the end of an utterance. Gomez et al. proposed an effective method for suppressing late reverberation, but it requires the room impulse response to have been measured in advance. Löllmann et al. also used a statistical model, with the RT estimated by a maximum likelihood approach. This method requires more parameters and a higher computational load than Lebart's method. The key to using statistical models for dereverberation is to limit the number of parameters and to estimate them robustly.
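Estimators based on Polack's model typically predict the late reverberant spectral variance by delaying and exponentially attenuating the reverberant spectral variance. The decay constant Delta = 3 ln(10)/T60 is fixed by the reverberation time. The sketch below shows that form. We assume it corresponds to the Lebart-type estimator described above; the parameter names (`delay_frames`, `hop`) are hypothetical. T60 is taken as given, although the paragraph identifies its estimation as the weak point of this approach.

```python
import numpy as np


def polack_lrsv(psd_reverberant, t60, fs, hop, delay_frames):
    """Late reverberant spectral variance from Polack's exponential-decay model (sketch).

    psd_reverberant: array (frames, bins) of reverberant spectral variances.
    t60: reverberation time in seconds, assumed known.
    fs, hop: sampling rate in Hz and STFT hop size in samples.
    delay_frames: frames separating the early and late parts (hypothetical parameter).
    """
    if delay_frames < 1:
        raise ValueError("delay_frames must be at least 1")
    decay = 3.0 * np.log(10.0) / t60                     # Delta, so energy ~ exp(-2 Delta t)
    delay_s = delay_frames * hop / fs                    # early/late boundary in seconds
    scale = np.exp(-2.0 * decay * delay_s)               # energy attenuation over that delay
    lrsv = np.zeros_like(psd_reverberant, dtype=float)
    lrsv[delay_frames:] = scale * psd_reverberant[:-delay_frames]
    return lrsv


psd = np.ones((10, 3))                                   # flat toy spectral variances
lrsv = polack_lrsv(psd, t60=0.5, fs=16000, hop=256, delay_frames=4)
print(lrsv[4, 0])                                        # 10 ** (-6 * 0.064 / 0.5), about 0.17
```

The factor exp(-2 Delta t) equals 10^(-6 t / T60), that is, a 60 dB energy drop after T60 seconds. This is why the single parameter T60 determines the whole estimator.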
Single- and multi-microphone speech dereverberation using spectral enhancement
2007
Reverberation is the process of multi-path propagation of an acoustic signal from its source to the microphone. The received signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). The combination of the direct sound and early reverberation is sometimes referred to as the early sound component. The different sound components will now be discussed in more detail. To this end, we will first investigate which physical properties of an enclosed space determine speech intelligibility and quality (Section 1.3.1). Reverberation is well understood in terms of physical acoustics, but it is not yet well understood how the aforementioned distortions of the speech affect intelligibility [6]. In Section 1.3.2 … From the above discussion, it can be concluded that late reverberation and noise are the main causes of the degradation in speech intelligibility. Furthermore, the perceptual speech quality, which is related to subjective preference and sound impression, depends on two physical properties of reverberation, i.e., colouration and reverberation time. It should be noted that these properties are not independent, since the amount of colouration depends on the reverberation time, room volume, and source-microphone distance. The reverberation time RT60 is not only important from a perceptual point of view, but it also characterizes the 'shape' of the AIR, as discussed in Section 1.2. Therefore, RT60 is an important measure that plays a crucial role in our work. These insights will be vital to the work presented in this dissertation.
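Because RT60 characterizes the shape of the AIR, it can be estimated from a measured impulse response. A common approach, shown here as a sketch, is Schroeder backward integration. The energy decay curve is fitted with a straight line between -5 dB and -25 dB, and the fit is extrapolated to a 60 dB decay. The test impulse response is synthetic (exponentially decaying white noise, as in Polack's model). All names and the fit range are assumptions rather than the dissertation's procedure.

```python
import numpy as np


def synthetic_rir(t60, fs, duration, rng):
    """Polack-style impulse response: white Gaussian noise with exponential decay."""
    t = np.arange(int(duration * fs)) / fs
    decay_rate = 3.0 * np.log(10.0) / t60
    return rng.standard_normal(t.size) * np.exp(-decay_rate * t)


def estimate_t60(rir, fs, upper_db=-5.0, lower_db=-25.0):
    """RT60 from Schroeder backward integration, fitting the EDC between two levels."""
    energy = rir ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # Schroeder backward integral
    edc_db = 10.0 * np.log10(edc / edc[0])
    idx = np.where((edc_db <= upper_db) & (edc_db >= lower_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # decay rate in dB/s (negative)
    return -60.0 / slope                           # time for a 60 dB decay


rng = np.random.default_rng(1)
rir = synthetic_rir(t60=0.5, fs=16000, duration=1.0, rng=rng)
print(estimate_t60(rir, 16000))                    # close to 0.5 s
```

Fitting a limited range such as -5 to -25 dB, rather than the full 60 dB, avoids both the direct-sound region and the noise floor that a measured AIR would show at its tail.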