Speech enhancement using PCA and variance of the reconstruction error model identification

A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech and given an estimate of the noise correlation matrix. We present an extensive overview of the available estimators and derive a theoretical estimator that is used to experimentally assess an upper bound on the performance achievable by any subspace-based method. Automatic speech recognition (ASR) experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing signal information that is essential for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
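To make the Hankel-matrix decomposition concrete, here is a minimal sketch assuming white noise of known variance and a Wiener-type gain in the eigenvalue domain (one of many possible estimators, not the paper's upper-bound estimator). The names `hankel_matrix`, `average_antidiagonals` and `subspace_filter` are hypothetical; coloured noise would additionally require prewhitening, which is not shown.

```python
import numpy as np


def hankel_matrix(frame, rows):
    """Hankel matrix whose entry (i, j) is frame[i + j]."""
    frame = np.asarray(frame, dtype=float)
    cols = frame.size - rows + 1
    return np.array([frame[i:i + cols] for i in range(rows)])


def average_antidiagonals(H):
    """Turn a matrix back into a signal by averaging each anti-diagonal."""
    rows, cols = H.shape
    out = np.zeros(rows + cols - 1)
    counts = np.zeros(rows + cols - 1)
    for i in range(rows):
        out[i:i + cols] += H[i]
        counts[i:i + cols] += 1
    return out / counts


def subspace_filter(noisy, rows, noise_var, rank=None):
    """Wiener-type subspace estimator under a white-noise assumption.

    rank=None keeps every component (no explicit rank reduction);
    an integer zeroes all components beyond that index.
    """
    H = hankel_matrix(noisy, rows)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    lam = s**2 / H.shape[1]                        # eigenvalues of H H^T / cols
    speech_lam = np.maximum(lam - noise_var, 0.0)  # white noise: lambda_y = lambda_s + sigma^2
    gain = speech_lam / np.maximum(lam, 1e-12)
    if rank is not None:
        gain[rank:] = 0.0
    H_hat = (U * (s * gain)) @ Vt                  # scale each singular component
    return average_antidiagonals(H_hat)
```

Leaving `rank=None` mirrors the observation above that, for recognition, it was best not to perform explicit rank reduction; the gain alone then attenuates noise-dominated components.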

Single channel speech enhancement using principal component analysis and MDL subspace selection

1999

We present in this paper a novel subspace approach for single channel speech enhancement and speech recognition in highly noisy environments. Our algorithm is based on principal component analysis and the optimal subspace selection is provided by a minimum description length criterion. This choice overcomes the limitations encountered with other selection criteria, like the overestimation of the signal plus noise subspace or the need for empirical parameters. We have also extended our subspace algorithm to take into account the case of colored noise. The performance evaluation shows that our method provides a higher noise reduction and a lower signal distortion than existing enhancement methods and that speech recognition in noise is improved. Our algorithm succeeds in extracting the relevant features of speech even in highly noisy conditions without introducing artefacts such as "musical noise".
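The abstract does not give the exact MDL formula, so the sketch below assumes the classical Wax-Kailath form, which scores each candidate dimension k by comparing the geometric and arithmetic means of the p - k smallest eigenvalues and adds a complexity penalty. The function name `mdl_order` is hypothetical, and the eigenvalues are assumed strictly positive.

```python
import numpy as np


def mdl_order(eigenvalues, num_snapshots):
    """Wax-Kailath MDL estimate of the signal-subspace dimension.

    eigenvalues: eigenvalues of the sample covariance matrix (all > 0).
    num_snapshots: number of observation vectors used to estimate it.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    p = lam.size
    N = num_snapshots
    scores = []
    for k in range(p):
        tail = lam[k:]                            # candidate noise eigenvalues
        geo = np.exp(np.mean(np.log(tail)))       # geometric mean
        arith = np.mean(tail)                     # arithmetic mean
        log_likelihood = -N * (p - k) * np.log(geo / arith)  # >= 0, zero when tail is flat
        penalty = 0.5 * k * (2 * p - k) * np.log(N)
        scores.append(log_likelihood + penalty)
    return int(np.argmin(scores))
```

For example, `mdl_order([10, 8, 5, 1, 1, 1, 1, 1], 1000)` selects three components, because the remaining eigenvalues are perfectly flat and adding more components only increases the penalty.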

Perceptual subspace speech enhancement using variance of the reconstruction error

Digital Signal Processing, 2014

In this paper, a new signal subspace-based approach for enhancing a speech signal degraded by environmental noise is presented. The Perceptual Karhunen-Loève Transform (PKLT) method is improved here by including the Variance of the Reconstruction Error (VRE) criterion, in order to optimize the subspace decomposition model. The incorporation of the VRE in the PKLT (namely the PKLT-VRE hybrid method) yields a good tradeoff between the noise reduction and the speech distortion thanks to the combination of a perceptual criterion and the optimal determination of the noisy subspace dimension. In adverse conditions, the experimental tests, using objective quality measures, show that the proposed method provides a higher noise reduction and a lower signal distortion than the existing speech enhancement techniques.
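The VRE criterion can be sketched as follows, assuming the formulation of Qin and Dunia: for each candidate number of principal components l, every variable is reconstructed from the others, and the reconstruction-error variance, normalized by that variable's variance, is summed. Applying it to a noisy-speech covariance matrix, and the names `vre_scores` and `vre_order`, are illustrative assumptions rather than the PKLT-VRE implementation itself.

```python
import numpy as np


def vre_scores(R):
    """Variance of the reconstruction error for l = 0 .. p-1 principal components."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    P = eigvecs[:, np.argsort(eigvals)[::-1]]      # loadings, largest eigenvalue first
    variances = np.diag(R)
    scores = np.empty(p)
    for l in range(p):
        C_res = np.eye(p) - P[:, :l] @ P[:, :l].T  # projector onto the residual subspace
        denom = np.diag(C_res)                     # xi_i^T C_res xi_i for unit directions xi_i
        if np.any(denom < 1e-10):
            scores[l] = np.inf                     # some variable cannot be reconstructed
            continue
        num = np.diag(C_res @ R @ C_res)           # xi_i^T C_res R C_res xi_i
        scores[l] = np.sum(num / denom**2 / variances)
    return scores


def vre_order(R):
    """Number of principal components that minimizes the VRE."""
    return int(np.argmin(vre_scores(R)))
```

With l = 0 every variable is "reconstructed" as zero, so the score equals the number of variables; the minimum typically falls at the dimension beyond which the remaining components are noise.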

Speech Enhancement using Signal Subspace Algorithm

In speech communication, the quality and intelligibility of speech are of utmost importance for easy and accurate information exchange. The speech processing systems used to communicate or store speech are usually designed for a noise-free environment, but in real-world environments the presence of background interference, in the form of additive background noise and channel noise, drastically degrades the performance of these systems, causing inaccurate information exchange and listener fatigue. Speech enhancement algorithms attempt to improve the performance of communication systems when their input or output signals are corrupted by noise. In general, speech enhancement has three major objectives: (a) to improve perceptual aspects such as the quality and intelligibility of the processed speech, i.e. to make it sound better or clearer to the human listener; (b) to improve the robustness of speech coders, which tend to be severely affected by the presence of noise; and (c) to increase the accuracy of speech recognition systems operating in less-than-ideal locations.

Subspace based Speech Enhancement using Common Vector Approach

2016

In this paper, we propose a new speech enhancement method using the common vector approach, a subspace method used in recognition applications. In the proposed method, we separate the noisy speech data into magnitude and phase in the frequency domain. The magnitude data is further separated into common and difference parts using the common vector. The difference part is assumed to contain the noise, so it is cleaned using linear minimum mean square error (LMMSE) estimation. After this process, the magnitude data is reconstructed by combining the cleaned difference part with the common part. The frequency-domain speech data is then rebuilt from the reconstructed magnitude data and the retained phase data, and each frame is transformed back to the time domain. The proposed method was evaluated under various noise conditions, and the results are compared with several methods using well-known quality measures.
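The split into common and difference parts can be sketched with the standard common-vector construction: the difference vectors span a subspace, and removing each vector's projection onto it leaves a part that is identical for every vector. Treating the magnitude spectra of a block of frames as the vector set is an assumption, the names `common_vector_split` and `split_and_rebuild` are hypothetical, and the LMMSE cleaning step is left as a placeholder.

```python
import numpy as np


def common_vector_split(vectors):
    """Split each row into a common part (shared) and a difference part.

    Assumes fewer vectors than dimensions and linearly independent differences.
    """
    A = np.asarray(vectors, dtype=float)
    B = (A[1:] - A[0]).T          # difference vectors as columns
    Q, _ = np.linalg.qr(B)        # orthonormal basis of the difference subspace
    diff = (A @ Q) @ Q.T          # projection of each row onto that subspace
    common = A - diff             # identical for every row
    return common[0], diff


def split_and_rebuild(frames):
    """Magnitude/phase split, common-vector decomposition and resynthesis."""
    frames = np.asarray(frames, dtype=float)
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)
    common, diff = common_vector_split(magnitude)
    cleaned_diff = diff           # placeholder: LMMSE estimation would go here
    new_magnitude = common + cleaned_diff
    return np.fft.irfft(new_magnitude * np.exp(1j * phase), n=frames.shape[1], axis=1)
```

With the placeholder left in, the pipeline reproduces its input exactly, which is a useful check before any noise-reduction step on the difference part is added.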

Signal Subspace Speech Enhancement for Audible Noise Reduction

Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., 2000

A novel subspace-based speech enhancement scheme based on a criterion of audible noise reduction is considered. Masking properties of the human auditory system are used to define the audible noise quantity in the eigen-domain. Subsequently, an audible noise reduction scheme is developed based on a signal subspace technique. We derive the eigendecomposition of the estimated speech autocorrelation matrix under the assumption of white noise and outline the implementation of our proposed scheme. We further extend the scheme to the colored noise case. Simulation results show that our proposed scheme outperforms many existing subspace methods in terms of segmental signal-to-noise ratio (SNR), perceptual evaluation of speech quality (PESQ) and informal listening tests.
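The white-noise step rests on a standard identity: if the noise is white with variance sigma^2 and uncorrelated with speech, the noisy autocorrelation matrix is the speech autocorrelation matrix plus sigma^2 I, so both share eigenvectors and the speech eigenvalues are the noisy ones minus sigma^2. The sketch below shows only that step (clipping negative estimates to zero); the masking-based gains of the paper are not reproduced, and `speech_eigendecomposition` is a hypothetical name.

```python
import numpy as np


def speech_eigendecomposition(R_noisy, noise_var):
    """Eigendecomposition of the estimated speech autocorrelation matrix.

    Assumes R_noisy = R_speech + noise_var * I (white, uncorrelated noise).
    Returns eigenvalues in descending order and the matching eigenvectors.
    """
    eigvals, eigvecs = np.linalg.eigh(R_noisy)
    order = np.argsort(eigvals)[::-1]
    speech_eigvals = np.maximum(eigvals[order] - noise_var, 0.0)  # clip negative estimates
    return speech_eigvals, eigvecs[:, order]
```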

An energy-constrained signal subspace method for speech enhancement and recognition in white and colored noises

Speech Communication, 1998

In this paper, an energy-constrained signal subspace (ECSS) method is proposed for speech enhancement and automatic speech recognition under additive noise conditions. The key idea is to match the short-time energy of the enhanced speech signal to the unbiased estimate of the short-time energy of the clean speech, which is shown to be very effective for improving the estimation of the noise-like, low-energy segments in continuous speech. The ECSS method is applied to both white and colored noise, where the additive colored noise is modelled by an autoregressive (AR) process. A modified covariance method is used to estimate the AR parameters of the colored noise, and a prewhitening filter is constructed from the estimated parameters. The performance of the proposed algorithms was evaluated using the TI46 digit database and the TIMIT continuous speech database. The ECSS method was found to achieve very high word recognition accuracy (WRA) on the digit set under low-SNR conditions. On the continuous speech data set, the method improved the SNR by 2-6 dB and the WRA by 13.7-45.5% for white noise and by 18.6-55.9% for colored noise under various SNR conditions.
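The energy constraint can be sketched under the assumption that the noise is white with known variance sigma^2 and uncorrelated with speech. For an N-sample frame, E[||y||^2] = E[||s||^2] + N sigma^2, so ||y||^2 - N sigma^2 is an unbiased estimate of the clean-speech energy, and the enhanced frame is rescaled to that energy. The function name `energy_constrain` and the clipping of negative estimates to zero are illustrative choices, not details from the paper.

```python
import numpy as np


def energy_constrain(enhanced_frame, noisy_frame, noise_var):
    """Rescale an enhanced frame so its energy matches the unbiased clean-energy estimate."""
    enhanced = np.asarray(enhanced_frame, dtype=float)
    noisy = np.asarray(noisy_frame, dtype=float)
    # Unbiased clean-speech energy estimate for white noise, clipped at zero.
    target = max(np.sum(noisy**2) - noisy.size * noise_var, 0.0)
    current = np.sum(enhanced**2)
    if current == 0.0:
        return enhanced          # nothing to rescale
    return enhanced * np.sqrt(target / current)
```

Low-energy, noise-like segments are where a plain subspace estimate tends to leave too much or too little energy; pinning the frame energy to the estimate is what the abstract credits for the improvement there.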

Energy-constrained signal subspace method for speech enhancement and recognition

IEEE Signal Processing Letters, 1997

In this paper, an energy-constrained signal subspace (ECSS) method is proposed for speech enhancement and automatic speech recognition under additive noise conditions. The key idea is to match the short-time energy of the enhanced speech signal to the unbiased estimate of the short-time energy of the clean speech, which is shown to be very effective for improving the estimation of the noise-like, low-energy segments in continuous speech. The ECSS method is applied to both white and colored noise, where the additive colored noise is modelled by an autoregressive (AR) process. A modified covariance method is used to estimate the AR parameters of the colored noise, and a prewhitening filter is constructed from the estimated parameters. The performance of the proposed algorithms was evaluated using the TI46 digit database and the TIMIT continuous speech database. The ECSS method was found to achieve very high word recognition accuracy (WRA) on the digit set under low-SNR conditions. On the continuous speech data set, the method improved the SNR by 2-6 dB and the WRA by 13.7-45.5% for white noise and by 18.6-55.9% for colored noise under various SNR conditions.
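The colored-noise path can be sketched with the modified covariance (forward-backward least-squares) AR estimator, which jointly minimizes the forward and backward prediction errors, followed by the FIR filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p as the prewhitening filter. The names `modified_covariance_ar` and `prewhiten` are hypothetical, and in practice the coefficients would be estimated from noise-only segments.

```python
import numpy as np


def modified_covariance_ar(x, order):
    """AR coefficients a_1..a_p via the modified covariance (forward-backward) method.

    Model: x[n] + a_1 x[n-1] + ... + a_p x[n-p] = e[n].
    """
    x = np.asarray(x, dtype=float)
    rows, targets = [], []
    for n in range(order, x.size):
        rows.append(x[n - order:n][::-1])      # forward: x[n-1], ..., x[n-p]
        targets.append(-x[n])
        rows.append(x[n - order + 1:n + 1])    # backward: x[n-p+1], ..., x[n]
        targets.append(-x[n - order])
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return a


def prewhiten(signal, a):
    """Apply the FIR prewhitening filter [1, a_1, ..., a_p]."""
    fir = np.concatenate(([1.0], a))
    return np.convolve(signal, fir)[:len(signal)]
```

For an AR(1) process x[n] = 0.9 x[n-1] + w[n], the estimate of a_1 should come out close to -0.9, and the prewhitened output should be close to the white driving noise w.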

Extension of the local subspace method to enhancement of speech with colored noise

2008

Based on dynamic features of human speech, the local projection (LP) method has been adapted to the enhancement of speech corrupted by white noise. As an extension of the LP method, a strategy with two rounds of projection is introduced to enhance speech contaminated with colored noise. Colored noise mainly resides in a low-dimensional subspace and is assumed to be stationary in this communication. In the first step, a noise-dominated subspace is estimated from colored noise obtained from silent (speech-free) frames. Then, for the reference phase point, the components projected onto the noise-dominated subspace are removed, and the enhanced speech is reconstructed from the remaining components. The residual error in the output of the first step tends to be distributed uniformly across all directions. Therefore, in the second step, the LP method is applied to the output of the first step, treating the residual error as white noise. An adaptation of this strategy to continuous speech is also performed. The results show that this strategy is more effective than the LP method in enhancing speech corrupted by colored noise and is comparable to two typical speech enhancement methods.
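The first projection step can be sketched as follows, assuming a simple delay embedding for the phase points and a single global noise-dominated basis taken from the leading eigenvectors of the silence-frame covariance. The neighborhood-based LP refinement of the second step is not shown; the names `delay_embed`, `noise_dominated_basis` and `remove_noise_subspace`, and the choice of `noise_rank`, are hypothetical.

```python
import numpy as np


def delay_embed(x, dim):
    """Phase points [x[n], x[n+1], ..., x[n+dim-1]] as rows."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i:i + dim] for i in range(x.size - dim + 1)])


def noise_dominated_basis(silence, dim, noise_rank):
    """Leading eigenvectors of the phase-point covariance of noise-only frames."""
    X = delay_embed(silence, dim)
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argsort(eigvals)[::-1][:noise_rank]]


def remove_noise_subspace(noisy, dim, basis):
    """Delete each phase point's components in the noise subspace, then resynthesize."""
    X = delay_embed(noisy, dim)
    X_clean = X - (X @ basis) @ basis.T
    out = np.zeros(len(noisy))
    counts = np.zeros(len(noisy))
    for i, row in enumerate(X_clean):       # overlap-average the cleaned phase points
        out[i:i + dim] += row
        counts[i:i + dim] += 1
    return out / counts
```

A narrowband interferer is an extreme case of noise living in a low-dimensional subspace (two dimensions for one sinusoid), which makes it a convenient check of this step.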

Robust recognition of noisy speech using speech enhancement

WCC 2000 - ICSP 2000. 2000 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000, 2000

In this paper, a speech recognition system that is robust to additive background noise is studied. We adopt speech enhancement as the front-end processing module to improve the signal-to-noise ratio (SNR) of the noisy input speech before recognition in the later stage. Three different speech enhancement algorithms were tested on six types of additive noise. Experimental results show that cascading the speech enhancer with a hidden Markov model (HMM) based speech recognizer can significantly improve recognition accuracy in noisy environments.
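SNR improvements like those reported above are usually measured against the clean reference. The sketch below shows global SNR and a frame-averaged segmental SNR. The clamping range of -10 to 35 dB and the non-overlapping frames are common conventions assumed here, not values taken from these papers, and the function names are hypothetical.

```python
import numpy as np


def snr_db(clean, processed):
    """Global SNR of `processed` relative to the clean reference, in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(processed, dtype=float) - clean
    return 10 * np.log10(np.sum(clean**2) / np.sum(noise**2))


def segmental_snr_db(clean, processed, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs (non-overlapping frames), each clamped to [floor_db, ceil_db]."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(processed, dtype=float) - clean
    values = []
    for start in range(0, clean.size - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = noise[start:start + frame_len]
        ratio = np.sum(s**2) / max(np.sum(e**2), 1e-12)
        values.append(np.clip(10 * np.log10(max(ratio, 1e-12)), floor_db, ceil_db))
    return float(np.mean(values))
```

Comparing `snr_db(clean, noisy)` with `snr_db(clean, enhanced)` gives the SNR gain of the front end; the segmental variant weights quiet and loud frames more evenly.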