Extension of the local subspace method to enhancement of speech with colored noise
Related papers
Processing time improvement for speech enhancement based local projection using dynamic parameters
The Journal of the Acoustical Society of America, 2013
Local projection (LP) has been widely used to enhance speech by transforming noisy speech into two orthogonal subspaces: a noise subspace (S1) and a subspace containing the signal plus a small amount of noise (S2). S1 is removed and S2 is transformed back into the time domain, yielding the enhanced speech. Several works have reported satisfactory results with significantly improved speech quality, although processing time was not taken into account. The four parameters to be considered are the embedding dimension (d), the time delay (τ), the number of iterations, and the minimal embedding dimension. Speech quality improves as d increases, which decreases τ (with d×τ kept constant) and dramatically increases the processing time. The goal is to find the best d×τ parameter for each iteration while speech quality remains almost unaffected. Rather than using a fixed d×τ parameter, a dynamic approach is taken. Specifically, τ is initially set to 1 and incremented by 1 for next itera...
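As a rough illustration of the embedding parameters named in the abstract above, the following sketch builds the delay vectors of dimension d with time delay τ from a sampled signal. It is only an assumed, minimal form of delay embedding: the function name `delay_embed` is hypothetical, and the projection onto S1 and S2 and the dynamic parameter schedule are not shown.

```python
def delay_embed(signal, d, tau):
    """Return the d-dimensional delay vectors of `signal` with delay `tau`.

    Vector i is [signal[i], signal[i + tau], ..., signal[i + (d - 1) * tau]].
    """
    if d < 1 or tau < 1:
        raise ValueError("d and tau must be positive integers")
    span = (d - 1) * tau  # distance between first and last coordinate
    return [
        [signal[i + k * tau] for k in range(d)]
        for i in range(len(signal) - span)
    ]


# Toy usage: 10 samples, embedding dimension 3, delay 2.
x = list(range(10))
vectors = delay_embed(x, d=3, tau=2)
```

Each vector spans (d − 1)·τ samples, so a larger d at constant d×τ produces more coordinates per vector over roughly the same time window, which is consistent with the processing-time growth the abstract describes.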
Improved embedded pre-whitening subspace approach for enhancing speech contaminated by colored noise
Speech Communication, 2018
An improved embedded pre-whitening subspace approach based on spectral domain constraints is proposed for enhancement of speech contaminated by colored noise. The main contribution of this work is a particular non-unitary spectral transformation of the residual noise. This non-unitary transform is based on simultaneous diagonalization of the clean speech and noise covariance matrices. With this transformation, the optimization problem resulting from the subspace approach based on spectral domain constraints can be solved without any restrictions on the form of the matrices involved. Under some theoretical assumptions the proposed method reduces to the Hu and Loizou method, which is similar to the proposed method except for the gain matrix. Compared with the Hu and Loizou method, speech enhancement measures for our approach show significant improvement in enhancing TIMIT sentences corrupted by brown, pink and multi-talker babble noises. Comparisons with two non-embedded pre-whitening subspace methods, PKLT and PKLT-VRE, show that the proposed method is comparable with these two methods.
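The simultaneous diagonalization mentioned in the abstract above can be illustrated with a small NumPy sketch. This is a generic linear-algebra construction (Cholesky whitening of the noise covariance followed by a symmetric eigendecomposition), not the gain matrix or estimator of the paper; the function name and the toy covariance matrices are assumptions made for the example.

```python
import numpy as np


def simultaneous_diagonalize(R_x, R_n):
    """Return (V, lam) with V.T @ R_n @ V = I and V.T @ R_x @ V = diag(lam).

    R_x: symmetric positive semidefinite "speech" covariance.
    R_n: symmetric positive definite "noise" covariance.
    """
    L = np.linalg.cholesky(R_n)      # R_n = L @ L.T
    L_inv = np.linalg.inv(L)
    M = L_inv @ R_x @ L_inv.T        # speech covariance after noise whitening
    lam, U = np.linalg.eigh(M)       # U is orthonormal
    V = L_inv.T @ U                  # generally non-unitary
    return V, lam


# Toy data (assumed for illustration): rank-2 speech, full-rank colored noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
R_x = A @ A.T
B = rng.standard_normal((4, 4))
R_n = B @ B.T + 0.1 * np.eye(4)

V, lam = simultaneous_diagonalize(R_x, R_n)
```

Because V whitens the noise while diagonalizing the speech covariance, per-component gains can then be applied in the transformed domain without a separate pre-whitening step.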
Perceptual subspace speech enhancement using variance of the reconstruction error
Digital Signal Processing, 2014
In this paper, a new signal subspace-based approach for enhancing a speech signal degraded by environmental noise is presented. The Perceptual Karhunen-Loève Transform (PKLT) method is improved here by including the Variance of the Reconstruction Error (VRE) criterion, in order to optimize the subspace decomposition model. The incorporation of the VRE in the PKLT (namely the PKLT-VRE hybrid method) yields a good tradeoff between the noise reduction and the speech distortion thanks to the combination of a perceptual criterion and the optimal determination of the noisy subspace dimension. In adverse conditions, the experimental tests, using objective quality measures, show that the proposed method provides a higher noise reduction and a lower signal distortion than the existing speech enhancement techniques.
Speech Enhancement using Signal Subspace Algorithm
In speech communication, the quality and intelligibility of speech are of utmost importance for ease and accuracy of information exchange. The speech processing systems used to communicate or store speech are usually designed for a noise-free environment, but in a real-world environment the presence of background interference in the form of additive background noise and channel noise drastically degrades the performance of these systems, causing inaccurate information exchange and listener fatigue. Speech enhancement algorithms attempt to improve the performance of communication systems when their input or output signals are corrupted by noise. Speech enhancement in general has three major objectives: (a) to improve perceptual aspects such as the quality and intelligibility of the processed speech, i.e. to make it sound better or clearer to the human listener; (b) to improve the robustness of speech coders, which tend to be severely affected by the presence of noise; and (c) to increase the accuracy of speech recognition systems operating in less than ideal locations.
Single channel speech enhancement using principal component analysis and MDL subspace selection
1999
We present in this paper a novel subspace approach for single channel speech enhancement and speech recognition in highly noisy environments. Our algorithm is based on principal component analysis and the optimal subspace selection is provided by a minimum description length criterion. This choice overcomes the limitations encountered with other selection criteria, like the overestimation of the signal plus noise subspace or the need for empirical parameters. We have also extended our subspace algorithm to take into account the case of colored noise. The performance evaluation shows that our method provides a higher noise reduction and a lower signal distortion than existing enhancement methods and that speech recognition in noise is improved. Our algorithm succeeds in extracting the relevant features of speech even in highly noisy conditions without introducing artefacts such as "musical noise".
Speech enhancement using pca and variance of the reconstruction error model identification
… Annual Conference of …, 2007
We present in this paper a signal subspace-based approach for enhancing a noisy signal. This algorithm is based on a principal component analysis (PCA) in which the optimal subspace selection is provided by a variance of the reconstruction error (VRE) criterion. This choice overcomes many limitations encountered with other selection criteria, like overestimation of the signal subspace or the need for empirical parameters. We have also extended our subspace algorithm to take into account the case of colored and babble noise. The performance evaluation, which is made on the Aurora database, measures improvements in the distributed speech recognition of noisy signals corrupted by different types of additive noises. Our algorithm succeeds in improving the recognition of noisy speech in all noisy conditions.
A Brief Survey of Speech Enhancement
We present a brief overview of the speech enhancement problem for wide-band noise sources that are not correlated with the speech signal. Our main focus is on the spectral subtraction approach and some of its derivatives in the forms of linear and non-linear minimum mean square error estimators. For the linear case, we review the signal subspace approach, and for the non-linear case, we review spectral magnitude and phase estimators. Online estimation of the second order statistics of speech signals using parametric and non-parametric models is also addressed.
Signal Subspace Speech Enhancement for Audible Noise Reduction
Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., 2000
A novel subspace-based speech enhancement scheme based on a criterion of audible noise reduction is considered. Masking properties of the human auditory system are used to define the audible noise quantity in the eigen-domain. Subsequently, an audible noise reduction scheme is developed based on a signal subspace technique. We derive the eigendecomposition of the estimated speech autocorrelation matrix under the assumption of white noise and outline the implementation of our proposed scheme. We further extend the scheme to the colored noise case. Simulation results show that our proposed scheme outperforms many existing subspace methods in terms of segmental signal-to-noise ratio (SNR), perceptual evaluation of speech quality (PESQ) and informal listening tests.
Subspace based Speech Enhancement using Common Vector Approach
2016
In this paper, we propose a new speech enhancement method using the common vector approach, a subspace method used in recognition applications. In the proposed method, we separate the noisy speech data into magnitude and phase in the frequency domain. The magnitude data is further separated into common and difference parts using the common vector. The difference part is assumed to contain the noise, so it is cleaned using linear minimum mean square error estimation. After this step, the magnitude data is reconstructed by adding back the common part. The frequency domain speech data is rebuilt from the reconstructed magnitude data and the retained phase data, and is then transformed to the time domain frame by frame. The proposed method was evaluated under various noise conditions. The results are compared with several methods using well-known quality measures.
The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition (ASR) experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
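The orthogonal decomposition described in the abstract above can be sketched in its simplest textbook form: build a Hankel matrix from a frame, truncate its SVD, and average the anti-diagonals to return to a time-domain signal. This sketch assumes white noise and a known rank, and it performs exactly the explicit rank reduction that the abstract reports as suboptimal for recognition; the function names and the toy sinusoid are assumptions for illustration, not the estimators studied in the paper.

```python
import numpy as np


def hankel_matrix(x, rows):
    """Stack shifted copies of x so that H[i, j] = x[i + j]."""
    cols = len(x) - rows + 1
    return np.array([x[i:i + cols] for i in range(rows)])


def hankel_average(H):
    """Average each anti-diagonal of H to recover a 1-D signal."""
    rows, cols = H.shape
    total = np.zeros(rows + cols - 1)
    counts = np.zeros(rows + cols - 1)
    for i in range(rows):
        total[i:i + cols] += H[i]
        counts[i:i + cols] += 1
    return total / counts


def subspace_filter(x, rows, rank):
    """Keep the `rank` dominant singular components of the Hankel matrix."""
    H = hankel_matrix(np.asarray(x, dtype=float), rows)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return hankel_average(H_low)


# Toy usage: a single sinusoid (Hankel rank 2) in white noise.
n = np.arange(200)
clean = np.sin(2 * np.pi * 0.05 * n)
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(n.size)
enhanced = subspace_filter(noisy, rows=40, rank=2)
```

For colored noise, an estimate of the noise correlation matrix would be needed to whiten the data before the decomposition, as the abstract notes; that step is omitted here.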