SPEECH NOISE SEPARATION USING NON-NEGATIVE MATRIX FACTORIZATION

Single-channel speaker-dependent speech enhancement exploiting generic noise model learned by non-negative matrix factorization

2016 International Conference on Electronics, Information, and Communications (ICEIC), 2016

This paper considers the single-channel speech separation problem given a noisy observation recorded by a microphone. More precisely, we focus on the speaker-dependent approach, where the spectral characteristics of the target speech are learned in advance from a clean example. In the training process, we propose to learn a generic spectral model for the noise source by collecting various types of environmental noise via the established non-negative matrix factorization framework. In the speech enhancement process, we propose to combine two existing group sparsity-inducing penalties in the optimization function and derive the corresponding algorithm for parameter estimation based on the multiplicative update (MU) rule. Experimental results on mixtures containing different real-world noises confirm the effectiveness of our approach.
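As a rough illustration of the training step described in this abstract, the sketch below learns a spectral dictionary from a non-negative spectrogram using the standard multiplicative updates for the generalized Kullback-Leibler divergence. It is not the authors' implementation: the function names `kl_divergence` and `train_nmf_kl`, the choice of the KL divergence, and all parameter values are assumptions, and the group sparsity penalties used at enhancement time are left out.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    V = V + eps
    V_hat = V_hat + eps
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))


def train_nmf_kl(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative spectrogram V (freq x frames) as W @ H.

    W holds spectral basis vectors (the dictionary), H their activations.
    Both are refined with the standard KL multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat)) / (W.T @ ones + eps)
        V_hat = W @ H + eps
        W *= ((V / V_hat) @ H.T) / (ones @ H.T + eps)
    # Normalize each basis vector to unit sum and rescale H so W @ H is unchanged.
    scale = W.sum(axis=0, keepdims=True) + eps
    W /= scale
    H *= scale.T
    return W, H
```

Under these assumptions, a generic noise model could be obtained by concatenating the spectrograms of several noise recordings along the time axis and passing the result to `train_nmf_kl`.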

Real-time speech separation by semi-supervised nonnegative matrix factorization

Latent Variable Analysis and Signal Separation, 2012

In this paper, we present an on-line semi-supervised algorithm for real-time separation of speech and background noise. The proposed system is based on Nonnegative Matrix Factorization (NMF), where fixed speech bases are learned from training data whereas the noise components are estimated in real-time on the recent past.
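The semi-supervised idea in this abstract (speech bases held fixed, noise bases estimated from the observation) can be sketched as follows. This is a batch version over one block of frames, not the on-line algorithm of the paper; the function name `separate_semi_supervised`, the KL multiplicative updates, and the soft-mask reconstruction are assumptions made for illustration.

```python
import numpy as np


def separate_semi_supervised(V, W_speech, n_noise, n_iter=100, seed=0, eps=1e-12):
    """Split spectrogram V into speech and noise estimates.

    W_speech (freq x K_s) is a pre-trained speech dictionary and is never
    updated. The noise dictionary and all activations are learned from V.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_s = W_speech.shape[1]
    W_noise = rng.random((n_freq, n_noise)) + eps
    H = rng.random((k_s + n_noise, n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        V_hat = W @ H + eps
        # Update all activations (speech and noise).
        H *= (W.T @ (V / V_hat)) / (W.T @ ones + eps)
        V_hat = W @ H + eps
        # Update only the noise bases; the speech bases stay fixed.
        H_noise = H[k_s:]
        W_noise *= ((V / V_hat) @ H_noise.T) / (ones @ H_noise.T + eps)
    S_hat = W_speech @ H[:k_s]
    N_hat = W_noise @ H[k_s:]
    mask = S_hat / (S_hat + N_hat + eps)
    return mask * V, (1.0 - mask) * V
```

A real-time variant would presumably call such a routine on a sliding window of recent frames, which is one way to read "estimated in real-time on the recent past".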

Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint

Proceedings of the Sixth International Symposium on Information and Communication Technology - SoICT 2015, 2015

This paper addresses a challenging single-channel speech enhancement problem in real-world environments where the speech signal is corrupted by high-level background noise. While most state-of-the-art algorithms try to estimate the noise spectral power and filter it from the observed signal to obtain enhanced speech, the paper presents another approach inspired by audio source separation techniques. In the considered method, generic spectral characteristics of speech and noise are first learned from various training signals by non-negative matrix factorization (NMF). They are then used to guide a similar factorization of the observed power spectrogram into a speech part and a noise part. Additionally, we propose to combine two existing group sparsity-inducing penalties in the optimization process and adapt the corresponding algorithm for parameter estimation based on the multiplicative update (MU) rule. Experimental results over different settings confirm the effectiveness of the proposed approach.

Itakura-Saito Divergence Non Negative Matrix Factorization with Application to Monaural Speech Separation

International Journal of Computer Applications, 2016

Monaural source separation is an area that has received much attention in the signal processing community, as it is a pre-processing step in many applications. Many solutions based on Non-Negative Matrix Factorization (NMF) have been developed to achieve clean separation. In this work, we propose a variant of Itakura-Saito divergence NMF based on a source-filter model that captures the temporal continuity of the speech signal. The algorithm shows very good separation results for mixtures of two speech sources in terms of artifact reduction. In addition, the Source-to-Distortion Ratio (SDR) and Source-to-Artifact Ratio (SAR) were found to be higher than those of NMF algorithms with Kullback-Leibler and Euclidean divergences.
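For reference, plain Itakura-Saito NMF (without the source-filter model or temporal continuity that this paper adds) is commonly run with the multiplicative updates sketched below. The function names `is_divergence` and `nmf_is` and the parameter defaults are assumptions; this is the widely used heuristic update, not the variant proposed in the paper.

```python
import numpy as np


def is_divergence(V, V_hat, eps=1e-12):
    """Itakura-Saito divergence: sum of V/V_hat - log(V/V_hat) - 1."""
    ratio = (V + eps) / (V_hat + eps)
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def nmf_is(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factorize a strictly positive power spectrogram V as W @ H under IS divergence."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
    return W, H
```

The IS divergence is scale-invariant, which is the usual argument for applying it to power spectrograms with a large dynamic range.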

Multiple-order non-negative matrix factorization for speech enhancement

Amongst the speech enhancement techniques, statistical models based on Non-negative Matrix Factorization (NMF) have received great attention. In a single-channel configuration, NMF is used to describe the spectral content of both the speech and noise sources. As the number of components can have a crucial influence on separation quality, we here propose to investigate model order selection based on the variational Bayesian approximation to the marginal likelihood of models of different orders. To go further, we propose to use model averaging to combine several single-order NMFs, and we show that a straightforward application of model averaging principles is inefficient, as it turns out to be equivalent to model selection. We thus introduce a parameter to control the entropy of the model order distribution, which makes the averaging effective. We also show that our probabilistic model nicely extends to a multiple-order NMF model where several NMFs are jointly estimated and averaged. Experiments are conducted on real data from the CHiME challenge and give an interesting insight into the entropic parameter and model order priors. Separation results are also promising, as model averaging outperforms single-order model selection. Finally, our multiple-order NMF shows an interesting gain in computation time.

A perceptually enhanced blind single-channel audio source separation by non-negative matrix factorization

European Signal Processing Conference, 2010

This paper proposes a 2D Non-negative Matrix Factorization (NMF) based single-channel source separation algorithm that emphasizes perceptually important components of audio. Unlike existing methods, the proposed scheme performs psychoacoustic pre-processing on the mixture spectrogram in order to suppress audio components that are not critical to human hearing while amplifying the perceptually important ones. This yields the auditory spectrogram, referred to as the sonogram, of the observed audio mixture, and the individual sources are then extracted by 2D NMF. Test results reported in terms of Signal-to-Distortion Ratio (SDR), Signal-to-Interference Ratio (SIR) and Signal-to-Artifact Ratio (SAR) show that the proposed perceptually enhanced separation improves the quality of the decomposed audio sources by 1.5-6.5 dB with reduced computational complexity.

Exploiting Nonnegative Matrix Factorization with Mixed Group Sparsity Constraint to Separate Speech Signal from Single-channel Mixture with Unknown Ambient Noise

EAI Endorsed Transactions on Context-aware Systems and Applications, 2018

This paper focuses on solving a challenging speech enhancement problem: enhancing the desired speech in a single-channel audio signal containing high-level unspecified noise (possibly environmental noise, music, other sounds, etc.). Using a source separation technique, we investigate a solution combining nonnegative matrix factorization (NMF) with a mixed group sparsity constraint that allows a generic noise spectral model to be exploited to guide the separation process. An experiment performed on a set of benchmark audio signals with different types of real-world noise shows that the proposed algorithm yields better quantitative results in terms of the signal-to-distortion ratio than previously published algorithms.

Recognize and separate approach for speech denoising using nonnegative matrix factorization

2015 23rd European Signal Processing Conference (EUSIPCO), 2015

This paper proposes a novel approach for denoising single-channel noisy speech signals. A speech dictionary and multiple noise dictionaries are trained using nonnegative matrix factorization (NMF). After observing the mixed signal, first the type of noise in the mixed signal is identified. The magnitude spectrogram of the noisy signal is decomposed using NMF with the concatenated trained dictionaries of noise and speech. Our results indicate that recognizing the noise type from the mixed signal and using the corresponding specific noise dictionary provides better results than using a general noise dictionary in the NMF approach. We also compare our algorithm with other state-of-the-art denoising methods and show that it has better performance than the competitors in most cases.
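The recognize-then-separate pipeline can be sketched as below. The abstract does not say how the noise type is identified, so this sketch assumes one simple criterion: fit the mixture with each candidate noise dictionary and keep the one with the lowest KL divergence. The function names, the selection rule, and the soft-mask reconstruction are all hypothetical.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    V = V + eps
    V_hat = V_hat + eps
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))


def fit_activations(V, W, n_iter=200, seed=0, eps=1e-12):
    """Estimate activations H for a fixed dictionary W with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    denom = W.T @ np.ones_like(V) + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / denom
    return H


def recognize_and_separate(V, W_speech, noise_dicts, n_iter=200, eps=1e-12):
    """Pick the noise dictionary that best explains V, then estimate the speech.

    noise_dicts maps a noise-type name to its trained dictionary (freq x K_n).
    Selection by lowest reconstruction divergence is an assumption of this sketch.
    """
    k_s = W_speech.shape[1]
    best = None
    for name, W_noise in noise_dicts.items():
        W = np.hstack([W_speech, W_noise])
        H = fit_activations(V, W, n_iter=n_iter, eps=eps)
        cost = kl_divergence(V, W @ H, eps)
        if best is None or cost < best[0]:
            best = (cost, name, W_noise, H)
    _, name, W_noise, H = best
    S_hat = W_speech @ H[:k_s]
    N_hat = W_noise @ H[k_s:]
    speech_est = V * S_hat / (S_hat + N_hat + eps)
    return name, speech_est
```

The "general noise dictionary" baseline mentioned in the abstract would correspond, in this sketch, to passing a single dictionary trained on all noise types together.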

Non-negative matrix factorization for speech/music separation using source dependent decomposition rank, temporal continuity term and filtering

Biomedical Signal Processing and Control, 2017

Non-negative matrix factorization (NMF) has recently become a well-known method for separating speech from music, treated as a single-channel source separation problem. In this approach, the spectrogram of each source signal is factorized as the product of two matrices, known as the basis and weight matrices. To obtain a good estimate of the signal spectrogram, the weight and basis matrices are updated iteratively based on a cost function. In standard NMF, each frame of the signal is considered an independent observation, which is a drawback. To overcome this weakness, a regularization term is added to the cost function to account for spectral temporal continuity. Furthermore, in standard NMF the same decomposition rank is usually used for different sources. In this paper, in addition to using a regularization term, we propose to apply a filter to the signals estimated by NMF. The filter is constructed from signals estimated using a regularized NMF method. Moreover, we propose to use different decomposition ranks for the speech and music sources. Experimental results on one hour of speech and music signals show that the proposed method increases signal-to-interference ratio (SIR) values for speech and music signals in comparison to conventional NMF methods.
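Two ingredients named in this abstract, a temporal continuity term and a filter built from the NMF estimates, can be illustrated with the sketch below. The exact regularizer and filter of the paper are not given here, so both functions are assumptions: `temporal_continuity_penalty` uses one common form (squared frame-to-frame differences of each activation row, normalized by that row's mean power), and `wiener_like_filter` is a generic Wiener-style soft mask.

```python
import numpy as np


def temporal_continuity_penalty(H, eps=1e-12):
    """Penalize rapid frame-to-frame changes in each row of the weight matrix H.

    Each row's summed squared differences are divided by its mean power so the
    penalty does not depend on the overall scale of that row.
    """
    diffs = np.diff(H, axis=1)
    row_power = np.mean(H**2, axis=1) + eps
    return float(np.sum(np.sum(diffs**2, axis=1) / row_power))


def wiener_like_filter(V, S_hat, M_hat, p=2.0, eps=1e-12):
    """Refine NMF estimates of speech (S_hat) and music (M_hat) with a soft mask.

    The mask is applied to the mixture spectrogram V, so the two outputs
    always add up to V. The exponent p controls how sharp the mask is.
    """
    S_p = S_hat**p
    M_p = M_hat**p
    mask = S_p / (S_p + M_p + eps)
    return mask * V, (1.0 - mask) * V
```

Using different decomposition ranks for the two sources, as the paper proposes, would simply mean that `S_hat` and `M_hat` come from dictionaries with different numbers of columns.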