Audio Enhancement and Denoising using Online Non-Negative Matrix Factorization and Deep Learning

On audio enhancement via online non-negative matrix factorization

arXiv, 2021

We propose a method for noise reduction, the task of producing a clean audio signal from a recording corrupted by additive noise. Many common approaches to this problem apply non-negative matrix factorization to spectrogram measurements. These methods use two recordings to learn dictionaries for the true signal and for the noise: a noiseless recording, believed to be similar in structure to the signal of interest, and a pure-noise recording. An approximation of the true signal can then be constructed by projecting the corrupted recording onto the clean dictionary. In this work, we build on these methods by proposing online non-negative matrix factorization for this problem. This method is more memory-efficient than traditional non-negative matrix factorization and could also be applied to real-time denoising.
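The supervised pipeline described above works in three steps: learn a clean dictionary and a noise dictionary, decompose the noisy spectrogram against both, and keep the clean part. It can be sketched in NumPy as follows. This is a minimal illustration, not the paper's online algorithm. It uses batch KL-divergence NMF with Lee-Seung multiplicative updates, and the function names, the ranks `k_s`/`k_n`, and the Wiener-style soft mask used for reconstruction are assumptions. Inputs are magnitude spectrograms of shape (frequency, frames).

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, k, n_iter=200, W=None, seed=0):
    """KL-divergence NMF via Lee-Seung multiplicative updates.

    If W is supplied it is kept fixed and only the activations H are learned.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], k)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if learn_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H


def denoise(V_noisy, V_clean_train, V_noise_train, k_s=20, k_n=10):
    """Supervised NMF denoising of a magnitude spectrogram (freq x frames)."""
    W_s, _ = nmf_kl(V_clean_train, k_s)   # clean-signal dictionary
    W_n, _ = nmf_kl(V_noise_train, k_n)   # noise dictionary
    W = np.hstack([W_s, W_n])             # fixed, concatenated dictionary
    _, H = nmf_kl(V_noisy, k_s + k_n, W=W)
    S_hat = W_s @ H[:k_s]                 # part explained by clean atoms
    N_hat = W_n @ H[k_s:]                 # part explained by noise atoms
    mask = S_hat / (S_hat + N_hat + EPS)  # Wiener-style soft mask in [0, 1]
    return mask * V_noisy, mask
```

Returning `W_s @ H[:k_s]` directly corresponds to the plain projection onto the clean dictionary. The mask variant is a common alternative: it keeps the noisy magnitudes and only rescales them.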

Speech denoising using nonnegative matrix factorization with priors

2008

We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.

Recognize and separate approach for speech denoising using nonnegative matrix factorization

2015 23rd European Signal Processing Conference (EUSIPCO), 2015

This paper proposes a novel approach for denoising single-channel noisy speech signals. A speech dictionary and multiple noise dictionaries are trained using nonnegative matrix factorization (NMF). Given the mixed signal, the type of noise it contains is first identified. The magnitude spectrogram of the noisy signal is then decomposed using NMF with the concatenated trained dictionaries of speech and noise. Our results indicate that recognizing the noise type from the mixed signal and using the corresponding noise-specific dictionary gives better results than using a general noise dictionary in the NMF approach. We also compare our algorithm with other state-of-the-art denoising methods and show that it outperforms them in most cases.
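One way to realize the recognize-then-separate idea is to decompose the noisy spectrogram once per candidate noise dictionary and keep the dictionary that explains the mixture best. The abstract does not say how the noise type is recognized. The selection criterion used here (lowest KL reconstruction error) is therefore an assumption, as are all function names and the noise-type labels. The speech and noise dictionaries are assumed to be pre-trained and held fixed.

```python
import numpy as np

EPS = 1e-12


def kl_divergence(V, V_hat):
    """Generalized KL divergence D(V || V_hat), used as reconstruction error."""
    V_hat = V_hat + EPS
    return float(np.sum(V * np.log((V + EPS) / V_hat) - V + V_hat))


def fit_activations(V, W, n_iter=200, seed=0):
    """KL multiplicative updates for H with the dictionary W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    denom = W.T @ np.ones_like(V) + EPS
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / denom
    return H


def recognize_and_separate(V_noisy, W_speech, noise_dicts):
    """Pick the noise dictionary that best explains V_noisy, then separate.

    noise_dicts maps a (hypothetical) noise-type label to a pre-trained
    noise dictionary with the same number of rows as W_speech.
    """
    errors, fits = {}, {}
    for label, W_noise in noise_dicts.items():
        W = np.hstack([W_speech, W_noise])
        H = fit_activations(V_noisy, W)
        errors[label] = kl_divergence(V_noisy, W @ H)
        fits[label] = H
    best = min(errors, key=errors.get)
    k_s = W_speech.shape[1]
    speech_estimate = W_speech @ fits[best][:k_s]  # speech part only
    return best, speech_estimate, errors
```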

Multiple-order non-negative matrix factorization for speech enhancement

Amongst speech enhancement techniques, statistical models based on non-negative matrix factorization (NMF) have received great attention. In a single-channel configuration, NMF is used to describe the spectral content of both the speech and noise sources. As the number of components can have a crucial influence on separation quality, we investigate model order selection based on the variational Bayesian approximation to the marginal likelihood of models of different orders. Going further, we propose to use model averaging to combine several single-order NMFs. We show that a straightforward application of model averaging principles is ineffective, as it turns out to be equivalent to model selection. We therefore introduce a parameter that controls the entropy of the model order distribution, which makes the averaging effective. We also show that our probabilistic model extends naturally to a multiple-order NMF model in which several NMFs are jointly estimated and averaged. Experiments on real data from the CHiME challenge give insight into the entropic parameter and the model order priors. Separation results are also promising, as model averaging outperforms single-order model selection. Finally, our multiple-order NMF yields a gain in computation time.
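A tempered softmax over per-order log-evidences illustrates how an entropy-controlling parameter changes model averaging. This parametrization (a `temperature`) is hypothetical and not necessarily the one used in the paper. The log-evidence values are assumed to come from a variational lower bound for each model order, with a uniform prior over orders, and all function names are illustrative.

```python
import numpy as np


def order_weights(log_evidence, temperature=1.0):
    """Tempered posterior over NMF model orders, assuming a uniform prior.

    temperature=1 gives standard Bayesian model averaging; larger values
    raise the entropy of the weights, smaller values push toward selection.
    """
    z = np.asarray(log_evidence, dtype=float) / temperature
    z -= z.max()              # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def weight_entropy(w):
    """Shannon entropy (nats) of a discrete weight vector."""
    w = w[w > 0]
    return float(-(w * np.log(w)).sum())


def average_estimates(estimates, weights):
    """Weighted average of per-order source estimates (same-shape arrays)."""
    return sum(wk * S for wk, S in zip(weights, estimates))
```

Take hypothetical log-evidences `[-1000.0, -980.0, -985.0]` for three orders. With `temperature=1`, about 0.993 of the weight falls on the second order. With `temperature=10`, the weights are roughly `[0.078, 0.574, 0.348]`. When the log-evidence gaps between orders are large, ordinary averaging therefore collapses onto one order, which is consistent with the abstract's remark that naive averaging behaves like model selection.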

Speech Noise Separation Using Non-Negative Matrix Factorization

In this work, we focus on single-channel audio source decomposition. Many modern technologies, such as visual assistance systems and other software solutions, depend on human-sound interaction. Efficient speech enhancement techniques are therefore needed to recover human speech as cleanly as possible before the speech regression stage.

Regularized non-negative matrix factorization with temporal dependencies for speech denoising

2008

We present a technique for denoising speech using temporally regularized nonnegative matrix factorization (NMF). In previous work [1], we used a regularized NMF update to impose structure within each audio frame. In this paper, we add frame-to-frame regularization across time and show that this additional regularization can also improve our speech denoising results. We evaluate our algorithm on a range of nonstationary noise types and outperform a state-of-the-art Wiener filter implementation.
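Frame-to-frame regularization can be sketched as a penalty on differences between consecutive activation columns. The sketch below is a stand-in for illustration, not the paper's regularizer. It assumes a Euclidean reconstruction cost, a quadratic smoothness penalty weighted by a hypothetical `lam`, and a fixed pre-trained dictionary `W`. The multiplicative update is a heuristic obtained by splitting the gradient into its positive and negative parts.

```python
import numpy as np

EPS = 1e-12


def smooth_activations(V, W, lam=0.1, n_iter=300, seed=0):
    """Activations H >= 0 approximately minimizing
        ||V - W H||_F^2 + lam * sum_{k,t} (H[k, t] - H[k, t-1])^2
    with the dictionary W held fixed (heuristic multiplicative updates).
    """
    rng = np.random.default_rng(seed)
    T = V.shape[1]
    H = rng.random((W.shape[1], T)) + EPS
    WtV = W.T @ V
    WtW = W.T @ W
    # number of temporal neighbours of each frame (1 at the edges, 2 inside)
    n_neighbors = np.zeros(T)
    n_neighbors[1:] += 1
    n_neighbors[:-1] += 1
    for _ in range(n_iter):
        neighbor_sum = np.zeros_like(H)
        neighbor_sum[:, 1:] += H[:, :-1]   # previous frame
        neighbor_sum[:, :-1] += H[:, 1:]   # next frame
        # negative gradient part in the numerator, positive part below
        numer = WtV + lam * neighbor_sum
        denom = WtW @ H + lam * n_neighbors * H + EPS
        H *= numer / denom
    return H
```

With `lam=0` this reduces to the standard Euclidean multiplicative update for `H`. Increasing `lam` trades reconstruction accuracy for smoother activation trajectories.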

Supervised single channel dual domains speech enhancement using sparse non-negative matrix factorization

Digital Signal Processing, 2020

In this paper, we propose a novel single-channel speech enhancement algorithm that combines dual-domain transforms with sparse non-negative matrix factorization (SNMF). The two transforms are the dual-tree complex wavelet transform (DTCWT) and the short-time Fourier transform (STFT). The first domain is the DTCWT, which is applied to the time-domain signal and yields a set of subband signals. Using it overcomes the signal distortions caused by downsampling in the discrete wavelet packet transform (DWPT). The second domain is the STFT, which is applied to each subband signal to build a complex spectrogram. SNMF is then applied to the magnitude spectrogram to extract the speech components.

In short, the DTCWT decomposes the noisy time-domain signal into a set of subband signals, and the STFT is then applied to each of them. Taking the absolute value of each complex matrix gives a set of nonnegative matrices, and SNMF is applied to each one to identify the speech components. Finally, the enhanced signal is obtained by applying a subband binary ratio mask (SBRM), followed by the inverse STFT (ISTFT) and then the inverse DTCWT (IDTCWT).

The proposed approach is evaluated on the GRID audiovisual and IEEE databases with stationary, non-stationary, and quasi-stationary noise. The experimental results show that the proposed algorithm improves objective speech quality and intelligibility at all considered signal-to-noise ratios (SNRs). It is compared with seven other speech enhancement methods: STFT-SNMF, STFT-SNMFSE, MLD-STFT-SNMF, STFT-GDL, STFT-CJSR, DTCWT-SNMF, and DWPT-STFT-SNMF.
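The final masking step can be illustrated for a set of subbands. The abstract does not define the subband binary ratio mask. This sketch therefore assumes a binary mask that keeps a time-frequency bin when the estimated speech magnitude exceeds the estimated noise magnitude by a threshold (0 dB by default). The speech and noise magnitudes would come from the SNMF decomposition of each subband, and the ISTFT and inverse DTCWT synthesis steps are omitted. Function names are hypothetical.

```python
import numpy as np


def binary_mask(S_mag, N_mag, threshold_db=0.0):
    """1 where the estimated local SNR exceeds threshold_db, else 0."""
    eps = 1e-12
    local_snr_db = 20.0 * np.log10((S_mag + eps) / (N_mag + eps))
    return (local_snr_db > threshold_db).astype(float)


def mask_subbands(noisy_stfts, speech_mags, noise_mags, threshold_db=0.0):
    """Apply a binary mask to the complex STFT of each subband signal.

    All three arguments are lists with one (freq x frames) array per subband;
    the masked complex spectrograms would then go through the ISTFT and the
    inverse subband transform (not shown).
    """
    return [
        X * binary_mask(S, N, threshold_db)
        for X, S, N in zip(noisy_stfts, speech_mags, noise_mags)
    ]
```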

Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint

Proceedings of the Sixth International Symposium on Information and Communication Technology - SoICT 2015, 2015

This paper addresses a challenging single-channel speech enhancement problem in real-world environments, where the speech signal is corrupted by high-level background noise. Most state-of-the-art algorithms estimate the noise spectral power and filter it from the observed spectrum to obtain enhanced speech. This paper instead presents an approach inspired by audio source separation techniques. In the considered method, generic spectral characteristics of speech and noise are first learned from various training signals by non-negative matrix factorization (NMF). They are then used to guide a similar factorization of the observed power spectrogram into a speech part and a noise part. Additionally, we propose to combine two existing group sparsity-inducing penalties in the optimization process, and we adapt the corresponding parameter estimation algorithm based on the multiplicative update (MU) rule. Experimental results over different settings confirm the effectiveness of the proposed approach.
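Multiplicative updates with group sparsity penalties can be sketched by adding the penalty gradient to the denominator of an Itakura-Saito (IS) NMF update for the activations, with the pre-trained dictionary held fixed. The abstract does not name the two penalties it combines. Here we assume a log-based block penalty (one group per training source) and a log-based component penalty (one per activation row), purely for illustration. The weights `lam` and `gamma` and the offset `EPS_PEN` are hypothetical. The update is the plain, heuristic IS multiplicative rule.

```python
import numpy as np

EPS = 1e-12       # numerical floor for divisions
EPS_PEN = 1e-3    # offset inside the log penalties (hypothetical value)


def group_sparse_is_nmf(V, W, groups, lam=1.0, gamma=1.0, n_iter=100, seed=0):
    """Estimate activations H for a fixed, pre-trained dictionary W.

    Cost: IS divergence D_IS(V | W H)
          + lam   * sum_g log(EPS_PEN + ||H_g||_1)   (block sparsity)
          + gamma * sum_k log(EPS_PEN + ||h_k||_1)   (component sparsity)
    where H_g are the rows of H in group g (a list of row indices) and
    h_k is row k.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        V_hat = W @ H + EPS
        numer = W.T @ (V * V_hat ** -2)    # negative part of the IS gradient
        denom = W.T @ (1.0 / V_hat)        # positive part of the IS gradient
        # derivative of the block penalty: same value for every row of a group
        block = np.zeros(H.shape[0])
        for g in groups:
            block[g] = lam / (EPS_PEN + H[g].sum())
        # derivative of the component penalty: one value per row
        row = gamma / (EPS_PEN + H.sum(axis=1))
        H *= numer / (denom + (block + row)[:, None] + EPS)
    return H
```

Groups with little total activation receive a large penalty gradient and are pushed further toward zero. This behaviour is what lets the observed spectrogram be explained by only a few of the pre-trained sources.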

Learning speech features in the presence of noise: Sparse convolutive robust non-negative matrix factorization

2009 16th International Conference on Digital Signal Processing, 2009

We introduce a non-negative matrix factorization technique that learns speech features with temporal extent in the presence of non-stationary noise. Our proposed technique, sparse convolutive robust non-negative matrix factorization, is robust to noise because it explicitly treats noise as an interfering source in the factorization. We derive multiplicative update rules using the alpha-divergence objective. We show that our method outperforms sparse convolutive non-negative matrix factorization in a feature learning task on noisy data, and that it gives results comparable to dedicated speech enhancement techniques.
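The core of an alpha-divergence derivation is visible in the well-known multiplicative updates for plain (non-convolutive) NMF. This sketch omits the convolutive structure, the sparsity term, and the explicit noise source of the proposed method. `alpha` is a free parameter; as `alpha` approaches 1, the update takes the form of the familiar KL-divergence rule. Function names are illustrative.

```python
import numpy as np

EPS = 1e-12


def alpha_divergence(V, V_hat, alpha):
    """Amari alpha divergence D_alpha(V || V_hat) for alpha not in {0, 1}."""
    V_hat = V_hat + EPS
    total = np.sum(V ** alpha * V_hat ** (1 - alpha)
                   - alpha * V + (alpha - 1) * V_hat)
    return float(total / (alpha * (alpha - 1)))


def alpha_nmf(V, k, alpha=0.5, n_iter=300, seed=0):
    """Plain (non-convolutive, non-sparse) NMF under the alpha divergence."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + EPS
    H = rng.random((k, V.shape[1])) + EPS
    for _ in range(n_iter):
        # H <- H * [ (W^T (V / WH)^alpha) / (W^T 1) ]^(1/alpha)
        R = (V / (W @ H + EPS)) ** alpha
        H *= ((W.T @ R) / (W.sum(axis=0)[:, None] + EPS)) ** (1.0 / alpha)
        # W <- W * [ ((V / WH)^alpha H^T) / (1 H^T) ]^(1/alpha)
        R = (V / (W @ H + EPS)) ** alpha
        W *= ((R @ H.T) / (H.sum(axis=1)[None, :] + EPS)) ** (1.0 / alpha)
    return W, H
```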

Local Sparsity Based Online Dictionary Learning for Environment-Adaptive Speech Enhancement with Nonnegative Matrix Factorization

Interspeech 2016, 2016

In this paper, we propose a nonnegative matrix factorization (NMF)-based speech enhancement method that is robust to diverse real-world noise. It uses online NMF dictionary learning and does not rely on prior knowledge of the noise. Conventional NMF-based methods use a fixed noise dictionary, which often degrades performance when that dictionary cannot cover the noise types that occur in real-life recordings. The noise dictionary therefore needs to be learned from the noise itself and updated as the recording environment changes. To this end, the proposed method first estimates the noise spectra and then performs online noise dictionary learning within a discriminative NMF learning framework. In particular, the noise spectra are estimated by minimum mean squared error filtering based on local sparsity. The local sparsity is defined by the a posteriori signal-to-noise ratio (SNR) estimated from the NMF separation of the previous analysis frame. The method is evaluated by adding six different realistic noises to clean speech signals at various SNRs. The results show that it outperforms the comparative methods in terms of signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ) for all simulated noise types and SNR conditions.
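Online noise dictionary learning can be sketched with the block-coordinate dictionary update from Mairal et al.'s online dictionary learning, adapted to non-negative data by clipping. This technique is swapped in for illustration and is not the paper's discriminative NMF learning framework. The class name, the forgetting factor `forget`, and the per-frame encoder (multiplicative non-negative least squares updates) are assumptions. Each incoming frame `v` is assumed to be an already-estimated noise magnitude spectrum; in the paper, these estimates come from MMSE filtering.

```python
import numpy as np

EPS = 1e-12


class OnlineNoiseDictionary:
    """Online NMF dictionary for noise spectra (Mairal-style update + clipping)."""

    def __init__(self, n_freq, k, forget=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.random((n_freq, k)) + EPS
        self.W /= np.linalg.norm(self.W, axis=0, keepdims=True)  # unit columns
        self.A = np.zeros((k, k))       # discounted running sum of h h^T
        self.B = np.zeros((n_freq, k))  # discounted running sum of v h^T
        self.forget = forget            # < 1 discounts older frames

    def encode(self, v, n_iter=50):
        """Non-negative activations for one frame via multiplicative updates."""
        WtW = self.W.T @ self.W
        Wtv = self.W.T @ v
        h = np.ones(self.W.shape[1])
        for _ in range(n_iter):
            h *= Wtv / (WtW @ h + EPS)
        return h

    def update(self, v):
        """Encode one noise frame v and refine the dictionary with it."""
        h = self.encode(v)
        self.A = self.forget * self.A + np.outer(h, h)
        self.B = self.forget * self.B + np.outer(v, h)
        for j in range(self.W.shape[1]):
            if self.A[j, j] < EPS:
                continue  # atom unused so far
            u = self.W[:, j] + (self.B[:, j] - self.W @ self.A[:, j]) / self.A[j, j]
            u = np.maximum(u, 0.0)                          # keep atoms non-negative
            self.W[:, j] = u / max(np.linalg.norm(u), 1.0)  # project onto unit ball
        return h
```

Memory is constant in the number of frames, because only `A` and `B` are kept. The forgetting factor lets the dictionary follow a changing recording environment.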