Recognize and separate approach for speech denoising using nonnegative matrix factorization
Related papers
Speech denoising using nonnegative matrix factorization with priors
2008
Abstract: We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.
SPEECH NOISE SEPARATION USING NON-NEGATIVE MATRIX FACTORIZATION
In this work we focus on single-channel audio source decomposition. Because many modern technologies, such as visual assistance systems and many other software solutions, depend on human sound interaction, there is a need for efficient techniques that enhance human speech so that it is as clean as possible before the speech regression.
Regularized non-negative matrix factorization with temporal dependencies for speech denoising
2008
Abstract: We present a technique for denoising speech using temporally regularized nonnegative matrix factorization (NMF). In previous work [1], we used a regularized NMF update to impose structure within each audio frame. In this paper, we add frame-to-frame regularization across time and show that this additional regularization can also improve our speech denoising results. We evaluate our algorithm on a range of nonstationary noise types and outperform a state-of-the-art Wiener filter implementation.
Audio Enhancement and Denoising using Online Non-Negative Matrix Factorization and Deep Learning
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
For many years, reducing noise in a noisy speech recording has been a difficult task with numerous applications. This leaves scope for better techniques that enhance audio and speech and reduce the noise in the audio. One such technique is Online Non-Negative Matrix Factorization (ONMF).
Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint
Proceedings of the Sixth International Symposium on Information and Communication Technology - SoICT 2015, 2015
This paper addresses a challenging single-channel speech enhancement problem in real-world environments where the speech signal is corrupted by high-level background noise. While most state-of-the-art algorithms try to estimate the noise spectral power and filter it from the observed one to obtain enhanced speech, the paper presents another approach inspired by audio source separation techniques. In the considered method, generic spectral characteristics of speech and noise are first learned from various training signals by non-negative matrix factorization (NMF). They are then used to guide a similar factorization of the observed power spectrogram into a speech part and a noise part. Additionally, we propose to combine two existing group sparsity-inducing penalties in the optimization process and adapt the corresponding algorithm for parameter estimation based on the multiplicative update (MU) rule. Experimental results over different settings confirm the effectiveness of the proposed approach.
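The two-stage idea in this abstract (learn speech and noise bases from training data, then use them to guide the factorization of the observed spectrogram) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the standard Euclidean-distance multiplicative updates, leaves out the group sparsity penalties, and reconstructs with a soft ratio mask. The function names `nmf`, `activations`, and `separate` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the updates


def nmf(V, rank, n_iter=200, seed=0):
    """Plain NMF, V ~= W @ H, using Euclidean multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def activations(V, W, n_iter=200, seed=0):
    """Estimate H for a fixed, pre-trained basis matrix W."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def separate(V_mix, W_speech, W_noise, n_iter=200):
    """Split a nonnegative mixture spectrogram into speech and noise parts."""
    W = np.hstack([W_speech, W_noise])      # concatenated, fixed dictionary
    H = activations(V_mix, W, n_iter)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]                    # speech model of the mixture
    N = W_noise @ H[k:]                     # noise model of the mixture
    mask = S / (S + N + EPS)                # soft ratio mask (an assumption)
    speech = mask * V_mix
    return speech, V_mix - speech
```

In use, `nmf` would be run once on a speech-only spectrogram and once on a noise-only spectrogram to obtain `W_speech` and `W_noise`, after which `separate` is applied to the noisy spectrogram. The ranks, iteration counts, and masking step are illustrative choices only.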
Multiple-order non-negative matrix factorization for speech enhancement
Amongst the speech enhancement techniques, statistical models based on Non-negative Matrix Factorization (NMF) have received great attention. In a single-channel configuration, NMF is used to describe the spectral content of both the speech and noise sources. As the number of components can have a crucial influence on separation quality, we here propose to investigate model order selection based on the variational Bayesian approximation to the marginal likelihood of models of different orders. To go further, we propose to use model averaging to combine several single-order NMFs, and we show that a straightforward application of model averaging principles is inefficient, as it turns out to be equivalent to model selection. We thus introduce a parameter to control the entropy of the model order distribution, which makes the averaging effective. We also show that our probabilistic model extends naturally to a multiple-order NMF model where several NMFs are jointly estimated and averaged. Experiments are conducted on real data from the CHiME challenge and give an interesting insight into the entropic parameter and model order priors. Separation results are also promising, as model averaging outperforms single-order model selection. Finally, our multiple-order NMF shows an interesting gain in computation time.
Digital Signal Processing, 2020
In this paper, we propose a novel single-channel speech enhancement algorithm that applies dual-domain transforms, comprising the dual-tree complex wavelet transform (DTCWT) and the short-time Fourier transform (STFT), together with sparse non-negative matrix factorization (SNMF). The first domain is the DTCWT, which is applied to the time-domain signal to overcome the signal distortions caused by the downsampling of the discrete wavelet packet transform (DWPT), and which delivers a set of subband signals. The second domain is the STFT, which is applied to each subband signal to build a complex spectrogram. Finally, we apply the SNMF to the magnitude spectrogram to extract the speech components. In short, the DTCWT decomposes the time-domain noisy signal into a set of subband signals, the STFT is then applied to each subband signal, and nonnegative matrices are obtained by taking the absolute value of each complex matrix. We then apply SNMF to each nonnegative matrix and identify the speech components. The estimated signal is obtained through a subband binary ratio mask (SBRM), followed by the inverse STFT (ISTFT) and then the inverse DTCWT (IDTCWT). The proposed approach is evaluated on the GRID audiovisual and IEEE databases with diverse kinds of noise, including stationary, non-stationary, and quasi-stationary noise. The experimental results demonstrate that the proposed algorithm improves both objective speech quality and intelligibility at all considered signal-to-noise ratios (SNRs) compared with seven other speech enhancement methods: STFT-SNMF, STFT-SNMFSE, MLD-STFT-SNMF, STFT-GDL, STFT-CJSR, DTCWT-SNMF, and DWPT-STFT-SNMF.
Real-time speech separation by semi-supervised nonnegative matrix factorization
Latent Variable Analysis and Signal Separation, 2012
In this paper, we present an on-line semi-supervised algorithm for real-time separation of speech and background noise. The proposed system is based on Nonnegative Matrix Factorization (NMF), where fixed speech bases are learned from training data whereas the noise components are estimated in real-time on the recent past.
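The semi-supervised setup described here, with speech bases held fixed while the noise components are fitted to the observed signal, can be sketched in a few lines. This is a minimal illustration under assumptions the abstract does not state: Euclidean multiplicative updates, batch processing of a single block of frames `V`, and randomly initialized noise bases. The function name `semi_supervised_nmf` is hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the updates


def semi_supervised_nmf(V, W_speech, n_noise, n_iter=100, seed=0):
    """Factorize V with fixed speech bases and freely adapted noise bases.

    V        : nonnegative spectrogram block, shape (n_freq, n_frames)
    W_speech : pre-trained speech bases, shape (n_freq, k); never updated
    n_noise  : number of noise bases to estimate from V itself
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k = W_speech.shape[1]
    # hstack copies its inputs, so the caller's W_speech is left untouched.
    W = np.hstack([W_speech, rng.random((n_freq, n_noise)) + EPS])
    H = rng.random((k + n_noise, n_frames)) + EPS
    for _ in range(n_iter):
        # Activations of all components (speech and noise) are updated.
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        # Only the noise columns of W are updated; speech columns stay fixed.
        H_n = H[k:]
        W[:, k:] *= (V @ H_n.T) / (W @ H @ H_n.T + EPS)
    return W[:, k:], H
```

For a real-time variant, `V` would presumably hold a sliding window of the most recent frames and the routine would be re-run, or warm-started, as new frames arrive. The windowing policy is not specified in the abstract and is left out of this sketch.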
Adaptive Noise Estimation Based on Non-negative Matrix Factorization
IT Convergence and its Applications, 2013
In this paper, an adaptive noise estimation technique is proposed on the basis of non-negative matrix factorization (NMF). As an initial step of the proposed method, the noise basis matrix of NMF is estimated from a collection of noise signals. Then, the proposed method updates the initially estimated noise basis matrix on the fly by using an estimate of the noise spectrum from the noisy signal. It is demonstrated that the proposed method provides a better noise estimate than an NMF-based method without any adaptation, especially when there is a mismatch in noise conditions between noise basis training and estimation using NMF.