Multichannel high-resolution NMF for modeling convolutive mixtures of non-stationary signals in the time-frequency domain
Related papers
IEEE/ACM Trans. Audio Speech Lang. Process., 2014
Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies.
Gaussian modeling of mixtures of non-stationary signals in the Time-Frequency domain (HR-NMF)
2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011
Nonnegative Matrix Factorization (NMF) is a powerful tool for decomposing mixtures of non-stationary signals in the Time-Frequency (TF) domain. However, unlike the High Resolution (HR) methods dedicated to mixtures of exponentials, its spectral resolution is limited by that of the underlying TF representation. In this paper, we propose a unified probabilistic model called HR-NMF, which overcomes this limit by taking both phases and local correlations in each frequency band into account. This model is estimated with a recursive implementation of the EM algorithm, which is successfully applied to source separation and audio inpainting.
2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
We recently introduced the high-resolution nonnegative matrix factorization (HR-NMF) model for analyzing mixtures of nonstationary signals in the time-frequency domain, and highlighted its capability to both reach high spectral resolution and reconstruct high quality audio signals. In order to estimate the model parameters and the latent components, we proposed to resort to an expectation-maximization (EM) algorithm based on a Kalman filter/smoother. The approach proved to be appropriate for modeling audio signals in applications such as source separation and audio inpainting. However, its computational cost is high, dominated by the Kalman filter/smoother, and may be prohibitive when dealing with high-dimensional signals. In this paper, we consider two different alternatives, using the variational Bayesian EM algorithm and two mean-field approximations. We show that, while significantly reducing the complexity of the estimation, these novel approaches do not alter its quality.
NMF With Time–Frequency Activations to Model Nonstationary Audio Events
IEEE Transactions on Audio, Speech, and Language Processing, 2011
Real-world sounds often exhibit time-varying spectral shapes, as observed in the spectrogram of a harpsichord tone or that of a transition between two pronounced vowels. Whereas the standard Non-negative Matrix Factorization (NMF) assumes fixed spectral atoms, an extension is proposed where the temporal activations (coefficients of the decomposition on the spectral atom basis) become frequency dependent and follow a time-varying ARMA modeling. This extension can thus be interpreted with the help of a source/filter paradigm and is referred to as source/filter factorization. This factorization leads to an efficient single-atom decomposition for a single audio event with strong spectral variation (but with constant pitch). The new algorithm is tested on real audio data and shows promising results.
Multiplicative updates for modeling mixtures of non-stationary signals in the time-frequency domain
We recently introduced the high-resolution nonnegative matrix factorization (HR-NMF) model for representing mixtures of non-stationary signals in the time-frequency domain, and we highlighted its capability to both reach a high spectral resolution and reconstruct high-quality audio signals. An expectation-maximization (EM) algorithm was also proposed for estimating its parameters. In this paper, we replace the maximization step by multiplicative update rules (MUR), in order to improve the convergence rate. We also introduce general MUR that are not limited to nonnegative parameters, and we propose a new insight into the EM algorithm, which shows that MUR and EM actually belong to the same family. We thus introduce a continuum of algorithms between them. Experiments confirm that the proposed approach converges faster than the EM algorithm.
Bayesian extensions to non-negative matrix factorisation for audio signal modelling
2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008
We describe the underlying probabilistic generative signal model of non-negative matrix factorisation (NMF) and propose realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior knowledge about the spectra of the sounds can be used without resorting to overly restrictive techniques where some of the parameters are fixed. The resulting algorithm, while retaining the attractive features of standard NMF such as fast convergence and easy implementation, outperforms existing NMF strategies in a single-channel audio source separation and detection task.
Biomedical Signal Processing and Control, 2017
Non-negative matrix factorization (NMF) is a well-known method for separating speech from music as a single-channel source separation problem. In this approach, the spectrogram of each source signal is factorized as the product of two matrices, known as the basis and weight matrices. To obtain a good estimate of the signal spectrogram, the weight and basis matrices are updated iteratively based on a cost function. In standard NMF, each frame of the signal is treated as an independent observation, which is a drawback. To overcome this weakness, a regularization term is added to the cost function to account for spectro-temporal continuity. Furthermore, in standard NMF, the same decomposition rank is usually used for different sources. In this paper, in addition to using a regularization term, we propose to apply a filter to the signals estimated by NMF. The filter is constructed from signals that are estimated using a regularized NMF method. Moreover, we propose to use different decomposition ranks for speech and music signals as different sources. Experimental results on one hour of speech and music signals show that the proposed method increases signal-to-interference ratio (SIR) values for speech and music signals in comparison to conventional NMF methods.
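The standard factorization described in this abstract (a spectrogram approximated by the product of a basis matrix and a weight matrix, both updated iteratively to reduce a cost function) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not name its cost function, so the squared Euclidean cost with Lee-Seung multiplicative updates is assumed here, and the regularization term and post-filter are left out. The function name `nmf_euclidean`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def nmf_euclidean(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (frequency x time) by W @ H.

    Assumed setup: squared Euclidean cost, Lee-Seung multiplicative updates.
    W holds the basis vectors (spectral atoms), H the weights (activations).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random positive initialization keeps every later update nonnegative.
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    costs = []
    for _ in range(n_iter):
        # Multiplicative updates: each factor is rescaled elementwise.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        costs.append(0.5 * np.sum((V - W @ H) ** 2))
    return W, H, costs


# Toy nonnegative "spectrogram" of exact rank 2 (synthetic, not real audio).
toy_rng = np.random.default_rng(1)
V = toy_rng.random((6, 2)) @ toy_rng.random((2, 8))
W, H, costs = nmf_euclidean(V, rank=2)
```

In this sketch the decomposition rank is a single argument; using different ranks for speech and music, as the abstract proposes, would amount to running such a factorization with a separate `rank` per source.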
Multichannel Source Separation Using Time-Deconvolutive CNMF
Journal of Communication and Information Systems
This paper addresses the separation of audio sources from convolutive mixtures captured by a microphone array. We approach the problem using complex-valued non-negative matrix factorization (CNMF), and extend previous works by tailoring advanced (single-channel) NMF models, such as the deconvolutive NMF, to the multichannel factorization setup. Further, a sparsity-promoting scheme is proposed so that the underlying estimated parameters better fit the time-frequency properties inherent in some audio sources. The proposed parameter estimation framework is compatible with previous related works, and can be thought of as a step toward a more general method. We evaluate the resulting separation accuracy using a simulated acoustic scenario, and the tests confirm that the proposed algorithm provides superior separation quality when compared to a state-of-the-art benchmark. Finally, an analysis of the effects of the introduced regularization term shows that the solution is in fact steered toward a sparser representation.
Probabilistic Time-Frequency Source-Filter Decomposition of Non-Stationary Signals
Probabilistic modelling of non-stationary signals in the time-frequency (TF) domain has been an active research topic recently. Various models have been proposed, notably in the nonnegative matrix factorization (NMF) literature. In this paper, we propose a new TF probabilistic model that can represent a variety of stationary and non-stationary signals, such as autoregressive moving average (ARMA) processes, uncorrelated noise, damped sinusoids, and transient signals. This model also generalizes and improves both the Itakura-Saito (IS)-NMF and high resolution (HR)-NMF models.