Mark D Plumbley | University of Surrey

Papers by Mark D Plumbley

Research paper thumbnail of Best Practices for Scientific Computing

PLoS Biology, 2014

Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.

Research paper thumbnail of Visualising Chord Progressions in Music Collections: A Big Data Approach

Figure: Interface of the chord progressions visualising tool.

Research paper thumbnail of On the disjointness of sources in music using different time-frequency representations

2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011

This paper studies the disjointness of the time-frequency representations of simultaneously playing musical instruments. As a measure of disjointness, we use the approximate W-disjoint orthogonality as proposed by Yilmaz and Rickard [1], which (loosely speaking) measures the degree of overlap of different sources in the time-frequency domain. The motivation for this study is to find a maximally disjoint representation in order to facilitate the separation and recognition of musical instruments in mixture signals. The transforms investigated in this paper include the short-time Fourier transform (STFT), constant-Q transform, modified discrete cosine transform (MDCT), and pitch-synchronous lapped orthogonal transforms. Simulation results are reported for a database of polyphonic music where the multitrack data (instrument signals before mixing) were available. Absolute performance varies depending on the instrument source in question, but on average the MDCT with a 93 ms frame size performed best.
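
As a rough illustration of the disjointness measure used above, the sketch below scores one source's approximate W-disjoint orthogonality against the summed interference, using an STFT and a 0 dB binary mask. The transform settings, mask rule, and function names are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from scipy.signal import stft

def approx_wdo(target, interference, fs, frame_len=4096):
    """Approximate W-disjoint orthogonality (after Yilmaz & Rickard) of a
    target source against the sum of the other sources, measured on an STFT
    with a 0 dB binary mask.  Frame length and mask choice are illustrative
    assumptions, not the paper's settings."""
    _, _, S = stft(target, fs=fs, nperseg=frame_len)        # target TF coefficients
    _, _, N = stft(interference, fs=fs, nperseg=frame_len)  # summed interference
    mask = (np.abs(S) > np.abs(N)).astype(float)            # keep bins the target dominates
    eps = 1e-12
    psr = np.sum(np.abs(mask * S) ** 2) / (np.sum(np.abs(S) ** 2) + eps)         # preserved-signal ratio
    sir = np.sum(np.abs(mask * S) ** 2) / (np.sum(np.abs(mask * N) ** 2) + eps)  # signal-to-interference ratio
    return psr - psr / sir                                   # 1 means perfectly disjoint
```

Repeating this with different analysis transforms (STFT window sizes, MDCT, constant-Q) is the kind of comparison the paper reports, with higher scores indicating more disjoint representations.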

Research paper thumbnail of Recognition of harmonic sounds in polyphonic audio using a missing feature approach

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

A method based on local spectral features and missing feature techniques is proposed for the recognition of harmonic sounds in mixture signals. A mask estimation algorithm is proposed for identifying spectral regions that contain reliable information for each sound source, and bounded marginalization is then employed to treat the feature vector elements that are determined to be unreliable. The proposed method is tested on musical instrument sounds due to the extensive availability of data, but it can be applied to other harmonic sounds (e.g. animal sounds, environmental sounds). In simulations the proposed method clearly outperformed a baseline method for mixture signals.
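
To make the missing-feature step concrete, here is a minimal sketch of bounded marginalization for one frame under a diagonal-covariance GMM: reliable feature dimensions use the usual Gaussian density, while unreliable dimensions are integrated from zero up to the observed value, treating the observation as an upper bound on the clean energy. The model shape and variable names are generic assumptions, not the paper's trained classifiers.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def bounded_marginal_loglik(x, reliable, weights, means, stds):
    """Frame log-likelihood under a diagonal GMM with missing-feature handling:
    reliable dimensions contribute the Gaussian pdf, unreliable dimensions are
    bounded-marginalised (pdf integrated over [0, x]).  A generic sketch of the
    technique, not the paper's exact models."""
    comp = np.zeros(len(weights))
    for k, (w, mu, sd) in enumerate(zip(weights, means, stds)):
        log_pdf = norm.logpdf(x, mu, sd)                        # reliable dims
        bounded = norm.cdf(x, mu, sd) - norm.cdf(0.0, mu, sd)   # unreliable dims
        log_bounded = np.log(np.maximum(bounded, 1e-300))
        comp[k] = np.log(w) + np.sum(np.where(reliable, log_pdf, log_bounded))
    return logsumexp(comp)
```

The boolean vector `reliable` would come from the mask estimation stage; here it is simply taken as given.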

Research paper thumbnail of Causal Prediction of Continuous-Valued Music Features

Research paper thumbnail of Instrumentation-based music similarity using sparse representations

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012

This paper describes a novel music similarity calculation method that is based on the instrumentation of music pieces. The approach taken here is based on the idea that sparse representations of musical audio signals are a rich source of information regarding the elements that constitute the observed spectra. We propose a method to extract feature vectors based on sparse representations and use these to calculate a similarity measure between songs. To train a dictionary for sparse representations from a large amount of training data, a novel dictionary-initialization method based on agglomerative clustering is proposed. An objective evaluation shows that the new features improve the performance of similarity calculation compared to standard mel-frequency cepstral coefficient (MFCC) features.
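
A minimal sketch of how such instrumentation features and a similarity score might be computed is given below. It assumes non-negative activations obtained by per-frame non-negative least squares, an average-linkage clustering for the dictionary initialisation, and cosine similarity between song-level vectors; these are plausible placeholders, not the paper's exact pipeline.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.cluster.hierarchy import linkage, fcluster

def init_dictionary(training_spectra, n_atoms):
    """Initialise dictionary atoms by agglomerative clustering of training
    spectra (rows) and averaging each cluster -- one plausible reading of the
    clustering-based initialisation; linkage method and metric are assumptions."""
    Z = linkage(training_spectra, method="average", metric="cosine")
    labels = fcluster(Z, t=n_atoms, criterion="maxclust")
    atoms = np.stack([training_spectra[labels == c].mean(axis=0) for c in np.unique(labels)])
    atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
    return atoms.T                                   # shape (n_bins, n_atoms)

def song_feature(spectrogram, D):
    """Average non-negative activation of each atom over a song's frames."""
    acts = np.stack([nnls(D, frame)[0] for frame in spectrogram.T])
    return acts.mean(axis=0)

def cosine_similarity(f1, f2):
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12))
```

Songs whose average activations emphasise the same atoms then score as similar in instrumentation.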

Research paper thumbnail of Improving instrument recognition in polyphonic music through system integration

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014

A method is proposed for instrument recognition in polyphonic music which combines two independent detector systems: a polyphonic musical instrument recognition system using a missing feature approach, and an automatic music transcription system based on shift-invariant probabilistic latent component analysis that includes instrument assignment. We propose a method to integrate the two systems by fusing the instrument contributions estimated by the first system into the transcription system in the form of Dirichlet priors. Both systems, as well as the integrated system, are evaluated using a dataset of continuous polyphonic music recordings. Detailed results that highlight a clear improvement in the performance of the integrated system are reported for different training conditions.
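
The fusion idea can be illustrated with the standard MAP update for a multinomial distribution under a Dirichlet prior: the recogniser's instrument probabilities enter as pseudo-counts that bias the transcription system's instrument-contribution estimate. The sketch below shows a single such update; the prior strength and variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def map_instrument_distribution(expected_counts, recogniser_probs, strength=10.0):
    """MAP re-estimate of an instrument-contribution distribution inside an
    EM/PLCA-style update, with the first system's instrument probabilities
    supplied as a Dirichlet prior.  `strength` scales the prior into
    pseudo-counts and is an illustrative knob."""
    alpha = 1.0 + strength * np.asarray(recogniser_probs, dtype=float)  # Dirichlet hyperparameters
    unnorm = np.asarray(expected_counts, dtype=float) + alpha - 1.0     # MAP formula for a multinomial
    unnorm = np.maximum(unnorm, 0.0)
    return unnorm / unnorm.sum()
```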

Research paper thumbnail of Detection and classification of acoustic scenes and events: An IEEE AASP challenge

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

This paper describes a newly-launched public evaluation challenge on acoustic scene classification and detection of sound events within a scene. Systems dealing with such tasks are far from exhibiting human-like performance and robustness. Undermining factors are numerous: the extreme variability of sources of interest possibly interfering, the presence of complex background noise as well as room effects like reverberation. The proposed challenge is an attempt to help the research community move forward in defining and studying the aforementioned tasks. Apart from the challenge description, this paper provides an overview of systems submitted to the challenge as well as a detailed evaluation of the results achieved by those systems.

Research paper thumbnail of Big Data for Musicology

Proceedings of the 1st International Workshop on Digital Libraries for Musicology - DLfM '14, 2014

Digital music libraries and collections are growing quickly and are increasingly made available for research. We argue that the use of large data collections will enable a better understanding of music performance and music in general, which will benefit areas such as music search and recommendation, music archiving and indexing, music production and education. However, to achieve these goals it is necessary to develop new musicological research methods, to create and adapt the necessary technological infrastructure, and to find ways of working with legal limitations. Most of the necessary basic technologies exist, but they need to be brought together and applied to musicology. We aim to address these challenges in the Digital Music Lab project, and we feel that with suitable methods and technology Big Music Data can provide new opportunities to musicology.

Research paper thumbnail of An ℓ1 criterion for dictionary learning by subspace identification

2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010

We propose an ℓ1 criterion for dictionary learning for sparse signal representation. Instead of directly searching for the dictionary vectors, our dictionary learning approach identifies vectors that are orthogonal to the subspaces in which the training data concentrate. We study conditions on the coefficients of training data that guarantee that ideal normal vectors deduced from the dictionary are local optima of the criterion. We illustrate the behavior of the criterion on a 2D example, showing that the local minima correspond to ideal normal vectors when the number of training data is sufficient. We conclude by describing an algorithm that can be used to optimize the criterion in higher dimensions.
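
A minimal sketch of the criterion itself: for a unit-norm vector w, minimising the sum of |<w, x_i>| over the training columns x_i drives w towards a normal of a subspace on which the data concentrate. The projected-subgradient loop below is only meant to make the objective concrete; it is not the optimisation algorithm described in the paper.

```python
import numpy as np

def subspace_normal(X, n_iter=500, step=0.1, seed=0):
    """Locally minimise the l1 criterion  sum_i |<w, x_i>|  over unit-norm w,
    where the columns of X are training signals.  Plain projected subgradient
    descent with a decaying step -- an illustrative sketch only."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for t in range(n_iter):
        g = X @ np.sign(X.T @ w)          # subgradient of the l1 objective
        w = w - (step / (1 + t)) * g
        w /= np.linalg.norm(w)            # project back onto the unit sphere
    return w
```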

Research paper thumbnail of Sparse representations of polyphonic music

Signal Processing, vol. 86, pp. 417–431, Mar 1, 2006

We consider two approaches for sparse decomposition of polyphonic music: a time-domain approach based on shift-invariant waveforms, and a frequency-domain approach based on phase-invariant power spectra. When trained on an example of a MIDI-controlled acoustic piano recording, both methods produce dictionary vectors or sets of vectors which represent underlying notes, and produce component activations related to the original MIDI score. The time-domain method is more computationally expensive, but produces sample-accurate spike-like activations and can be used for a direct time-domain reconstruction. The spectral-domain method discards phase information, but is faster than the time-domain method and retains more higher-frequency harmonics. These results suggest that these two methods would provide a powerful yet complementary approach to automatic music transcription or object-based coding of musical audio.
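
For the spectral-domain, phase-invariant side of the comparison, a generic stand-in is a non-negative decomposition of the power spectrogram into note-like atoms and activations. The multiplicative-update NMF below illustrates that idea only; it is not the specific sparse-coding model used in the paper.

```python
import numpy as np

def nmf_power_spectrogram(V, n_components=12, n_iter=200, seed=0):
    """Decompose a power spectrogram V (freq x time) into spectral atoms W and
    activations H using Lee-Seung multiplicative updates (Euclidean cost).
    A generic phase-invariant decomposition, shown for illustration."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + 1e-3
    H = rng.random((n_components, T)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

On a piano recording, the columns of W would tend towards harmonic note spectra and the rows of H towards note activations, which is the kind of output the abstract relates to the original MIDI score.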

Research paper thumbnail of Speech denoising based on a greedy adaptive dictionary algorithm

2009 17th European Signal Processing Conference, Aug 1, 2009

In this paper we consider the problem of speech denoising based on a greedy adaptive dictionary (GAD) algorithm. The transform is orthogonal by construction, and is found to give a sparse representation of the data being analysed, and to be robust to additive Gaussian noise. The performance of the algorithm is compared to that of the principal component analysis (PCA) method, for a speech denoising application. It is found that the GAD algorithm offers a sparser solution than PCA, while having a similar performance in the presence of noise.
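
A sketch of a greedy, orthogonal-by-construction dictionary in the spirit of the GAD algorithm follows. The selection rule shown (pick the residual frame with the smallest l1/l2 norm ratio, i.e. the sparsest residual) is my reading of the greedy step and should be treated as an assumption, as should all names and parameters.

```python
import numpy as np

def greedy_orthogonal_dictionary(frames, n_atoms):
    """Build an orthonormal dictionary greedily from signal frames: repeatedly
    take the sparsest residual frame (smallest l1/l2 ratio -- an assumed
    selection rule) as the next atom and deflate it from every residual
    column, so orthogonality holds by construction."""
    R = np.array(frames, dtype=float)            # residual matrix, columns are frames
    atoms = []
    for _ in range(n_atoms):
        norms = np.linalg.norm(R, axis=0)
        ratio = np.where(norms > 1e-12, np.abs(R).sum(axis=0) / np.maximum(norms, 1e-12), np.inf)
        k = int(np.argmin(ratio))                # sparsest remaining residual column
        d = R[:, k] / norms[k]                   # unit-norm atom
        atoms.append(d)
        R = R - np.outer(d, d @ R)               # remove the atom's component everywhere
    return np.stack(atoms, axis=1)               # (frame_len, n_atoms)
```

One standard way to use such an orthogonal transform for denoising is to keep only the largest-magnitude coefficients before reconstruction, which is consistent with the sparsity and orthogonality properties described above.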

Research paper thumbnail of Tempo estimation and beat tracking with adaptive input selection

We present details of our submissions to the Audio Tempo Extraction and Audio Beat Tracking contests within MIREX 2006. The approach we adopt makes use of our existing beat tracking technique with a modified tempo extraction stage, and with the provision of three different onset detection functions to act as input. For each onset detection function we extract potential beat locations and then employ a confidence measure to find the most appropriate input representation for a given audio signal. The beats which yield the highest confidence are extracted as the output of the system.

Research paper thumbnail of Multichannel high-resolution NMF for modeling convolutive mixtures of non-stationary signals in the time-frequency domain

IEEE/ACM Transactions on Audio, Speech, and Language Processing, Nov 1, 2014

Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies.

Research paper thumbnail of An Empirical Study of Two Approaches to Automated pRAM Network Design

The application of Genetic Algorithms (GAs) to the automated design of artificial neural networks has received much attention in recent years. A number of common network models have been studied. This paper presents empirical results on the application of GAs to the Probabilistic RAM (pRAM) network model. Pattern recognition tasks are presented to the pRAM networks. These results are compared with those using Simulated Annealing (SA) as well as the reinforcement learning algorithm originally proposed for pRAM networks. We found that neither SA nor the GA performs as well as the reinforcement learning algorithm. Nevertheless, the GA is able to achieve an average recognition rate of 83.78% at the expense of a long running time. 1. Introduction: There is increasing interest in the application of machine learning algorithms to assist with the design and training of neural networks. One common technique is the use of Genetic Algorithms (GAs) [7], in which a population of individuals in a parallel searc...

Research paper thumbnail of Sparse Coding of Music Signals

Sparse Coding of Music Signals. Samer A. Abdallah and Mark D. Plumbley, Department of Electronic Engineering, King's College London, March 13, 2001. Abstract: We discuss the use of unsupervised learning techniques ...

Research paper thumbnail of Unsupervised Learning for Music Perception

Perception and cognition can be considered to be processes aimed at discovering the independent causes behind sensory input. Thus, the goal of perception might be to achieve a factorial coding. Unsupervised learning with neural networks offers a way to implement this using techniques such as sparse coding. This would result in a representation in terms of an optimal set of features, rather than the heuristically guided selection often used now. The Wigner Distribution is a good choice of input to these algorithms for a number of reasons. The principle of factorial coding, applied consistently, could result in natural representations for musical constructs such as melodic phrases or rhythmic motives; something which has obvious applications in music processing. 1. Introduction: The problem of music cognition is currently being attacked from a number of directions. One approach grows out of work being done on auditory scene analysis [6, 23], starting with an audio signal and modelling, a...

Research paper thumbnail of Audio inpainting: problem statement, relation with sparse representations and some experiments

Research paper thumbnail of Learning Incoherent Dictionaries for Sparse Approximation using Iterative Projections and Rotations

This article deals with learning dictionaries for sparse approximation whose atoms are both adapted to a training set of signals and mutually incoherent. To meet this objective, we employ a dictionary learning scheme consisting of sparse approximation followed by dictionary update, and we add to the latter a decorrelation step in order to reach a target mutual coherence level. This step is accomplished by an iterative projection method followed by a rotation of the dictionary. Experiments on musical audio data illustrate that the proposed algorithm can learn highly incoherent dictionaries while providing a sparse approximation with a good signal-to-noise ratio. We show that using dictionaries with low mutual coherence leads to a faster rate of decay of the residual when using matching pursuit (MP) for sparse approximation.
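
The decorrelation and rotation steps can be sketched as operations on the Gram matrix: clip its off-diagonal entries to the target coherence, map back to a unit-norm dictionary of the right rank via an eigendecomposition, and finally rotate the result towards the training data with an orthogonal Procrustes solution (a rotation leaves the coherence unchanged). This is an illustrative reconstruction of the general scheme; iteration counts, tolerances, and names are assumptions.

```python
import numpy as np

def decorrelate(D, mu0, n_iter=20):
    """Iteratively push the mutual coherence of dictionary D (columns = atoms)
    towards mu0: clip the off-diagonal Gram entries, then refactor the Gram
    matrix into a rank-limited, unit-norm dictionary.  An illustrative sketch
    of the projection step."""
    d, n = D.shape
    for _ in range(n_iter):
        G = D.T @ D
        G = np.clip(G, -mu0, mu0)
        np.fill_diagonal(G, 1.0)                       # unit diagonal, bounded off-diagonals
        w, V = np.linalg.eigh(G)
        idx = np.argsort(w)[::-1][:d]                  # keep at most d directions
        w = np.maximum(w[idx], 0.0)
        D = np.sqrt(w)[:, None] * V[:, idx].T          # D has shape (d, n) with D.T @ D ~ G
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D

def rotate_to_data(D, X, A):
    """Orthogonal-Procrustes rotation W so that W @ D @ A best fits the
    training data X; W.T @ W = I, so the coherence of D is preserved."""
    U, _, Vt = np.linalg.svd(X @ (D @ A).T)
    return (U @ Vt) @ D
```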
