Dan Stowell | Queen Mary, University of London
Papers by Dan Stowell
We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise.
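To make the setting concrete, here is a minimal Python sketch that generates data of the kind described: several sources that appear and disappear, each emitting events as a simple Markov renewal process, plus Poisson clutter. It only simulates the data; it does not implement the paper's inference method. The function name `simulate`, the gamma-distributed gaps, the two-state hidden chain and every parameter value are illustrative assumptions.

```python
import random

def simulate(num_sources, duration, clutter_rate, seed=0):
    """Return time-sorted (time, label) pairs; label -1 marks clutter.

    Hypothetical generator: gap distributions and parameters are assumptions.
    """
    rng = random.Random(seed)
    events = []
    for src in range(num_sources):
        # Sources appear and disappear: each is active on its own sub-interval.
        start = rng.uniform(0.0, duration / 2)
        end = rng.uniform(start, duration)
        base_gap = rng.uniform(0.5, 2.0)   # per-source mean interval (s)
        state, t = 0, start
        while True:
            # Markov renewal: the gap distribution depends on a hidden
            # two-state chain (state 1 is three times slower than state 0).
            mean_gap = base_gap * (1.0 if state == 0 else 3.0)
            t += rng.gammavariate(4.0, mean_gap / 4.0)  # mean = mean_gap
            if t > end:
                break
            events.append((t, src))
            if rng.random() < 0.2:         # occasional state switch
                state = 1 - state
    # Clutter: a homogeneous Poisson process over the whole window.
    t = 0.0
    while True:
        t += rng.expovariate(clutter_rate)
        if t > duration:
            break
        events.append((t, -1))
    events.sort()
    return events
```

An inference method in this setting would see only the times `[t for t, _ in events]` and would have to recover the labels, including how many sources there are.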
Over the last decade, the speech and audio processing community has shown increasing interest in disseminating code and publicly evaluating proposed methods. Public evaluation can serve as a reference point for the performance of proposed methods, and can also be used to track performance improvements over the years.
In musical performances with expressive tempo modulation, the tempo variation can be modelled as a sequence of tempo arcs. Previous authors have used this idea to estimate series of piecewise arc segments from data. In this paper we describe a probabilistic model for a time-series process of this nature, and use it to perform inference of single- and multi-level arc processes from data. We describe an efficient Viterbi-like process for MAP inference of arcs. Our approach is score-agnostic and, together with efficient inference, allows online analysis of performances, including improvisations; it can also predict immediate future tempo trajectories.
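A Viterbi-like search keeps, for each possible boundary, the best score of any segmentation ending there, then traces back. The sketch below applies that dynamic-programming structure to a simpler, assumed criterion: it splits a tempo curve into quadratic (parabolic) arcs by minimising squared residual plus a fixed penalty per arc, which can be read as a MAP objective under Gaussian noise and a per-segment prior. The arc shape, the penalty, the minimum segment length and the name `segment_arcs` are assumptions; this is not the paper's probabilistic model, and it is offline and single-level.

```python
import numpy as np

def segment_arcs(t, tempo, penalty=1.0, min_len=3):
    """Split a tempo curve into quadratic arcs by exact dynamic programming.

    Minimises (sum of squared residuals) + penalty * (number of arcs).
    Assumes len(t) >= min_len. Returns (start, end) index pairs, end exclusive.
    """
    t = np.asarray(t, dtype=float)
    tempo = np.asarray(tempo, dtype=float)
    n = len(t)
    best = np.full(n + 1, np.inf)   # best[j]: lowest cost for samples [0, j)
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if not np.isfinite(best[i]):
                continue            # no valid segmentation ends at i
            coeffs = np.polyfit(t[i:j], tempo[i:j], 2)
            resid = tempo[i:j] - np.polyval(coeffs, t[i:j])
            cost = best[i] + resid @ resid + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Trace the stored boundaries back from the end, Viterbi-style.
    bounds, j = [], n
    while j > 0:
        i = int(back[j])
        bounds.append((i, j))
        j = i
    return bounds[::-1]
```

Larger `penalty` values favour fewer, longer arcs; the double loop over candidate boundaries is what makes the exact optimum tractable without enumerating every partition.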
Probabilistic approaches to tracking often use single-source Bayesian models; applying these to multi-source tasks is problematic. We apply a principled multi-object tracking implementation, the Gaussian mixture probability hypothesis density filter, to track multiple sources having fixed pitch plus vibrato. We demonstrate high-quality filtering in a synthetic experiment, and find improved tracking using a richer feature set which captures underlying dynamics. Our implementation is available as open-source Python code.
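For readers unfamiliar with the Gaussian mixture PHD filter, the following is a minimal scalar sketch of one predict/update cycle in plain Python, following the general recursion: surviving and birth components, missed-detection terms, per-observation Kalman updates normalised against clutter, then pruning and merging. It tracks fixed pitches only with an assumed random-walk model; vibrato dynamics and richer features are omitted, all parameter values are illustrative, and the merge test uses the leading component's variance as a simplification. It is not the authors' open-source implementation.

```python
import math

def gauss(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def prune_and_merge(comps, trunc=1e-5, merge_dist=4.0, max_comps=50):
    """Drop tiny components, then merge those close to the strongest one."""
    comps = sorted((c for c in comps if c[0] > trunc), key=lambda c: -c[0])
    merged = []
    while comps:
        w0, m0, P0 = comps[0]
        # Simplification: distance is measured with the leading variance P0.
        close = [c for c in comps if (c[1] - m0) ** 2 / P0 <= merge_dist]
        w = sum(c[0] for c in close)
        m = sum(c[0] * c[1] for c in close) / w
        P = sum(c[0] * (c[2] + (m - c[1]) ** 2) for c in close) / w
        merged.append((w, m, P))
        comps = [c for c in comps if c not in close]
    return merged[:max_comps]

def gmphd_step(comps, Z, births, ps=0.99, pd=0.9, q=1.0, r=4.0, clutter=1e-3):
    """One predict+update cycle of a scalar GM-PHD filter.

    comps, births: lists of (weight, mean, variance); Z: pitch observations.
    q: process noise, r: observation noise, clutter: intensity kappa(z),
    assumed uniform. All defaults are illustrative assumptions.
    """
    # Predict: surviving components diffuse; birth components are appended.
    pred = [(ps * w, m, P + q) for (w, m, P) in comps] + list(births)
    # Missed-detection terms keep the predicted components, down-weighted.
    updated = [((1 - pd) * w, m, P) for (w, m, P) in pred]
    for z in Z:
        # Kalman-update every predicted component against this observation.
        cand = []
        for (w, m, P) in pred:
            S = P + r
            K = P / S
            cand.append((pd * w * gauss(z, m, S), m + K * (z - m), (1 - K) * P))
        # Normalise against clutter plus all components' claims on z.
        total = clutter + sum(c[0] for c in cand)
        updated += [(cw / total, cm, cP) for (cw, cm, cP) in cand]
    return prune_and_merge(updated)

def extract(comps, threshold=0.5):
    """Report the means of components carrying enough PHD weight."""
    return [m for (w, m, P) in comps if w > threshold]
```

Feeding, say, observations `[440.0, 660.0]` for a few frames with a broad birth component such as `(0.1, 550.0, 1e4)` lets two components build up weight near those pitches; thresholding the weights is one common way to read off individual tracks, and the summed weights estimate the expected number of sources.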
Harmonic birdsong is often highly nonstationary, which suggests that standard FFT representations may be of limited suitability. Wavelet and chirplet techniques exist in the literature, but are not often applied to signals such as bird vocalisations, perhaps due to analysis complexity. In this paper we develop a single-scale chirp analysis (computationally accelerated using FFT) which can be treated as an ordinary time-series. We then study a sinusoidal representation simply derived from the peak bins of this time-series. We show that it can lead to improved species classification from birdsong.
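One common way to realise a single-scale chirp analysis with the FFT is to multiply each windowed frame by exp(-iπct²) for a set of candidate chirp rates c, which turns a linear sweep of rate c into a stationary tone, and then take one FFT per rate. The sketch below does that and picks the strongest (rate, frequency) peak, in the spirit of the peak-bin reading described above. Whether this matches the paper's exact formulation is an assumption; the function names, the Hann window and the grid of rates are illustrative.

```python
import numpy as np

def chirp_spectrum(frame, sr, chirp_rates):
    """|FFT| of a windowed frame after dechirping at each candidate rate.

    chirp_rates are in Hz per second. Returns an array of shape
    (len(chirp_rates), n // 2 + 1) covering 0..sr/2.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    t = (np.arange(n) - n / 2) / sr          # time relative to frame centre
    windowed = frame * np.hanning(n)
    # exp(-i*pi*c*t^2) cancels a linear sweep of rate c, leaving a
    # stationary tone that a single FFT resolves sharply.
    dechirp = np.exp(-1j * np.pi * np.outer(chirp_rates, t ** 2))
    spec = np.fft.fft(windowed[np.newaxis, :] * dechirp, axis=1)
    # Keep non-negative frequencies only (the input frame is real).
    return np.abs(spec[:, : n // 2 + 1])

def strongest_chirp(frame, sr, chirp_rates):
    """Return (chirp rate, centre frequency) of the largest peak."""
    mags = chirp_spectrum(frame, sr, chirp_rates)
    ri, bi = np.unravel_index(np.argmax(mags), mags.shape)
    return chirp_rates[ri], bi * sr / len(frame)
```

For example, a 1024-sample frame at 8 kHz containing a 1 kHz tone rising at 2 kHz/s, analysed with `np.linspace(-4000, 4000, 9)` as candidate rates, peaks in the 2000 Hz/s row at the 1000 Hz bin. Repeating this per frame yields an ordinary time-series of (rate, frequency) peaks.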
Jean-Claude Risset described an eternal accelerando illusion, related to Shepard tones, in which a rhythm can be constructed to give the perception of continuous acceleration. The effect can in principle be derived from any rhythmic template, producing ...
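The classic construction layers several pulse trains whose tempi are octave multiples of each other, lets every tempo rise exponentially (doubling over one cycle), and crossfades loudness so that the fastest layer fades out as the slowest fades in. The sketch below computes onset times and gains for one cycle by inverting the integrated tempo. The raised-cosine gain curve, the parameter names and `risset_onsets` itself are assumptions for illustration; seamless looping across cycles needs further constraints on the beat counts, which are not handled here.

```python
import math

def risset_onsets(base_tempo, cycle, layers):
    """Onsets (time, gain) for one cycle of a Risset accelerando.

    base_tempo: beats/s of the slowest layer at t=0; cycle: seconds over
    which every layer doubles in tempo; layers: octave-spaced pulse trains.
    """
    events = []
    for k in range(layers):
        f0 = base_tempo * 2 ** k
        n = 0
        while True:
            # Tempo f0 * 2**(t/cycle) integrates to
            # f0*cycle/ln2 * (2**(t/cycle) - 1) beats; solve for beat n.
            t = cycle * math.log2(1 + n * math.log(2) / (f0 * cycle))
            if t >= cycle:
                break
            # Raised-cosine gain over log-tempo: silent at both extremes,
            # and layer k at t=cycle matches layer k+1 at t=0.
            pos = (k + t / cycle) / layers
            gain = 0.5 - 0.5 * math.cos(2 * math.pi * pos)
            events.append((t, gain))
            n += 1
    return sorted(events)
```

With `risset_onsets(1.0, 4.0, 3)` the three layers contribute 6, 12 and 24 onsets to the cycle, with inter-onset intervals shrinking steadily within each layer.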
This technical report reviews the state of the art in machine recognition of UK birdsong, primarily for an audience of music informatics researchers. It briefly describes the signal properties and variability of birdsong, before surveying automatic birdsong recognition methods in the published literature, as well as available software implementations. Music informatics researchers may recognise the majority of the signal representations and machine learning algorithms applied; however, the source material has important differences from musical signals (e.g. its temporal structure) which necessitate differences in approach. As part of our investigation we developed a prototype Vamp plugin for birdsong clustering and segmentation, which we describe.
Applications such as concatenative synthesis (audio mosaicing) and query-by-example require the ability to search a database using a sound which is qualitatively different from the actual desired result, for example when using vocal queries to retrieve nonvocal sound. Standard query techniques such as nearest neighbours do not account for this difference between source and target: they perform retrieval but do not learn to make timbral analogies. This paper addresses this issue by considering timbral query as a multivariate regression problem from one timbre distribution onto another. We develop a novel variant of multivariate tree regression: given only a set of unlabelled and unpaired samples from two distributions on the same space, the regression learns a cross-associative mapping which assumes general similarities in structure of the two distributions, yet can accommodate differences in shape at various scales. We demonstrate the technique with a synthetic example and with a concatenative synthesizer.
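To illustrate the general idea of learning a mapping from two unpaired samples, the following hypothetical sketch grows one tree over both sets at once: at every node each set is split at its own median along a shared axis, so a query routed through the source-side splits lands in the paired target cell. This is a simplified assumption about how such a cross-associative tree could work, not the regression variant developed in the paper; `build`, `remap`, the cycling of axes and `min_leaf` are all illustrative.

```python
import numpy as np

def build(src, tgt, depth=0, min_leaf=4):
    """Partition two unpaired point sets in parallel (hypothetical sketch).

    At each node both sets are split at their *own* median along a shared
    axis, so paired cells hold matching quantiles of each distribution even
    when the two distributions differ in location, scale or shape.
    """
    if len(src) < 2 * min_leaf or len(tgt) < 2 * min_leaf:
        return {"leaf": tgt.mean(axis=0)}
    axis = depth % src.shape[1]           # cycle through dimensions
    s_med = np.median(src[:, axis])
    t_med = np.median(tgt[:, axis])
    s_lo, t_lo = src[:, axis] <= s_med, tgt[:, axis] <= t_med
    if s_lo.all() or t_lo.all():          # degenerate split (ties): stop here
        return {"leaf": tgt.mean(axis=0)}
    return {
        "axis": axis,
        "s_med": s_med,
        "lo": build(src[s_lo], tgt[t_lo], depth + 1, min_leaf),
        "hi": build(src[~s_lo], tgt[~t_lo], depth + 1, min_leaf),
    }

def remap(tree, x):
    """Descend by the source-side medians; return the paired target cell mean."""
    while "leaf" not in tree:
        tree = tree["lo"] if x[tree["axis"]] <= tree["s_med"] else tree["hi"]
    return tree["leaf"]
```

Because each distribution is split at its own quantiles, an overall shift or rescaling between source and target is absorbed automatically, and deeper splits can follow finer differences in shape.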
To help maximise the usefulness of MIR technologies in the wider community, we conducted an ethnographic study of music lessons in secondary schools in London, UK. The aim is to better understand how musical concepts are negotiated with and without technology, so that we can understand when and how MIR tools might be useful. We report on some of the themes uncovered, both about the range of technologies deployed in schools and about the ways different musical concepts are discussed. Importantly, this rich ...
International Journal of Human-computer Studies / International Journal of Man-machine Studies, Jan 1, 2009
Live music-making using interactive systems is not completely amenable to traditional HCI evaluation metrics such as task-completion rates. In this paper we discuss quantitative and qualitative approaches which provide opportunities to evaluate the music-making interaction, accounting for aspects which cannot be directly measured or expressed numerically, yet which may be important for participants. We present case studies in the application of a qualitative method based on Discourse Analysis, and a quantitative method based on the Turing Test. We compare and contrast these methods with each other, and with other evaluation approaches used in the literature, and discuss factors affecting which evaluation methods are appropriate in a given context.
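A Turing-test style evaluation often reduces to asking whether listeners can tell human from machine performance better than chance. The paper's exact statistical procedure is not stated here; one common, assumed analysis is a one-sided exact binomial test on the number of correct identifications, sketched below with the hypothetical helper `binomial_p_value`.

```python
from math import comb

def binomial_p_value(correct, trials, chance=0.5):
    """One-sided exact binomial test: P(X >= correct) if listeners only guess.

    chance is the per-trial probability of a correct guess (0.5 for a
    two-way human/machine judgement).
    """
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

For example, 8 correct out of 10 gives p = 56/1024 ≈ 0.055, so that result alone would not reject chance-level guessing at the 5% level; a system that listeners cannot reliably distinguish from a human "passes" in this sense.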
Proceedings of the International Computer Music …, Jan 1, 2007
Proceedings of the …, Jan 1, 2008
… of London, Technical Report, Centre for …, Jan 1, 2008
Signal Processing Letters, IEEE, Jan 1, 2009
Proceedings of the Digital Music Research …, Jan 1, 2007