A Shift-Invariant Latent Variable Model for Automatic Music Transcription
Related papers
Multiple-instrument polyphonic music transcription using a convolutive probabilistic model
2011
In this paper, a method for automatic transcription of music signals using a convolutive probabilistic model is proposed, extending the shift-invariant Probabilistic Latent Component Analysis method. Several note templates from multiple orchestral instruments are extracted from monophonic recordings and are used for training the transcription system. By incorporating shift-invariance into the model along with the constant-Q transform as a time-frequency representation, tuning changes and frequency modulations such as vibrato can be better supported. For postprocessing, Hidden Markov Models trained on MIDI data are employed in order to favour temporal continuity. The system was tested on classical and jazz recordings from the RWC database, on recordings from a Disklavier piano, and on a woodwind quintet recording. The proposed method, which can also be used for pitch content visualization, outperforms several state-of-the-art transcription approaches under a variety of error metrics.
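To make the core decomposition concrete, here is a minimal sketch (not the authors' implementation) of shift-invariant PLCA activation estimation with fixed, pre-extracted note templates, assuming NumPy; the function name, the wrap-around shift handling, and the parameter values are illustrative assumptions.

```python
import numpy as np

def siplca_activations(V, W, n_shifts=5, n_iter=50, eps=1e-12):
    """Shift-invariant PLCA with fixed note templates: only the joint
    shift/time activations H are estimated via EM.

    V : (F, T) non-negative log-frequency (e.g. constant-Q) spectrogram
    W : (Z, F) one spectral template per pitch/instrument pair, each
        row assumed normalised to sum to 1 over frequency
    Shifting templates along the log-frequency axis is what absorbs
    tuning deviations and vibrato.
    """
    V = V / (V.sum() + eps)                 # treat V as a distribution
    S, offset = n_shifts, n_shifts // 2
    # All shifted template copies: (Z, S, F); np.roll wraps around,
    # which we ignore at the edges for brevity
    Ws = np.stack([np.roll(W, s - offset, axis=1) for s in range(S)], axis=1)
    H = np.random.rand(W.shape[0], S, V.shape[1]) + eps
    H /= H.sum()
    for _ in range(n_iter):
        P = np.einsum('zsf,zst->ft', Ws, H) + eps   # model reconstruction
        H *= np.einsum('zsf,ft->zst', Ws, V / P)    # EM multiplicative step
        H /= H.sum() + eps
    return H  # H.sum(axis=1) marginalises out the shift -> pitch activations
```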
A method for automatic transcription of polyphonic music is proposed in this work that models the temporal evolution of musical tones. The model extends the shift-invariant probabilistic latent component analysis method by supporting the use of spectral templates that correspond to sound states such as attack, sustain, and decay. The order of these templates is controlled using hidden Markov model-based temporal constraints. In addition, the model can exploit multiple templates per pitch and instrument source. The shift-invariant aspect of the model makes it suitable for music signals that exhibit frequency modulations or tuning changes. Pitch-wise hidden Markov models are also utilized in a postprocessing step for note tracking. For training, sound state templates were extracted for various orchestral instruments using isolated note samples. The proposed transcription system was tested on multiple-instrument recordings from various datasets. Experimental results show that the proposed model is superior to a non-temporally constrained model and also outperforms various state-of-the-art transcription systems for the same experiment.
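The temporal constraint on sound states can be pictured as Viterbi decoding through a left-to-right HMM. The sketch below is an illustrative simplification assuming NumPy and a fixed self-loop probability, not the paper's trained models: it enforces the attack → sustain → decay ordering on per-frame state likelihoods.

```python
import numpy as np

def viterbi_sound_states(obs_probs, self_loop=0.9):
    """Decode a left-to-right sequence of sound states (attack ->
    sustain -> decay) from per-frame state likelihoods obs_probs of
    shape (T, S). Only self-loops and single forward moves are
    allowed, which is what enforces the state ordering."""
    T, S = obs_probs.shape
    A = np.zeros((S, S))
    for s in range(S):
        A[s, s] = self_loop
        if s + 1 < S:
            A[s, s + 1] = 1.0 - self_loop
        else:
            A[s, s] = 1.0                     # final state absorbs
    logA = np.log(A + 1e-300)
    logB = np.log(obs_probs + 1e-300)
    delta = np.full((T, S), -np.inf)
    psi = np.zeros((T, S), dtype=int)
    delta[0, 0] = logB[0, 0]                  # must start in the attack state
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + logA[:, s]
            psi[t, s] = np.argmax(scores)
            delta[t, s] = scores[psi[t, s]] + logB[t, s]
    path = [int(np.argmax(delta[-1]))]        # backtrace the best path
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```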
An efficient shift-invariant model for polyphonic music transcription
In this paper, we propose an efficient model for automatic transcription of polyphonic music. The model extends the shift-invariant probabilistic latent component analysis method and uses pre-extracted and pre-shifted note templates from multiple instruments. Thus, the proposed system can efficiently transcribe polyphonic music while taking into account tuning deviations and frequency modulations. Additional system improvements utilising massively parallel computation on GPUs result in a system that performs much faster than real-time. Experimental results using several datasets show that the proposed system can successfully transcribe polyphonic music, outperforming several state-of-the-art approaches, and is over 140 times faster than a standard shift-invariant transcription model.
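The efficiency trick is that pre-shifting the templates turns the convolutive model into a plain dictionary model, so each EM iteration reduces to two dense matrix products (directly portable to a GPU by swapping NumPy for CuPy or PyTorch). A minimal sketch under those assumptions, with hypothetical function names:

```python
import numpy as np

def preshift_dictionary(W, n_shifts=5):
    """Expand (Z, F) note templates into a flat (Z * n_shifts, F)
    dictionary of pre-shifted copies along the log-frequency axis."""
    offset = n_shifts // 2
    return np.concatenate(
        [np.roll(W, s - offset, axis=1) for s in range(n_shifts)], axis=0)

def plca_fixed_dictionary(V, D, n_iter=30, eps=1e-12):
    """PLCA activations with a fixed dictionary D (K, F): each row of D
    is assumed normalised, each column of H becomes P(component | frame),
    and no frequency convolution is needed any more."""
    H = np.random.rand(D.shape[0], V.shape[1]) + eps
    for _ in range(n_iter):
        P = D.T @ H + eps            # (F, T) reconstruction
        H *= D @ (V / P)             # multiplicative EM update
        H /= H.sum(axis=0, keepdims=True) + eps
    return H                         # sum H over each template's shifted
                                     # copies to recover per-pitch activations
```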
An Efficient Temporally-Constrained Probabilistic Model for Multiple-Instrument Music Transcription
2015
In this paper, an efficient, general-purpose model for multiple-instrument polyphonic music transcription is proposed. The model is based on probabilistic latent component analysis and supports the use of sound state spectral templates, which represent the temporal evolution of each note (e.g. attack, sustain, decay). As input, a variable-Q transform (VQT) time-frequency representation is used. Computational efficiency is achieved by supporting the use of pre-extracted and pre-shifted sound state templates. Two variants are presented: without temporal constraints and with hidden Markov model-based constraints controlling the appearance of sound states. Experiments are performed on benchmark transcription datasets: MAPS, TRIOS, MIREX multiF0, and Bach10; results on multi-pitch detection and instrument assignment show that the proposed models outperform the state-of-the-art for multiple-instrument transcription and are more than 20 times faster compared to a previous sound state-based model.
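As a front end, a VQT with finer-than-semitone resolution is what allows small template shifts to model tuning deviations. A minimal sketch of such an input representation, assuming the librosa library (the resolution and gamma value here are illustrative, not the paper's settings):

```python
import numpy as np
import librosa

# 60 bins per octave = 5 bins per semitone (20-cent resolution),
# so shifting a template by one bin models a 20-cent tuning change.
y, sr = librosa.load(librosa.ex('trumpet'))       # any mono recording
V = np.abs(librosa.vqt(y, sr=sr,
                       bins_per_octave=60,
                       n_bins=60 * 8,             # 8 octaves
                       gamma=20))                 # VQT bandwidth offset (Hz)
V /= V.sum() + 1e-12   # PLCA treats the magnitude VQT as a distribution
```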
Improving Automatic Music Transcription Through Key Detection
In this paper, a method for automatic transcription of polyphonic music is proposed that exploits key information. The proposed system performs key detection using a matching technique with distributions of pitch class pairs, called Zweiklang profiles. The automatic transcription system is based on probabilistic latent component analysis, supporting templates from multiple instruments, as well as tuning deviations and frequency modulations. Key information is incorporated into the transcription system using Dirichlet priors during the parameter update stage. Experiments are performed on a polyphonic, multiple-instrument dataset of Bach chorales, where it is shown that incorporating key information improves multi-pitch detection and instrument assignment performance.
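The matching step can be illustrated with the classic rotate-and-correlate scheme over pitch-class distributions; here the standard Krumhansl–Kessler key profiles stand in for the paper's Zweiklang (pitch-class-pair) profiles, so this is a simplified sketch rather than the proposed method. The detected key would then enter the PLCA updates as Dirichlet pseudo-counts that boost in-key pitch activations.

```python
import numpy as np

# Krumhansl-Kessler major/minor profiles (index 0 = tonic); the paper
# matches two-dimensional pitch-class-pair (Zweiklang) profiles instead.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def detect_key(pc_hist):
    """Return (correlation, tonic pitch class, mode) for the key whose
    rotated profile best matches the observed 12-bin pitch-class
    distribution pc_hist."""
    best = (-np.inf, None, None)
    for tonic in range(12):
        for mode, profile in (('major', MAJOR), ('minor', MINOR)):
            r = np.corrcoef(pc_hist, np.roll(profile, tonic))[0, 1]
            if r > best[0]:
                best = (r, tonic, mode)
    return best
```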
Automatic Transcription of Recorded Music
Acta Acustica united with Acustica, 2012
The automatic transcription of music recordings, with the objective to derive a score-like representation from a given audio representation, is a fundamental and challenging task. In particular for polyphonic music recordings with overlapping sound sources, current transcription systems still have problems accurately extracting the parameters of individual notes specified by pitch, onset, and duration. In this article, we present a music transcription system that is carefully designed to cope with various facets of music. One main idea of our approach is to consistently employ a mid-level representation that is based on a musically meaningful pitch scale. To achieve the necessary spectral and temporal resolution, we use a multi-resolution Fourier transform enhanced by an instantaneous frequency estimation. Subsequently, having extracted pitch and note onset information from this representation, we employ Hidden Markov Models (HMMs) for determining the note events in a context-sensitive fashion. As another contribution, we evaluate our transcription system on an extensive dataset containing audio recordings of various genres. Here, as opposed to many previous approaches, we do not only rely on synthetic audio material, but evaluate our system on real audio recordings, using MIDI-audio synchronization techniques to automatically generate reference annotations.
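The instantaneous-frequency enhancement mentioned here is typically the phase-vocoder estimate: the unwrapped phase advance between frames locates each partial far more precisely than the FFT bin spacing, which is what allows spectral energy to be mapped onto a musically meaningful pitch scale. A minimal single-resolution sketch assuming NumPy (the paper combines several resolutions):

```python
import numpy as np

def if_spectrogram(y, sr, n_fft=4096, hop=512):
    """Magnitude spectrogram plus per-bin instantaneous frequency (Hz),
    estimated from the phase advance between successive frames."""
    window = np.hanning(n_fft)
    frames = np.lib.stride_tricks.sliding_window_view(y, n_fft)[::hop] * window
    S = np.fft.rfft(frames, axis=1)              # (T, F) complex STFT
    phase = np.angle(S)
    k = np.arange(S.shape[1])
    expected = 2 * np.pi * k * hop / n_fft       # nominal phase advance/hop
    dphi = np.diff(phase, axis=0) - expected
    dphi = np.mod(dphi + np.pi, 2 * np.pi) - np.pi     # wrap to [-pi, pi)
    inst_f = (expected + dphi) * sr / (2 * np.pi * hop)  # (T-1, F) in Hz
    return np.abs(S[1:]), inst_f
```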
Polyphonic music transcription using note event modeling
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., 2005
This paper proposes a method for the automatic transcription of real-world music signals, including a variety of musical genres. The method transcribes notes played with pitched musical instruments. Percussive sounds, such as drums, may be present but they are not transcribed. Musical notations (i.e., MIDI files) are produced from acoustic stereo input files using probabilistic note event modeling. Note events are described with a hidden Markov model (HMM). The model uses three acoustic features extracted with a multiple fundamental frequency (F0) estimator to calculate the likelihoods of different notes, and performs temporal segmentation of notes. The transitions between notes are controlled with a musicological model involving musical key estimation and bigram models. The final transcription is obtained by searching for several paths through the note models. Evaluation was carried out with a realistic music database. Using strict evaluation criteria, 39% of all the notes were found (recall) and 41% of the transcribed notes were correct (precision). Given the complexity of the transcription task considered, the results are encouraging.
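At its simplest, the note-event HMM idea reduces to a two-state (off/on) smoother per pitch: Viterbi decoding replaces frame-wise thresholding and favours temporal continuity. This sketch shows that reduction, assuming NumPy; the paper's three-feature note model and key-conditioned bigram transitions are omitted.

```python
import numpy as np

def track_note(salience, p_on=0.1, stay=0.95):
    """Binarise a per-frame pitch salience curve (values in [0, 1])
    with a two-state off/on HMM instead of a fixed threshold."""
    T = len(salience)
    logB = np.log(np.stack([1.0 - salience, salience], axis=1) + 1e-300)
    logA = np.log(np.array([[stay, 1 - stay],
                            [1 - stay, stay]]))
    delta = np.log(np.array([1 - p_on, p_on])) + logB[0]
    psi = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA        # scores[i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return np.array(path[::-1])               # 1 = note active in frame
```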
Explicit duration hidden Markov models for multiple-instrument polyphonic music transcription
In this paper, a method for multiple-instrument automatic music transcription is proposed that models the temporal evolution and duration of tones. The proposed model supports the use of spectral templates per pitch and instrument which correspond to sound states such as attack, sustain, and decay. Pitch-wise explicit duration hidden Markov models (EDHMMs) are integrated into a convolutive probabilistic framework for modelling the temporal evolution and duration of the sound states. A two-stage transcription procedure integrating note tracking information is performed in order to provide more robust pitch estimates. The proposed system is evaluated on multi-pitch detection and instrument assignment using various publicly available datasets. Results show that the proposed system outperforms a hidden Markov model-based transcription system using the same framework, as well as several state-of-the-art automatic music transcription systems.
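What distinguishes an EDHMM from a standard HMM is that state durations are modelled explicitly rather than through self-loops, so note-length statistics enter the decoder directly. A compact segmental Viterbi sketch under that assumption (illustrative, not the paper's inference procedure):

```python
import numpy as np

def edhmm_viterbi(logB, log_dur, A):
    """Segmental Viterbi for an explicit-duration HMM.
    logB    : (T, S) per-frame state log-likelihoods
    log_dur : (S, Dmax) log probability of staying d+1 frames in a state
    A       : (S, S) state transition matrix with zero diagonal
    Returns a list of (start, end, state) segments."""
    T, S = logB.shape
    Dmax = log_dur.shape[1]
    cumB = np.vstack([np.zeros(S), np.cumsum(logB, axis=0)])  # prefix sums
    logA = np.log(A + 1e-300)
    delta = np.full((T + 1, S), -np.inf)
    delta[0] = 0.0
    back = {}
    for t in range(1, T + 1):
        for s in range(S):
            for d in range(1, min(Dmax, t) + 1):
                emit = cumB[t, s] - cumB[t - d, s]   # frames t-d .. t-1
                if t - d == 0:
                    score, prev = log_dur[s, d - 1] + emit, -1
                else:
                    inc = delta[t - d] + logA[:, s]
                    prev = int(np.argmax(inc))
                    score = inc[prev] + log_dur[s, d - 1] + emit
                if score > delta[t, s]:
                    delta[t, s] = score
                    back[(t, s)] = (t - d, prev)
    s, t, segs = int(np.argmax(delta[T])), T, []
    while t > 0:                                     # backtrace segments
        t0, prev = back[(t, s)]
        segs.append((t0, t, s))
        t, s = t0, (prev if prev >= 0 else s)
    return segs[::-1]
```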
Generative Spectrogram Factorization Models for Polyphonic Piano Transcription
IEEE Transactions on Audio, Speech, and Language Processing, 2010
We introduce a framework for probabilistic generative models of time-frequency coefficients of audio signals, using a matrix factorization parametrization to jointly model spectral characteristics such as harmonicity, and temporal activations and excitations. The models represent the observed data as the superposition of statistically independent sources, and we consider variance-based models used in source separation and intensity-based models for non-negative matrix factorization. We derive a generalized expectation-maximization algorithm for inferring the parameters of the model and then adapt this algorithm to the task of polyphonic transcription of music using labeled training data. The performance of the system is compared to that of existing discriminative and model-based approaches on a dataset of solo piano music.
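The intensity-based model mentioned here coincides with NMF under the generalised KL divergence, whose multiplicative updates are exactly an EM algorithm for a Poisson-intensity generative model of the spectrogram. A minimal NumPy sketch of that connection (a generic refresher, not the paper's full variance-based machinery):

```python
import numpy as np

def nmf_kl(V, K, n_iter=100, eps=1e-12):
    """Factorise a non-negative spectrogram V (F, T) as W @ H with K
    components, using multiplicative updates for the generalised KL
    divergence (equivalently, EM for a Poisson intensity model)."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], K)) + eps
    H = rng.random((K, V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```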
Automatic transcription of piano music based on HMM tracking of jointly-estimated pitches
2008
This work deals with the automatic transcription of piano recordings into a MIDI symbolic file. The system consists of successive stages of onset detection and multi-pitch estimation and tracking. The latter is based on a Hidden Markov Model framework, embedding a spectral maximum-likelihood method for joint pitch estimation. The complexity issue of joint estimation techniques is addressed by selecting subsets of simultaneously played notes within a pre-estimated set of candidates. Tests on a large database and comparisons to state-of-the-art methods show promising results.
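The candidate-subset device can be sketched in a few lines: joint estimation stays tractable because the spectral likelihood is only evaluated on small combinations of pre-selected pitch candidates. Here `score_fn` is a hypothetical stand-in for the paper's maximum-likelihood spectral score:

```python
from itertools import combinations

def best_chord(candidates, score_fn, max_polyphony=4):
    """Joint multi-pitch estimation by exhaustive search over small
    subsets of pre-estimated candidate pitches; score_fn returns the
    spectral likelihood of a pitch subset being played jointly."""
    best, best_score = (), float('-inf')
    for n in range(1, max_polyphony + 1):
        for subset in combinations(candidates, n):
            s = score_fn(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score
```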