Improving instrument recognition in polyphonic music through system integration
Related papers
Musical Instrument Recognition in Polyphonic Audio Using Missing Feature Approach
IEEE Transactions on Audio, Speech, and Language Processing, 2000
This paper proposes a novel approach to musical instrument recognition in polyphonic audio signals by using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. The mixture signal is decomposed into a sum of spectral bases modeled as a product of excitations and filters. The excitations are restricted to harmonic spectra and their fundamental frequencies are estimated in advance using a multipitch estimator, whereas the filters are restricted to have smooth frequency responses by modeling them as a sum of elementary functions on the Mel-frequency scale. Pitch and timbre information is used to organize individual notes into sound sources. In the recognition, Mel-frequency cepstral coefficients are used to represent the coarse shape of the power spectrum of sound sources and Gaussian mixture models are used to model instrument-conditional densities of the extracted features. The method is evaluated with polyphonic signals, randomly generated from 19 instrument classes. The recognition rate for signals with six-note polyphony reaches 59%.
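The separation stage described above can be caricatured in a few lines of numpy. This is a sketch, not the paper's actual system: the excitations are fixed harmonic combs with a 1/h roll-off, the Mel-scale filter model and the multipitch estimator are omitted, a simple Euclidean cost replaces the augmented factorization, and all function names are made up.

```python
import numpy as np

def harmonic_template(f0, n_bins, bin_hz, n_harm=10):
    """Toy harmonic excitation: peaks at integer multiples of f0, 1/h roll-off."""
    t = np.zeros(n_bins)
    for h in range(1, n_harm + 1):
        k = int(round(h * f0 / bin_hz))
        if k < n_bins:
            t[k] = 1.0 / h
    return t

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Multiplicative updates for H in V ≈ WH (Euclidean cost), templates W fixed."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# One mixture frame containing notes at 110 Hz (gain 2) and 130 Hz (gain 1);
# the 170 Hz template is absent and its activation should vanish.
bin_hz, n_bins = 10.0, 200
W = np.stack([harmonic_template(f, n_bins, bin_hz) for f in (110, 130, 170)], axis=1)
V = (2.0 * W[:, 0] + 1.0 * W[:, 1])[:, None]
H = nmf_activations(V, W)
print(np.round(H.ravel(), 3))   # → [2. 1. 0.]
```

With harmonically disjoint templates the update recovers the mixing gains exactly; real polyphony, where harmonics of different notes collide, is what makes the paper's additional source-filter constraints necessary.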
A Shift-Invariant Latent Variable Model for Automatic Music Transcription
In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. Proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, this method can effectively be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note range. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature, using several error metrics.
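The pitch-wise note tracking mentioned above is typically a two-state (off/on) hidden Markov model decoded with Viterbi. The following is a minimal stand-in, not the paper's trained model: the per-frame activation probability doubles as the emission probability, the self-transition probability is a made-up constant, and `viterbi_smooth` is a hypothetical name.

```python
import numpy as np

def viterbi_smooth(p_active, p_stay=0.9):
    """Two-state (off/on) Viterbi smoothing of per-frame note activations."""
    T = len(p_active)
    emit = np.stack([1 - np.asarray(p_active), np.asarray(p_active)])  # 2 x T
    log_emit = np.log(emit + 1e-12)
    trans = np.log(np.array([[p_stay, 1 - p_stay],
                             [1 - p_stay, p_stay]]))
    delta = log_emit[:, 0].copy()
    back = np.zeros((2, T), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + trans        # scores[prev, cur]
        back[:, t] = scores.argmax(0)
        delta = scores.max(0) + log_emit[:, t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[path[t], t]
    return path

# A one-frame dropout inside an otherwise active note is smoothed away,
# while a sustained drop in activation still ends the note.
p = np.array([0.9, 0.9, 0.2, 0.9, 0.9, 0.1, 0.1, 0.1])
print(viterbi_smooth(p))   # → [1 1 1 1 1 0 0 0]
```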
Student's-t mixture model based multi-instrument recognition in polyphonic music
2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
We address the problem of multi-instrument recognition in polyphonic music signals. Individual instruments are modeled within a stochastic framework using Student's-t Mixture Models (tMMs). We impose a mixture of these instrument models on the polyphonic signal model. No a priori knowledge is assumed about the number of instruments in the polyphony. The mixture weights are estimated in a latent variable framework from the polyphonic data using an Expectation Maximization (EM) algorithm, derived for the proposed approach. The weights are shown to indicate instrument activity. The output of the algorithm is an Instrument Activity Graph (IAG), from which the instruments active at a given time can be determined. An average F-ratio of 0.75 is obtained for polyphonies containing 2-5 instruments, on an experimental test set of 8 instruments: clarinet, flute, guitar, harp, mandolin, piano, trombone and violin.
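The core idea, estimating only the mixture weights over fixed, pre-trained instrument models, can be sketched as follows. This is an illustration under simplifying assumptions: 1-D Gaussians stand in for the paper's Student's-t mixtures, and the data and function names are invented.

```python
import numpy as np

def gauss(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def mixture_weights(X, models, n_iter=50, eps=1e-12):
    """EM over the mixture weights only; per-instrument densities stay fixed."""
    L = np.stack([gauss(X, mu, sig) for mu, sig in models])  # likelihoods, n x T
    w = np.full(len(models), 1.0 / len(models))
    for _ in range(n_iter):
        r = w[:, None] * L
        r /= r.sum(0, keepdims=True) + eps     # E-step: responsibilities
        w = r.mean(1)                          # M-step: weight = mean responsibility
    return w

# 300 frames from "instrument 0", 100 from "instrument 1", none from "instrument 2"
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 100)])
w = mixture_weights(X, models=[(0.0, 1.0), (4.0, 1.0), (8.0, 1.0)])
print(np.round(w, 2))   # weights track the 3:1:0 activity of the toy sources
```

Plotting such weights over successive frames is, in spirit, the Instrument Activity Graph: an inactive instrument's weight collapses toward zero without the number of active instruments being specified in advance.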
Multiple-instrument polyphonic music transcription using a convolutive probabilistic model
2011
In this paper, a method for automatic transcription of music signals using a convolutive probabilistic model is proposed, by extending the shift-invariant Probabilistic Latent Component Analysis method. Several note templates from multiple orchestral instruments are extracted from monophonic recordings and are used for training the transcription system. By incorporating shift-invariance into the model along with the constant-Q transform as a time-frequency representation, tuning changes and frequency modulations such as vibrato can be better supported. For postprocessing, Hidden Markov Models trained on MIDI data are employed, in order to favour temporal continuity. The system was tested on classical and jazz recordings from the RWC database, on recordings from a Disklavier piano, and a woodwind quintet recording. The proposed method, which can also be used for pitch content visualization, outperforms several state-of-the-art approaches for transcription, using a variety of error metrics.
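Underneath the convolutive extension sits plain PLCA: the normalized spectrogram is modeled as a mixture of spectral templates times time activations, fitted by EM. The sketch below implements only that basic model; the paper's shift-invariance, constant-Q front end, and HMM postprocessing are all omitted, and the toy data and names are made up.

```python
import numpy as np

def plca(V, n_z, n_iter=500, seed=1, eps=1e-12):
    """EM for basic PLCA: V(f,t) ≈ sum_z P(z) P(f|z) P(t|z)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pf = rng.random((F, n_z)); Pf /= Pf.sum(0)
    Pt = rng.random((T, n_z)); Pt /= Pt.sum(0)
    Pz = np.full(n_z, 1.0 / n_z)
    for _ in range(n_iter):
        joint = Pf[:, None, :] * Pt[None, :, :] * Pz          # F x T x Z
        post = joint / (joint.sum(-1, keepdims=True) + eps)   # E-step: P(z|f,t)
        w = V[:, :, None] * post                              # weighted counts
        Pz_new = w.sum((0, 1))
        Pf = w.sum(1) / (Pz_new + eps)                        # M-step
        Pt = w.sum(0) / (Pz_new + eps)
        Pz = Pz_new / Pz_new.sum()
    return Pf, Pt, Pz

# Toy normalized spectrogram: two sources with disjoint spectral support
a_f = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])
a_t = np.array([0.4, 0.4, 0.2, 0.0, 0.0])
b_f = np.array([0.0, 0.0, 0.0, 0.6, 0.3, 0.1])
b_t = np.array([0.0, 0.1, 0.1, 0.4, 0.4])
V = 0.6 * np.outer(a_f, a_t) + 0.4 * np.outer(b_f, b_t)
Pf, Pt, Pz = plca(V, n_z=2)
recon = (Pf * Pz) @ Pt.T   # model reconstruction of V
```

The convolutive variant replaces the fixed `Pf` templates with templates that can be shifted along a log-frequency axis, which is what lets one template per instrument cover tuning deviations and vibrato.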
Instrument recognition in polyphonic music
2005
We propose a method for the recognition of musical instruments in polyphonic music excerpted from commercial recordings. By exploiting some cues on the common structures of musical ensembles, we show that it is possible to recognize up to 4 instruments playing concurrently. The system associates a hierarchical classification tree with a class-pairwise feature selection technique and Gaussian Mixture Models to discriminate possible combinations of instruments. Successful identification is achieved over short-time windows, enabling the system to be employed for segmentation purposes.
Musical Instrument Recognition In Multi-Instrument Audio Contexts
2018
Automatic musical instrument recognition is an important aspect of machine listening. In this project, we deal with instrument recognition in multi-instrument audio contexts. We evaluate the performance of a traditional machine learning method in juxtaposition with a deep learning method in a supervised multi-label multi-output machine learning approach. We also tune a set of analysis parameters: {analysis window size, hop size, binarization threshold} to improve the performance. We investigate the possibility of improving the instrument recognition performance by using alternative data representations along with the original data. We consider two such sets of alternative data representations: 1) LRMS (left, right, mid, side) channel audio data derived from the stereo audio, and 2) the harmonic and residual representations derived from the original audio. We propose two different strategies to combine the models built using each of the data representation sets and evaluate their...
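The LRMS representation mentioned above derives mid and side channels from a stereo pair. A common convention is sketched below; the exact scaling used in the project is not stated in the abstract, so the 0.5 factor and the function name are assumptions.

```python
import numpy as np

def lrms(stereo):
    """Derive the four LRMS channel signals from a stereo array of shape (2, N)."""
    L, R = stereo
    M = 0.5 * (L + R)   # mid: content common to both channels
    S = 0.5 * (L - R)   # side: content that differs between the channels
    return L, R, M, S

# A source panned dead-centre appears only in the mid channel; side is silent.
left  = np.array([1.0, 2.0, 3.0])
right = np.array([1.0, 2.0, 3.0])
_, _, M, S = lrms(np.stack([left, right]))
print(M, S)   # → [1. 2. 3.] [0. 0. 0.]
```

The appeal for instrument recognition is that differently panned instruments separate partially across the four channels, giving the classifiers several complementary views of the same mixture.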
Improving Automatic Music Transcription Through Key Detection
In this paper, a method for automatic transcription of polyphonic music is proposed that exploits key information. The proposed system performs key detection using a matching technique with distributions of pitch class pairs, called Zweiklang profiles. The automatic transcription system is based on probabilistic latent component analysis, supporting templates from multiple instruments, as well as tuning deviations and frequency modulations. Key information is incorporated into the transcription system using Dirichlet priors during the parameter update stage. Experiments are performed on a polyphonic, multiple-instrument dataset of Bach chorales, where it is shown that incorporating key information improves multi-pitch detection and instrument assignment performance.
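A Dirichlet prior enters an EM-style update as pseudo-counts added to the expected counts before normalization. The toy below shows only that mechanism, with an invented C-major prior and a made-up strength `alpha`; it is not the paper's actual update rule.

```python
import numpy as np

def map_update(counts, prior, alpha):
    """MAP (Dirichlet-prior) re-estimation of a pitch-class distribution."""
    return (counts + alpha * prior) / (counts.sum() + alpha)

# Expected pitch-class counts from one EM iteration (12 classes, C major toy key)
counts = np.array([10, 1, 8, 1, 9, 7, 1, 9, 1, 8, 1, 7], dtype=float)
in_key = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1], dtype=float)
prior = in_key / in_key.sum()
flat   = counts / counts.sum()              # ML estimate, no key information
biased = map_update(counts, prior, alpha=30.0)
# Out-of-key classes lose probability mass relative to the ML estimate
print(biased[in_key == 0].sum() < flat[in_key == 0].sum())   # → True
```

As `alpha` grows, spurious out-of-key detections are suppressed more strongly, which is the intuition behind the reported multi-pitch improvements.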
Proc. of ISMIR, 2009
In this paper we present an approach towards the classification of pitched and unpitched instruments in polyphonic audio. In particular, the presented study accounts for three aspects currently lacking in the literature: model scalability to polyphonic data, model generalisation with respect to the number of instruments, and incorporation of perceptual information. Therefore, our goal is a unifying recognition framework which enables the extraction of the main instruments' information. The applied methodology consists of training classifiers with audio descriptors, using extensive datasets to model the instruments sufficiently. All data consist of real-world music, including categories of 11 pitched and 3 percussive instruments. We designed our descriptors by temporal integration of the raw feature values, which are directly extracted from the polyphonic data. Moreover, to evaluate the applicability of modelling temporal aspects in polyphonic audio, we studied the performance of different encodings of the temporal information. Along with accuracies of 63% and 78% for the pitched and percussive classification tasks, results show both the importance of temporal encoding and the strong limitations of modelling it accurately.
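The simplest form of the temporal integration described above is summarizing frame-level features over a texture window, e.g. by mean and standard deviation. The sketch below assumes that basic scheme; the paper's actual descriptors and encodings are more varied, and the function name and window sizes are invented.

```python
import numpy as np

def temporal_integrate(frames, win, hop):
    """Per-window mean and std of frame-level features → one descriptor per window."""
    out = []
    for s in range(0, len(frames) - win + 1, hop):
        seg = frames[s:s + win]
        out.append(np.concatenate([seg.mean(0), seg.std(0)]))
    return np.array(out)

frames = np.arange(12, dtype=float).reshape(6, 2)   # 6 frames, 2 features each
desc = temporal_integrate(frames, win=4, hop=2)
print(desc.shape)   # → (2, 4): two texture windows, mean+std per feature
```

Richer encodings replace the mean/std pair with, for example, autocorrelation or modulation-spectrum summaries; comparing such choices is exactly the paper's experiment.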
EURASIP Journal on Advances in Signal Processing, 2007
A new approach to instrument identification based on individual partials is presented. It makes identification possible even when the concurrently played instrument sounds have a high degree of spectral overlapping. A pairwise comparison scheme which emphasizes the specific differences between each pair of instruments is used for classification. Finally, the proposed method only requires a single note from each instrument to perform the classification. If more than one partial is available, the resulting multiple classification decisions can be combined to further improve instrument identification for the whole signal. Encouraging classification results have been obtained in the identification of four instruments (saxophone, piano, violin and guitar).
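The pairwise comparison scheme is, structurally, one-vs-one voting: each pairwise classifier casts a vote and the most-voted instrument wins. The sketch below shows only that voting skeleton; the threshold rules, the 1-D "partial feature", and the class set are stand-ins, not the paper's trained classifiers.

```python
def pairwise_vote(x, pair_classifiers, classes):
    """One-vs-one voting: each pairwise classifier votes; most votes wins."""
    votes = {c: 0 for c in classes}
    for (a, b), clf in pair_classifiers.items():
        votes[a if clf(x) else b] += 1
    return max(votes, key=votes.get)

# Toy 1-D partial feature; each pairwise rule is a hypothetical threshold
classes = ["piano", "violin", "sax"]
pairs = {
    ("piano", "violin"): lambda x: x < 0.4,
    ("piano", "sax"):    lambda x: x < 0.5,
    ("violin", "sax"):   lambda x: x < 0.8,
}
print(pairwise_vote(0.6, pairs, classes))   # → violin (wins 2 of 3 pairwise votes)
```

When several partials are available, each yields one such decision, and combining the per-partial votes over the whole signal is the summarization step the abstract refers to.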