Polyphonic music transcription using note event modeling (original) (raw)

2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005.

This paper proposes a method for the automatic transcription of real-world music signals, including a variety of musical genres. The method transcribes notes played with pitched musical instruments. Percussive sounds, such as drums, may be present but they are not transcribed. Musical notations (i.e., MIDI files) are produced from acoustic stereo input files using probabilistic note event modeling. Note events are described with a hidden Markov model (HMM). The model uses three acoustic features extracted with a multiple fundamental frequency (F0) estimator to calculate the likelihoods of different notes and performs temporal segmentation of notes. The transitions between notes are controlled with a musicological model involving musical key estimation and bigram models. The final transcription is obtained by searching for several paths through the note models. Evaluation was carried out with a realistic music database. Using strict evaluation criteria, 39% of all the notes were found (recall) and 41% of the transcribed notes were correct (precision). Taken the complexity of the considered transcription task, the results are encouraging.