Pitch Estimation of the Predominant Vocal Melody from Heterophonic Music Audio Recordings (original) (raw)

Musical information extraction from the singing voice

National Conference on Signal and Image Processing Applications

Music information retrieval is currently an active research area that addresses the extraction of musically important information from audio signals, and the applications of such information. The extracted information can be used for search and retrieval of music in recommendation systems, or to aid musicological studies or even in music learning. Sophisticated signal processing techniques are applied to convert low-level acoustic signal properties to musical attributes which are further embedded in a rulebased or statistical classification framework to link with high-level descriptions such as melody, genre, mood and artist type. Vocal music comprises a large and interesting category of music where the lead instrument is the singing voice. The singing voice is more versatile than many musical instruments and therefore poses interesting challenges to information retrieval systems. In this paper, we provide a brief overview of research in vocal music processing followed by a description of related work at IIT Bombay leading to the development of an interface for melody detection of singing voice in polyphony.

Automatic Transcription of Polyphonic Vocal Music

This paper presents a method for automatic music transcription applied to audio recordings of a cappella performances with multiple singers. We propose a system for multi-pitch detection and voice assignment that integrates an acoustic and a music language model. The acoustic model performs spectrogram decomposition, extending probabilistic latent component analysis (PLCA) using a six-dimensional dictionary with pre-extracted log-spectral templates. The music language model performs voice separation and assignment using hidden Markov models that apply musicological assumptions. By integrating the two models, the system is able to detect multiple concurrent pitches in polyphonic vocal music and assign each detected pitch to a specific voice type such as soprano, alto, tenor or bass (SATB). We compare our system against multiple baselines, achieving state-of-the-art results for both multi-pitch detection and voice assignment on a dataset of Bach chorales and another of barbershop quartets. We also present an additional evaluation of our system using varied pitch tolerance levels to investigate its performance at 20-cent pitch resolution.

Vocal melody extraction in the presence of pitched accompaniment in polyphonic music

Audio, Speech, and Language Processing, …, 2010

Melody extraction algorithms for single-channel polyphonic music typically rely on the salience of the lead melodic instrument, considered here to be the singing voice. However the simultaneous presence of one or more pitched instruments in the polyphony can cause such a predominant-F0 tracker to switch between tracking the pitch of the voice and that of an instrument of comparable strength, resulting in reduced voice-pitch detection accuracy. We propose a system that, in addition to biasing the salience measure in favor of singing voice characteristics, acknowledges that the voice may not dominate the polyphony at all instants and therefore tracks an additional pitch to better deal with the potential presence of locally dominant pitched accompaniment. A feature based on the temporal instability of voice harmonics is used to finally identify the voice pitch. The proposed system is evaluated on test data that is representative of polyphonic music with strong pitched accompaniment. Results show that the proposed system is indeed able to recover melodic information lost to its single-pitch tracking counterpart, and also outperforms another state-of-the-art melody extraction system designed for polyphonic music.

Computational Methods for Melody and Voice Processing in Music Recordings (Dagstuhl Seminar 19052)

2019

In our daily lives, we are constantly surrounded by music, and we are deeply influenced by music. Making music together can create strong ties between people, while fostering communication and creativity. This is demonstrated, for example, by the large community of singers active in choirs or by the fact that music constitutes an important part of our cultural heritage. The availability of music in digital formats and its distribution over the world wide web has changed the way we consume, create, enjoy, explore, and interact with music. To cope with the increasing amount of digital music, one requires computational methods and tools that allow users to find, organize, analyze, and interact with music--topics that are central to the research field known as \emph{Music Information Retrieval} (MIR). The Dagstuhl Seminar 19052 was devoted to a branch of MIR that is of particular importance: processing melodic voices (with a focus on singing voices) using computational methods. It is of...

Detecting key features in popular music: case study - singing voice detection

2009

Detecting distinct features in modern pop music is an important problem that can have significant applications in areas such as multimedia entertainment. They can be used, for example, to give a visually coherent representation of the sound. We propose to integrate a singing voice detector with a multimedia, multi-touch game where the user has to perform simple tasks at certain key points in the music. While the ultimate goal is to automatically create visual content in response to features extracted from the music, here we give special focus to the detection of voice segments in music songs. The solution presented extracts the Mel-Frequency Cepstral Coefficients of the sound and uses a Hidden Markov Model to infer if the sound has voice. The classification rate obtained is high when compared to other singing voice detectors that use Mel-Frequency Cepstral Coefficients.

Multi-Pitch Detection and Voice Assignment for A Cappella Recordings of Multiple Singers

2017

This paper presents a multi-pitch detection and voice assignment method applied to audio recordings containing a cappella performances with multiple singers. A novel approach combining an acoustic model for multi-pitch detection and a music language model for voice separation and assignment is proposed. The acoustic model is a spectrogram factorization process based on Probabilistic Latent Component Analysis (PLCA), driven by a 6-dimensional dictionary with pre-learned spectral templates. The voice separation component is based on hidden Markov models that use musicological assumptions. By integrating the models, the system can detect multiple concurrent pitches in vocal music and assign each detected pitch to a specific voice corresponding to a voice type such as soprano, alto, tenor or bass (SATB). This work focuses on four-part compositions, and evaluations on recordings of Bach Chorales and Barbershop quartets show that our integrated approach achieves an F-measure of over 70% f...

RETREIVING PITCH OF THE SINGING VOICE IN POLYPHONIC AUDIO

2003

This paper explores the extraction of melodic pitch contour from the polyphonic soundtrack of a song. The motivation for this work lies in developing automatic tools for the melodybased indexing of the database in a music retrieval system. The melody is assumed to be carried by the singer's voice accompanied mainly by percussive instruments. This scenario is typical of a large class of Indian movie songs. The challenges raised by this application are presented. A pitch detection method based on a perceptual model is shown to be a promising approach to the tracking of voice pitch in the presence of strong percussive background.

Singing voice detection in polyphonic music using predominant pitch

Interspeech 2009, 2009

This paper demonstrates the superiority of energy-based features derived from the knowledge of predominant-pitch, for singing voice detection in polyphonic music over commonly used spectral features. However, such energy-based features tend to misclassify loud, pitched instruments. To provide robustness to such accompaniment we exploit the relative instability of the pitch contour of the singing voice by attenuating harmonic spectral content belonging to stable-pitch instruments, using sinusoidal modeling. The obtained feature shows high classification accuracy when applied to north Indian classical music data and is also found suitable for automatic detection of vocal-instrumental boundaries required for smoothing the frame-level classifier decisions.