Robustness and independence of voice timbre features under live performance acoustic degradations
Related papers
Frontiers in Neuroscience, 2020
Many post-lingually deafened cochlear implant (CI) users report that they no longer enjoy listening to music, which may contribute to a perceived reduction in quality of life. One aspect of music perception, vocal timbre perception, may be difficult for CI users because they may not have access to the same timbral cues available to normal-hearing listeners. Vocal tract resonance frequencies have been shown to provide perceptual cues to voice categories such as baritone, tenor, mezzo-soprano, and soprano, while changes in glottal source spectral slope are believed to be related to perception of vocal quality dimensions such as fluty vs. brassy. As a first step toward understanding vocal timbre perception in CI users, we employed an 8-channel noise-band vocoder to test how vocoding can alter the timbral perception of female synthetic sung vowels across pitches. Non-vocoded and vocoded stimuli were synthesized with vibrato using 3 excitation source spectral slopes and 3 vocal tract transfer functions (mezzo-soprano, intermediate, soprano) at the pitches C4, B4, and F5. Six multi-dimensional scaling experiments were conducted: C4 non-vocoded, C4 vocoded, B4 non-vocoded, B4 vocoded, F5 non-vocoded, and F5 vocoded. At the pitch C4, for both non-vocoded and vocoded conditions, dimension 1 grouped stimuli according to voice category and was most strongly predicted by the spectral centroid from 0 to 2 kHz. While dimension 2 grouped stimuli according to excitation source spectral slope, it was organized slightly differently and predicted by different acoustic parameters in the non-vocoded and vocoded conditions. For pitches B4 and F5, the spectral centroid from 0 to 2 kHz most strongly predicted dimension 1. However, while dimension 1 separated all 3 voice categories in the vocoded condition, it only separated the soprano stimuli from the intermediate and mezzo-soprano stimuli in the non-vocoded condition. While it is unclear how well these results predict timbre perception in actual CI listeners, they suggest that some aspects of vocal timbre perception may survive vocoding.
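As an aside, the key acoustic predictor named in this abstract, the spectral centroid restricted to 0-2 kHz, is straightforward to compute. The sketch below is our own minimal illustration, not the authors' code; the function name, Hann windowing, and single-frame analysis are all assumptions on our part.

```python
import numpy as np

def spectral_centroid_0_2k(frame, sr, f_lo=0.0, f_hi=2000.0):
    """Amplitude-weighted mean frequency of the 0-2 kHz band of one frame."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(np.sum(freqs[band] * spectrum[band]) / np.sum(spectrum[band]))

# Example: a pure 440 Hz tone has its centroid near the fundamental.
sr = 16000
t = np.arange(sr) / sr
print(spectral_centroid_0_2k(np.sin(2 * np.pi * 440 * t), sr))
```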
2005 IEEE International Conference on Multimedia and Expo
In this work we seek an optimal set of acoustic features for discriminating speech, monophonic singing, and polyphonic music, in order to robustly segment acoustic media streams for annotation and interaction purposes. Furthermore, we introduce ensemble-based classification approaches for this task. From a basis of 276 attributes we select the most efficient set by SVM-SFFS. Additionally, the relevance of single features is assessed by computing the information gain ratio. As a basis for comparison we reduce dimensionality by PCA. We present an extensive analysis of different classifiers on the named task, among them kernel machines, decision trees, and Bayesian classifiers. Moreover, we improve single-classifier performance by bagging and boosting, and finally combine the strengths of the classifiers by StackingC. The database is formed by 2,114 samples of speech and singing from 58 persons; 1,000 music clips were taken from the MTV-Europe-Top-20, 1980-2000. The outstanding discrimination results of a working real-time-capable implementation underline the practicability of the proposed ideas.
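For readers unfamiliar with wrapper-based feature selection, the following sketch approximates the SVM-SFFS step with scikit-learn. It is an illustration under our own assumptions: scikit-learn provides plain sequential forward selection rather than the floating variant (SFFS) used in the paper, and the random data merely stands in for the real 276-attribute corpus.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

# Stand-in data: in the paper, X would hold the 276 acoustic attributes
# and y the labels {speech, monophonic singing, polyphonic music}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))   # reduced dimensionality to keep the demo fast
y = rng.integers(0, 3, size=200)

# Wrapper-based forward selection with an SVM, approximating SVM-SFFS;
# the floating variant additionally allows conditional backward steps.
selector = SequentialFeatureSelector(
    SVC(kernel="rbf"), n_features_to_select=5, direction="forward", cv=3
)
selector.fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))
```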
Sensors
Singing voice is a human quality that requires the precise coordination of numerous kinetic functions and results in a perceptually variable auditory outcome. The use of multi-sensor systems can facilitate the study of correlations between the kinetic functions of the vocal mechanism and the voice output. This is directly relevant to vocal education, rehabilitation, and the prevention of vocal health issues in educators; professionals; and students of singing, music, and acting. In this work, we present the initial design of a modular multi-sensor system for singing voice analysis, and describe its first assessment experiment on the ‘vocal breathiness’ qualitative characteristic. A system case study with two professional singers was conducted, utilizing signals from four sensors. Participants sang a protocol of vocal trials at various degrees of intended vocal breathiness. Their (i) vocal output, (ii) phonatory function, and (iii) respiratory behavior per condition were recorded through a cond...
Assessing vowel quality for singing evaluation
2012 National Conference on Communications (NCC), 2012
The proper pronunciation of lyrics is an important component of vocal music. While automatic vowel classification has been widely studied for speech, a separate investigation of the methods is needed for singing due to the differences in acoustic properties between sung and spoken vowels. Acoustic features combining the spectrum envelope and pitch are used with classifiers trained on sung vowels to classify test vowels segmented from the audio of solo singing. Two classifiers are tested, viz. Gaussian Mixture Models (GMM) and linear regression, and both are observed to perform well on male and female sung vowels.
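The GMM approach described above can be sketched as one mixture model per vowel class, with classification by maximum log-likelihood. The feature dimensionality, component count, and toy data below are our assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_vowel_gmms(feats_by_vowel, n_components=4):
    """Fit one GMM per vowel on frames of spectrum-envelope + pitch features."""
    return {
        vowel: GaussianMixture(n_components=n_components,
                               covariance_type="diag").fit(frames)
        for vowel, frames in feats_by_vowel.items()
    }

def classify_vowel(models, frames):
    """Assign the vowel whose GMM yields the highest mean log-likelihood."""
    return max(models, key=lambda v: models[v].score(frames))

# Toy usage with random features standing in for real sung-vowel frames.
rng = np.random.default_rng(1)
train = {v: rng.normal(loc=i, size=(200, 13)) for i, v in enumerate("aeiou")}
models = train_vowel_gmms(train)
print(classify_vowel(models, rng.normal(loc=2, size=(50, 13))))  # -> 'i'
```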
Computational Tools for the Analysis of Acoustical and Physiological Criteria of the Singing Voice
2010
Tools and methods are presented for the calculation of acoustical, musical, and physiological criteria from a sound file of the singing voice, together with tools for automating the time-dependent calculation processes used for analysis and statistics. The difference between recordings in an anechoic chamber and a simulated condition with reverberation is calculated using a sound file recorded in the anechoic chamber and compared to its derivative altered by a hall simulator. The tools presented are Praat, Wavesurfer, SNDAN (used for sound analysis), TAP, ProToo (an automation tool with a graphical user interface), and MATLAB scripts such as VoiceSauce (HNR, spectral tilt, formants) and Aparat (inverse filtering). The acoustical criteria described are long-time average spectra, the α-factor, and time-dependent probabilities of single vowels, shown via the parameters loudness (RMS), brilliance (normalised spectral centroid), vibrato (pitch), formants (LPC), and the spectra. Physiological criteria c...
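As an illustration of one of the acoustical criteria listed, here is a minimal sketch of the α-factor as it is commonly defined (the ratio of spectral energy above and below roughly 1 kHz); the exact band boundary used in this work is an assumption on our part.

```python
import numpy as np

def alpha_factor(signal, sr, cutoff_hz=1000.0):
    """Spectral-energy ratio above vs. below `cutoff_hz`.

    The 1 kHz boundary is the convention commonly cited in singing-voice
    research; the thesis may define the band edge differently, so treat
    the cutoff as an assumption.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    high = power[freqs >= cutoff_hz].sum()
    low = power[(freqs > 0) & (freqs < cutoff_hz)].sum()
    return float(high / low)
```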
Perceptual assessment of voice quality: Past, present, and future
Despite many years of research, we still do not know how to measure vocal quality. This paper reviews the history of quality assessment, describes some reasons why current approaches are unlikely to be fruitful, and proposes an alternative approach that addresses the primary difficulties with existing protocols.
Comparing timbre estimation using auditory models with and without hearing loss
2012
We propose a concept for evaluating signal transformations for music signals with respect to an individual hearing deficit by using an auditory model. This deficit is simulated in the model by changing specific model parameters. Our idea is to extract the musical attributes rhythm, pitch, loudness, and timbre and to compare the modified model output to the original one. While rhythm, pitch, and loudness estimation were studied in previous work, this paper focuses on timbre estimation. Results are shown for the original auditory model and three models, each simulating a specific hearing loss.
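A toy version of this comparison idea, not the paper's auditory model, can be sketched as follows: simulate a deficit by attenuating high frequencies, then compare a scalar timbre attribute (here the spectral centroid, our choice) between the original and modified signals. The slope, reference frequency, and test tone are all illustrative assumptions.

```python
import numpy as np

def apply_sloping_loss(signal, sr, db_per_octave=10.0, ref_hz=1000.0):
    """Crude stand-in for a sloping high-frequency hearing loss.

    The paper changes parameters inside an auditory model; here we merely
    attenuate the spectrum above `ref_hz` (our simplification).
    """
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    octaves = np.maximum(np.log2(np.maximum(freqs, 1e-6) / ref_hz), 0.0)
    spec *= 10.0 ** (-db_per_octave * octaves / 20.0)
    return np.fft.irfft(spec, n=len(signal))

def spectral_centroid(signal, sr):
    """One common scalar timbre attribute: the spectral centroid."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

# Compare the timbre estimate before and after the simulated deficit.
sr = 16000
t = np.arange(sr) / sr
tone = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 11))
print(spectral_centroid(tone, sr),
      spectral_centroid(apply_sloping_loss(tone, sr), sr))
```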