Analytical and perceptual study on the role of acoustic features in realizing emotional speech
Related papers
Basic Analysis on Prosodic Features in Emotional Speech
Speech is a rich source of information: it conveys not only what a speaker says, but also the speaker's attitude toward the listener and toward the topic under discussion, as well as the speaker's own current state of mind. Increasing attention has recently been directed to the study of the emotional content of speech signals, and many systems have been proposed to identify the emotional content of a spoken utterance. The focus of this research work is to enhance the man-machine interface by attending to the emotion in the user's speech. This paper gives the results of a basic analysis of prosodic features and compares the prosodic features of various types and degrees of emotional expression in Tamil speech, based on auditory impressions, between the two genders of speakers as well as listeners. The speech samples consist of "neutral" speech as well as speech with three types of emotions ("anger", "joy", and …).
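As a rough illustration of the kind of prosodic measurements such an analysis rests on, the sketch below extracts duration, F0 statistics, and mean intensity from one utterance using the parselmouth Praat bindings. The file name and the exact feature set are assumptions for illustration, not the paper's setup.

```python
# Minimal sketch: basic prosodic features of one utterance via parselmouth.
# "anger.wav" is a hypothetical file name.
import numpy as np
import parselmouth

snd = parselmouth.Sound("anger.wav")

pitch = snd.to_pitch()                       # default pitch analysis
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]                              # keep voiced frames only

intensity = snd.to_intensity()
db = intensity.values.flatten()

features = {
    "duration_s": snd.get_total_duration(),
    "f0_mean_hz": float(np.mean(f0)),
    "f0_range_hz": float(np.max(f0) - np.min(f0)),
    "intensity_mean_db": float(np.mean(db)),
}
print(features)
```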
Influence of verbal content on acoustics of speech emotions
Proceedings of the 18th International Congress of Phonetic Sciences, 2015
This paper examines how verbal content influences listeners who have to identify or evaluate speech emotions, and whether the emotional aspect of verbal content should be eliminated. We compare the acoustic parameters of sentences expressing joy, anger, sadness, and neutrality in two groups: (1) where the verbal content aids the listener in identifying emotions, and (2) where it does not. The results reveal few significant differences in the acoustic parameters of emotions between the two groups of sentences, and indicate that eliminating emotional verbal content from speech presented for emotion identification or evaluation is, in most cases, not necessary.
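A per-parameter comparison of the two sentence groups could be run as in the sketch below; the CSV layout, column names, and the choice of Welch's t-test are assumptions for illustration, not the authors' reported procedure.

```python
# Sketch: compare acoustic parameters between two groups of sentences.
# Each row of the (hypothetical) CSV holds one sentence's measurements
# plus a "group" label (1 = emotional verbal content, 2 = neutral content).
import pandas as pd
from scipy import stats

df = pd.read_csv("acoustic_params.csv")      # hypothetical measurements file
params = ["f0_mean", "f0_range", "intensity_mean", "speech_rate"]

for p in params:
    g1 = df.loc[df["group"] == 1, p]
    g2 = df.loc[df["group"] == 2, p]
    t, pval = stats.ttest_ind(g1, g2, equal_var=False)  # Welch's t-test
    print(f"{p}: t={t:.2f}, p={pval:.3f}")
```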
Perception of levels of emotion in prosody
2015
Prosody conveys information about the emotional state of the speaker. In this study we test whether listeners are able to detect different levels of the speaker's emotional state based on prosodic features such as intonation, speech rate, and intensity. We ran a perception experiment in which we asked Swiss German and Chinese listeners to recognize the intended emotions produced by a professional speaker. The results indicate that both Chinese and Swiss German listeners could identify the intended emotions. However, Swiss German listeners could detect different levels of happiness and sadness better than the Chinese listeners. This finding suggests that emotional prosody does not function purely categorically, distinguishing only between emotions, but also conveys the degree of the expressed emotion.
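A minimal sketch of how such forced-choice responses might be scored per listener group; the response-log layout and column names are hypothetical.

```python
# Sketch: identification accuracy by listener group and intended emotion.
# Assumed layout: one response per row with "group" (e.g. "SwissGerman",
# "Chinese"), "intended", and "response" columns.
import pandas as pd

resp = pd.read_csv("responses.csv")          # hypothetical response log
resp["correct"] = resp["intended"] == resp["response"]

acc = resp.groupby(["group", "intended"])["correct"].mean()
print(acc.unstack())                         # one accuracy row per group
```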
The role of voice quality and prosodic contour in affective speech perception
Speech Communication, 2012
We explore the use of voice quality and prosodic contour in the identification of emotions and attitudes in French. For this purpose, we develop a corpus of affective speech based on one lexically neutral utterance and apply a prosody transplantation method in our perception experiment. We apply logistic regression to analyze our categorical data, and we observe differences in how these two affective categories are identified. Listeners primarily use prosodic contour in the identification of the studied attitudes. Emotions are identified on the basis of both voice quality and prosodic contour. However, cue usage is not homogeneous across individual emotions: depending on the stimuli, listeners may use both voice quality and prosodic contour, or privilege just one of them for successful identification. The results of our study are discussed in view of their importance for speech synthesis.
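The logistic-regression analysis the authors mention could look roughly like the sketch below, modeling identification success from two binary cue predictors; the data layout, column names, and statsmodels formula are illustrative assumptions, not the paper's specification.

```python
# Sketch: logistic regression on categorical identification data.
# Assumed layout: one trial per row; "identified" is 0/1, and
# "voice_quality" / "prosodic_contour" are 0/1 flags indicating whether
# the stimulus carried the original cue.
import pandas as pd
import statsmodels.formula.api as smf

trials = pd.read_csv("identification_trials.csv")   # hypothetical
model = smf.logit("identified ~ voice_quality + prosodic_contour",
                  data=trials)
result = model.fit()
print(result.summary())          # coefficient sizes show each cue's weight
```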
Synthesis of Emotional Speech by Prosody Modification of Vowel Segments of Neutral Speech
SSRN Electronic Journal, 2019
Speech is viewed as a combination of voiced and unvoiced regions. Voiced speech is produced by vibration of the vocal cords, and the vibration pattern of the vocal cords differs across emotions. During production of some consonant sound units, the vocal cords do not vibrate; consonants are therefore less effective for conveying emotion in the speech signal. In this paper, we consider only vowel regions for emotion synthesis, using three prosody parameters: duration, intensity, and pitch. Vowel-like regions (VLRs) are identified using vowel onset and offset points, which mark the start and end of each region. It is observed that when synthesizing emotional speech from neutral speech, it is mainly the vowel regions of the utterance that are modified significantly. Our experimental results show that emotion synthesis using prosody modification of VLRs alone is significantly better than prosody modification at the syllable level, and it is also more efficient in terms of processing time. The average mean opinion scores for vowel-level prosody modification are 3.85, 3.60, and 4.03 for angry, happy, and fear emotional speech, respectively. These scores are better than those for syllable-level prosody modification, which are 3.56, 3.17, and 3.92 for the same three emotions.
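A minimal sketch of the prosody-modification step using Praat's Manipulation objects via parselmouth; for simplicity it manipulates the whole utterance rather than detected vowel-like regions, and the pitch, duration, and intensity targets below are illustrative, not the paper's values.

```python
# Sketch: modify pitch, duration, and intensity of a neutral utterance
# with Praat's overlap-add resynthesis, via parselmouth.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("neutral.wav")       # hypothetical neutral utterance
manip = call(snd, "To Manipulation", 0.01, 75, 600)

# Raise pitch by 20% (toward an "angry" target, for example)
pitch_tier = call(manip, "Extract pitch tier")
call(pitch_tier, "Multiply frequencies", snd.xmin, snd.xmax, 1.2)
call([pitch_tier, manip], "Replace pitch tier")

# Shorten duration to 90% of the original (one point = constant scale)
dur_tier = call(manip, "Extract duration tier")
call(dur_tier, "Add point", (snd.xmin + snd.xmax) / 2, 0.9)
call([dur_tier, manip], "Replace duration tier")

out = call(manip, "Get resynthesis (overlap-add)")
call(out, "Scale intensity", 72.0)           # set average intensity (dB)
out.save("synthesized.wav", "WAV")
```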
Acoustical Correlates of Affective Prosody
Journal of Voice, 2007
The word "Anna" was spoken by 12 female and 11 male subjects with six different emotional expressions: "rage/hot anger," "despair/lamentation," "contempt/disgust," "joyful surprise," "voluptuous enjoyment/sensual satisfaction," and "affection/tenderness." In an acoustical analysis, 94 parameters were extracted from the speech samples and reduced by correlation analysis to 15 parameters entering subsequent statistical tests. The results show that each emotion can be characterized by a specific acoustic profile that differentiates it significantly from all others. If aversive emotions are tested against hedonistic emotions as a group, it turns out that the best indicator of aversiveness is the ratio of the peak frequency (the frequency with the highest amplitude) to the fundamental frequency, followed by the peak frequency itself, the percentage of time segments with nonharmonic structure ("noise"), the frequency range within single time segments, and the time of the maximum of the peak frequency within the utterance. Only the last parameter, however, codes aversiveness independently of the loudness of an utterance.
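The paper's strongest aversiveness indicator, the ratio of the spectral peak frequency to the fundamental frequency, could be approximated per frame as in the sketch below; the file name, frame parameters, and pyin search range are assumptions.

```python
# Sketch: per-frame ratio of spectral peak frequency to F0.
import numpy as np
import librosa

y, sr = librosa.load("anna.wav", sr=None)    # hypothetical recording of "Anna"

# Fundamental frequency per frame (NaN in unvoiced frames)
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)

# Peak frequency: bin with the highest magnitude in each STFT frame
S = np.abs(librosa.stft(y, n_fft=2048))
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
peak_freq = freqs[S.argmax(axis=0)]

# Align frame counts (pyin and stft both default to a 512-sample hop)
n = min(len(f0), len(peak_freq))
ratio = peak_freq[:n][voiced[:n]] / f0[:n][voiced[:n]]
print("mean peak/F0 ratio:", np.nanmean(ratio))
```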
Techniques for the phonetic description of emotional speech
ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, 2000
It is inconceivable that there could be information present in the speech signal that could be detected by the human auditory system but which is not accessible to acoustic analysis and phonetic categorisation. We know that humans can reliably recognise a range of emotions produced by speakers of their own language on the basis of the acoustic signal alone, yet it appears that our ability to identify the relevant acoustic correlates is at present rather limited. This paper proposes that we have to build a bridge between the human perceptual experience and the measurable properties of the acoustic signal by developing an analytic framework based partly on auditory analysis. A possible framework is outlined which is based on the work of the Reading/Leeds Emotional Speech Database. The project was funded by ESRC Grant no. R000235285.