Tracing vocal emotion expression through the speech chain: do listeners perceive what speakers feel?

Acoustic Analyses Support Subjective Judgments of Vocal Emotion

Annals of the New York Academy of Sciences, 2006

Subjective human judgments of emotion in speech have been considered to be less reliable than acoustic analyses in scientific studies, but acoustic analyses have had limited ability to detect subtle vocal nuances that give useful social information about human intent and meaning to discourse partners. Two post hoc analyses were undertaken to determine if results from acoustic analyses of vocalizations were related to subjective judgments of vocal affect (affective prosody). Acoustic analyses of fundamental frequency (F0) and subjective judgments of emotional content of vocal productions from two studies underwent statistical analyses: Study 1, vocal repetition of sentences using 6 basic emotions in 24 detoxified alcoholics and 15 controls; Study 2, quality/quantity of "motherese" speech directed to 52 infants in Cambridge, England. Ratings of emotion indicators for both studies were done by female researchers of different ages and cultural/language backgrounds. In both studies, acoustic analyses of F0 elements in utterances accounted for approximately 50% of the effect when modeling subjective emotion accuracy and emotion intensity ratings using linear regression analyses. Acoustic analyses of F0 are positively associated with subjective judgments of emotion indicators, and speakers who cannot vary F0 are unable to convey emotion accurately to communication partners. Yet acoustic analyses are limited in comparison to the exquisite complexity of the human auditory and cognitive systems. Subjective judgments of emotional meaning in speech can be a reliable variable in scientific inquiry and can be used for more complex, subtle studies of speech communication and intentionality than acoustic analyses.
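As a rough illustration of the kind of analysis described above, the sketch below regresses subjective emotion-intensity ratings on per-utterance F0 summary features and reports the variance explained (cf. the approximately 50% figure). The specific features and the synthetic data are assumptions for illustration, not the study's measurements.

```python
# A minimal sketch (not the study's actual pipeline): regress subjective
# emotion-intensity ratings on per-utterance F0 summary features and report
# how much variance the F0 features explain.
# Feature names and data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_utterances = 120

# Hypothetical per-utterance F0 descriptors (Hz): mean, standard deviation, range.
f0_mean = rng.normal(200, 40, n_utterances)
f0_sd = rng.normal(30, 10, n_utterances)
f0_range = rng.normal(120, 35, n_utterances)
X = np.column_stack([f0_mean, f0_sd, f0_range])

# Hypothetical subjective intensity ratings, partly driven by F0 variability.
ratings = 0.02 * f0_sd + 0.01 * f0_range + rng.normal(0, 0.5, n_utterances)

model = LinearRegression().fit(X, ratings)
print(f"R^2 (variance in ratings explained by F0 features): {model.score(X, ratings):.2f}")
```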

Sounds of Emotion: Production and Perception of Affect-Related Vocal Acoustics

Annals of The New York Academy of Sciences, 2006

In his writing, Darwin emphasized direct, veridical links between vocal acoustics and vocalizer emotional state, yet he also recognized that acoustics influence the emotional state of listeners. This duality—that particular vocal expressions are likely linked to particular internal states, yet may specifically function to influence others—lies at the heart of contemporary efforts to understand affect-related vocal acoustics. That work has focused most on speech acoustics and laughter, where the most common approach has been to argue that these signals reflect the occurrence of discrete emotional states in the vocalizer. An alternative view is that the underlying states can be better characterized using a small number of continuous dimensions, such as arousal (or activation) and a valenced dimension such as pleasantness. A brief review of the evidence suggests, however, that neither approach is correct. Data from speech-related research provide little support for a discrete-emotions view, with emotion-related aspects of the acoustics seeming more to reflect vocalizer arousal. However, links to a corresponding emotional valence dimension have also been difficult to demonstrate, suggesting a need for interpretations outside this traditional dichotomy. We therefore suggest a different perspective in which the primary function of signaling is not to express signaler emotion, but rather to impact listener affect and thereby influence the behavior of these individuals. In this view, it is not expected that nuances of signaler states will be highly correlated with particular features of the sounds produced, but rather that vocalizers will use acoustics that readily affect listener arousal and emotion. Attributions concerning signaler states thus become a secondary outcome, reflecting inferences that listeners base on their own affective responses to the sounds, their past experience with such signals, and the context in which signaling occurs. This approach has found recent support in laughter research, with the bigger picture being that the sounds of emotion—be they carried in speech, laughter, or other species-typical signals—are not informative, veridical beacons of vocalizer states so much as tools of social influence used to capitalize on listener sensitivities.

Acoustic profiles in vocal emotion expression

Journal of Personality and Social Psychology, 1996

Professional actors' portrayals of 14 emotions varying in intensity and valence were presented to judges. The results on decoding replicate earlier findings on the ability of judges to infer vocally expressed emotions with much-better-than-chance accuracy, including consistently found differences in the recognizability of different emotions. A total of 224 portrayals were subjected to digital acoustic analysis to obtain profiles of vocal parameters for different emotions. The data suggest that vocal parameters not only index the degree of intensity typical for different emotions but also differentiate valence or quality aspects. The data are also used to test theoretical predictions on vocal patterning based on the component process model of emotion (K. R. Scherer, 1986). Although most hypotheses are supported, some need to be revised on the basis of the empirical evidence. Discriminant analysis and jackknifing show remarkably high hit rates and patterns of confusion that closely mirror those found for listener-judges. The important role of vocal cues in the expression of emotion, both felt and feigned, and the powerful effects of vocal affect expression on interpersonal interaction and social influence have been recognized ever since antiquity (see Cicero's De Oratore or Quintilian's Institutio Oratoria; cf. Scherer, 1993). Darwin (1872/1965), in his pioneering monograph on the expression of emotion in animals and humans, underlined the primary significance of the voice as a carrier of affective signals. More recently, ethologists and psychologists have identified the various functions of vocal affect communication with respect to major dimensions of organismic states (e.g., activity or arousal, valence) and interorganismic relationships (e.g., dominance, nurturance), particularly for the communication of reaction patterns and behavioral intentions (see Cosmides, ...
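The discriminant analysis with jackknifing mentioned above corresponds, in modern terms, to a linear discriminant classifier evaluated with leave-one-out cross-validation. The sketch below illustrates that procedure on synthetic acoustic profiles; the feature set, emotion labels, and data are assumptions, not the 224 portrayals analysed in the study.

```python
# A minimal sketch of discriminant analysis with jackknifing (leave-one-out):
# hit rate and confusion matrix over acoustic profiles, comparable in form to
# the patterns reported for listener-judges. Features and data are invented.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(1)
emotions = ["anger", "fear", "joy", "sadness"]
per_class = 40

# Hypothetical acoustic profiles: mean F0, F0 range, intensity, speech rate.
X, y = [], []
for i, emo in enumerate(emotions):
    centre = np.array([180 + 25 * i, 60 + 15 * i, 65 + 3 * i, 4.0 + 0.4 * i])
    X.append(rng.normal(centre, [20, 15, 4, 0.5], size=(per_class, 4)))
    y += [emo] * per_class
X = np.vstack(X)
y = np.array(y)

# Jackknifed (leave-one-out) predictions from a linear discriminant classifier.
pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print("hit rate:", round(accuracy_score(y, pred), 2))
print(confusion_matrix(y, pred, labels=emotions))
```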

Affect Expression: Global and Local Control of Voice Source Parameters

Speech Prosody 2022, 2022

This paper explores how the acoustic characteristics of the voice convey affect. It considers the proposition that the cueing of affect relies on variations in voice source parameters (including f0) that involve both global, uniform shifts across an utterance and local, within-utterance changes at prosodically relevant points. To test this, a perception test was conducted with stimuli in which voice source parameters of a synthesised baseline utterance were modified to target angry and sad renditions. The baseline utterance was generated with the ABAIR Irish TTS system, for one male and one female voice. The voice parameter manipulations drew on earlier production and perception experiments, and involved three stimulus series: those with global, local, and combined global and local adjustments. 65 listeners judged the stimuli as one of the following: angry, interested, no emotion, relaxed, or sad, and indicated how strongly any affect was perceived. Results broadly support the initial proposition, in that the most effective signalling of both angry and sad affect tended to involve the stimuli that combined global and local adjustments. However, stimuli targeting angry were often judged as interested, indicating that negative valence is not consistently cued by the manipulations in these stimuli.
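The global/local distinction can be illustrated on a toy f0 contour: a uniform shift across the whole utterance, a change applied only at assumed prosodically prominent points, and their combination, mirroring the three stimulus series. The contour, accent locations, and scaling factors below are illustrative assumptions, not the ABAIR stimuli.

```python
# A minimal sketch of global vs. local f0 manipulation on a toy contour.
# The contour shape, accent positions, and scale factors are assumptions.
import numpy as np

t = np.linspace(0, 1.5, 150)                              # time (s) across one utterance
baseline_f0 = 120 + 20 * np.sin(2 * np.pi * t / 1.5)      # toy baseline f0 contour (Hz)

accent_mask = (np.abs(t - 0.4) < 0.1) | (np.abs(t - 1.1) < 0.1)  # assumed accented syllables

global_shift = baseline_f0 * 1.25            # uniform (global) raise across the utterance
local_shift = baseline_f0.copy()
local_shift[accent_mask] *= 1.4              # boost only at prosodically relevant points
combined = global_shift.copy()
combined[accent_mask] *= 1.4                 # global + local, as in the third stimulus series

for name, contour in [("global", global_shift), ("local", local_shift), ("combined", combined)]:
    print(f"{name:8s} mean f0: {contour.mean():6.1f} Hz, max f0: {contour.max():6.1f} Hz")
```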

Emotion Appraisal Dimensions can be Inferred From Vocal Expressions

Social Psychological and Personality Science, 2012

Vocal expressions are thought to convey information about speakers' emotional states, but may also reflect the antecedent cognitive appraisal processes that produced the emotions. We investigated the perception of emotion-eliciting situations on the basis of vocal expressions. Professional actors vocally portrayed different emotions by enacting emotion-eliciting situations. Judges then rated these expressions with respect to the emotion-eliciting situation, described in terms of appraisal dimensions (i.e., novelty, intrinsic pleasantness, goal conduciveness, urgency, power, self- and other-responsibility, and norm compatibility), achieving good agreement. The perceived appraisal profiles for the different emotions were generally in accord with predictions based on appraisal theory. The appraisal ratings also correlated with a variety of acoustic measures related to pitch, intensity, voice quality, and temporal characteristics. Results suggest that several aspects of emotion-eliciting situations can be inferred reliably and validly from vocal expressions, which may thus carry information about the cognitive representation of events.
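The final analysis step, relating perceived appraisal dimensions to acoustic measures through correlations, can be sketched as below. The dimension names follow the abstract, while the acoustic measures and ratings are synthetic placeholders, not the study's data.

```python
# A minimal sketch of correlating appraisal ratings with acoustic measures.
# Dimension names follow the abstract; measures and ratings are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 80  # hypothetical number of vocal portrayals

acoustics = pd.DataFrame({
    "mean_f0": rng.normal(220, 40, n),
    "intensity_db": rng.normal(68, 6, n),
    "speech_rate": rng.normal(4.5, 0.8, n),
})
appraisals = pd.DataFrame({
    "urgency": 0.3 * acoustics["speech_rate"] + rng.normal(0, 1, n),
    "power": 0.05 * acoustics["intensity_db"] + rng.normal(0, 1, n),
    "pleasantness": rng.normal(0, 1, n),
})

# Pearson correlations between each appraisal dimension and each acoustic measure.
corr = pd.concat([appraisals, acoustics], axis=1).corr().loc[appraisals.columns, acoustics.columns]
print(corr.round(2))
```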

Superior Communication of Positive Emotions Through Nonverbal Vocalisations Compared to Speech Prosody

Journal of Nonverbal Behavior

The human voice communicates emotion through two different types of vocalizations: nonverbal vocalizations (brief non-linguistic sounds like laughs) and speech prosody (tone of voice). Research examining recognizability of emotions from the voice has mostly focused on either nonverbal vocalizations or speech prosody, and included few categories of positive emotions. In two preregistered experiments, we compare human listeners’ (total n = 400) recognition performance for 22 positive emotions from nonverbal vocalizations (n = 880) to that from speech prosody (n = 880). The results show that listeners were more accurate in recognizing most positive emotions from nonverbal vocalizations compared to prosodic expressions. Furthermore, acoustic classification experiments with machine learning models demonstrated that positive emotions are expressed with more distinctive acoustic patterns for nonverbal vocalizations as compared to speech prosody. Overall, the results suggest that vocal expr...
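The acoustic classification comparison can be sketched as training the same classifier separately on features from each expression type and comparing cross-validated accuracy. Everything below (features, class structure, and the assumed difference in class separation) is illustrative, not the study's data or models.

```python
# A minimal sketch comparing classifiability of two expression types.
# Synthetic features; the larger class separation for nonverbal vocalizations
# is an assumption used only to illustrate the comparison.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

def make_dataset(class_separation, n_classes=6, per_class=30, n_features=10):
    """Synthetic acoustic features; larger separation = more distinctive patterns."""
    X, y = [], []
    for c in range(n_classes):
        centre = rng.normal(0, class_separation, n_features)
        X.append(rng.normal(centre, 1.0, size=(per_class, n_features)))
        y += [c] * per_class
    return np.vstack(X), np.array(y)

X_nv, y_nv = make_dataset(class_separation=1.5)   # nonverbal vocalizations (assumed)
X_pr, y_pr = make_dataset(class_separation=0.6)   # speech prosody (assumed)

for name, (X, y) in [("nonverbal vocalizations", (X_nv, y_nv)), ("speech prosody", (X_pr, y_pr))]:
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```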

The role of voice quality in communicating emotion, mood and attitude

Speech Communication, 2003

This paper explores the role of voice quality in the communication of emotions, moods and attitudes. Listeners' reactions to an utterance synthesised with seven different voice qualities were elicited in terms of pairs of opposing affective attributes. The voice qualities included harsh voice, tense voice, modal voice, breathy voice, whispery voice, creaky voice and lax-creaky voice. These were synthesised using a formant synthesiser, and the voice source parameter settings were guided by prior analytic studies as well as auditory judgements. Results offer support for some past observations on the association of voice quality and affect, and suggest a number of refinements in some cases. Listeners' ratings further suggest that these qualities are considerably more effective in signalling milder affective states than the strong emotions. It is clear that there is no one-to-one mapping between voice quality and affect: rather, a given quality tends to be associated with a cluster of affective attributes.
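One common way to summarise such listener data is a matrix of mean ratings per voice quality on each bipolar affective scale, from which the cluster of attributes associated with a given quality can be read off. The sketch below uses the quality labels from the abstract, but the scale names and ratings are invented for illustration.

```python
# A minimal sketch of aggregating listener ratings into a voice-quality x
# affective-scale matrix. Quality labels follow the abstract; scales and
# ratings are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
qualities = ["harsh", "tense", "modal", "breathy", "whispery", "creaky", "lax-creaky"]
scales = ["relaxed-stressed", "content-angry", "intimate-formal", "bored-interested"]
n_listeners = 20

# Hypothetical ratings on a -3..+3 bipolar scale per listener, quality and scale.
rows = []
for q in qualities:
    for s in scales:
        rows += [{"quality": q, "scale": s, "rating": r}
                 for r in rng.integers(-3, 4, n_listeners)]
ratings = pd.DataFrame(rows)

# Mean rating per voice quality on each bipolar scale.
print(ratings.pivot_table(index="quality", columns="scale", values="rating").round(2))
```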

Emotional Pre-eminence of Human Vocalizations

Brain Topography, 2008

Human vocalizations (HV), as well as environmental sounds, convey a wide range of information, including emotional expressions. The latter have been relatively rarely investigated, and, in particular, it is unclear if duration-controlled non-linguistic HV sequences can reliably convey both positive and negative emotional information. The aims of the present psychophysical study were: (i) to generate a battery of duration-controlled and acoustically controlled extreme valence stimuli, and (ii) to compare the emotional impact of HV with that of other environmental sounds. A set of 144 HV and other environmental sounds was selected to cover emotionally positive, negative, and neutral values. Sequences of 2 s duration were rated on Likert scales by 16 listeners along three emotional dimensions (arousal, intensity, and valence) and two non-emotional dimensions (confidence in identifying the sound source and perceived loudness). The 2 s stimuli were reliably perceived as emotionally positive, negative, or neutral. We observed a linear relationship between intensity and arousal ratings and a "boomerang-shaped" intensity-valence distribution, as previously reported for longer, duration-variable stimuli. In addition, the emotional intensity ratings for HV were higher than for other environmental sounds, suggesting that HV constitute a characteristic class of emotional auditory stimuli. Furthermore, emotionally positive HV were more readily identified than other sounds, and emotionally negative stimuli, irrespective of their source, were perceived as louder than their positive and neutral counterparts. In conclusion, HV are a distinct emotional category of environmental sounds, and they retain this emotional pre-eminence even when presented for brief periods.
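The two quantitative observations, a linear intensity-arousal relation and a "boomerang-shaped" intensity-valence distribution (both very negative and very positive sounds rated as more intense), can be sketched with simple polynomial fits. The ratings below are synthetic stand-ins, not the study's data.

```python
# A minimal sketch: linear fit of arousal on intensity, and a quadratic
# ("boomerang-shaped") fit of intensity on valence. All ratings are invented.
import numpy as np

rng = np.random.default_rng(5)
n = 144  # number of 2-s stimuli in the hypothetical battery

valence = rng.uniform(-1, 1, n)                         # -1 negative ... +1 positive
intensity = 2.0 * valence**2 + rng.normal(0, 0.2, n)    # U-shaped ("boomerang") relation
arousal = 0.8 * intensity + rng.normal(0, 0.2, n)       # roughly linear with intensity

lin = np.polyfit(intensity, arousal, deg=1)
quad = np.polyfit(valence, intensity, deg=2)
print(f"arousal ~ {lin[0]:.2f} * intensity + {lin[1]:.2f}")
print(f"intensity ~ {quad[0]:.2f} * valence^2 + {quad[1]:.2f} * valence + {quad[2]:.2f}")
```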

Analysing Speech to Define Happiness

This project defines emotion in speech under the categories happy, neutral, and unhappy, using definitions derived from classified feature-extraction results. The features extracted were categorised according to their meaning (for example, Pitch/Frequency, Noise/Timbre, etc.) and the results were classified using two methods of classification, including k-means clustering. The data set consisted of speech samples from Emo-DB (the Berlin data set), which were then tested subjectively using MUSHRAM in order to re-evaluate their given emotion labels. The data were relabelled according to the results found in the subjective testing phase. The project shows that the definitions found from feature-extraction classification provide a set of rules that give a 67% chance of identifying the correct emotion, given that the submitted audio speech sample falls within the three categories of happy, neutral, or unhappy. It also showed that the most accurate set of definitions derives from Noise/Timbre-based features, with Pitch/Frequency-related features a very close second. The project has clear limitations: only three emotional states were analysed, the use of MIRtoolbox created data-analysis issues, the data set was very small, and the definitions were formed roughly by finding a midpoint between various averages or cluster centres. Nevertheless, it forms a useful model for further development.
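The classification step described above can be sketched as k-means with three clusters over per-sample pitch and noise/timbre features, with rough "definitions" read off from the cluster centres. The feature values below are invented for illustration, not measurements from Emo-DB.

```python
# A minimal sketch of k-means clustering over pitch and noise/timbre features,
# with cluster centres serving as rough category "definitions".
# All feature values are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)

# Hypothetical per-sample features: [mean pitch (Hz), pitch variability, spectral noisiness].
happy = rng.normal([240, 55, 0.30], [20, 10, 0.05], size=(30, 3))
neutral = rng.normal([190, 30, 0.20], [20, 10, 0.05], size=(30, 3))
unhappy = rng.normal([150, 20, 0.15], [20, 10, 0.05], size=(30, 3))
X = np.vstack([happy, neutral, unhappy])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Cluster centres act as rough "definitions" of the three emotion categories.
for i, centre in enumerate(kmeans.cluster_centers_):
    print(f"cluster {i}: mean pitch ~ {centre[0]:.0f} Hz, "
          f"pitch variability ~ {centre[1]:.0f}, noisiness ~ {centre[2]:.2f}")
```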