Changing emotional tone in dialogue and its prosodic correlates

Exploring the prosody of affective speech

ExLing Conferences, 2022

This paper introduces a research project on voice quality and affect expression. It explores affective prosody by investigating the relationship between voice source parameter changes and perceived affect. Firstly, it aims to examine the relative contribution of voice source shifts occurring globally across an utterance and shifts that are aligned to the prosodic structure of the utterance. Secondly, it aims to formulate a simple model for affect expression that could, in principle, be applied to text-to-speech synthesis systems for Irish (Gaelic) dialects. The analytic methods to be used include voice source and intonation analysis of utterances produced to portray a range of emotions, and perception experiments with stimuli varying in terms of global vs. local, prosodically aligned source manipulations.

Perception of levels of emotion in prosody

2015

Prosody conveys information about the emotional state of the speaker. In this study we test whether listeners are able to detect different levels in the emotional state of the speaker based on prosodic features such as intonation, speech rate and intensity. We ran a perception experiment in which we asked Swiss German and Chinese listeners to recognize the emotions that a professional speaker intended to convey. The results indicate that both Chinese and Swiss German listeners could identify the intended emotions. However, Swiss German listeners could detect different levels of happiness and sadness better than the Chinese listeners. This finding suggests that emotional prosody does not function purely categorically, distinguishing only between emotions, but also indicates the degree of the expressed emotion.

Communicating Emotion: Linking Affective Prosody and Word Meaning

Journal of Experimental Psychology: Human Perception and Performance, 2008

The present study investigated the role of emotional tone of voice in the perception of spoken words. Listeners were presented with words that had either a happy, sad, or neutral meaning. Each word was spoken in a tone of voice (happy, sad, or neutral) that was congruent, incongruent, or neutral with respect to its affective meaning, and naming latencies were collected. Across experiments, tone of voice was either blocked or mixed with respect to emotional meaning. The results suggest that emotional tone of voice facilitated linguistic processing of emotional words in an emotion-congruent fashion. These findings suggest that information about emotional tone is used in the processing of linguistic content, influencing the recognition and naming of spoken words in an emotion-congruent manner.

Acoustical Correlates of Affective Prosody

Journal of Voice, 2007

The word "Anna" was spoken by 12 female and 11 male subjects with six different emotional expressions: "rage/hot anger," "despair/lamentation," "contempt/disgust," "joyful surprise," "voluptuous enjoyment/sensual satisfaction," and "affection/tenderness." In an acoustical analysis, 94 parameters were extracted from the speech samples and reduced by correlation analysis to 15 parameters entering subsequent statistical tests. The results show that each emotion can be characterized by a specific acoustic profile, differentiating that emotion significantly from all others. If aversive emotions are tested against hedonistic emotions as a group, it turns out that the best indicator of aversiveness is the ratio of peak frequency (the frequency with the highest amplitude) to fundamental frequency, followed by the peak frequency, the percentage of time segments with nonharmonic structure ("noise"), the frequency range within single time segments, and the time of the maximum of the peak frequency within the utterance. Only the last parameter, however, codes aversiveness independently of the loudness of an utterance.
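
The peak-frequency-to-f0 ratio the authors single out is straightforward to compute per analysis frame. Below is a minimal sketch, assuming a mono float signal frame and a deliberately naive autocorrelation pitch estimator; it illustrates the measure, not the paper's actual extraction pipeline.

```python
# Sketch: peak-frequency / f0 ratio for one analysis frame.
# Assumes `x` is a mono float frame and `sr` its sample rate.
import numpy as np

def peak_to_f0_ratio(x, sr, f0_min=75.0, f0_max=500.0):
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    peak_freq = freqs[np.argmax(spectrum)]   # frequency with the highest amplitude

    # Naive autocorrelation f0 estimate, restricted to a plausible pitch range.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_lo, lag_hi = int(sr / f0_max), int(sr / f0_min)
    f0 = sr / (lag_lo + np.argmax(ac[lag_lo:lag_hi]))
    return peak_freq / f0
```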

The role of voice quality in communicating emotion, mood and attitude

Speech Communication, 2003

This paper explores the role of voice quality in the communication of emotions, moods and attitudes. Listeners' reactions to an utterance synthesised with seven different voice qualities were elicited in terms of pairs of opposing affective attributes. The voice qualities included harsh voice, tense voice, modal voice, breathy voice, whispery voice, creaky voice and lax-creaky voice. These were synthesised using a formant synthesiser, and the voice source parameter settings were guided by prior analytic studies as well as auditory judgements. Results offer support for some past observations on the association of voice quality and affect, and suggest a number of refinements in some cases. Listeners' ratings further suggest that these qualities are considerably more effective in signalling milder affective states than the strong emotions. It is clear that there is no one-to-one mapping between voice quality and affect: rather a given quality tends to be associated with a cluster of affective attributes.

The role of voice quality and prosodic contour in affective speech perception

Speech Communication, 2012

We explore the use of voice quality and prosodic contour in the identification of emotions and attitudes in French. For this purpose, we develop a corpus of affective speech based on one lexically neutral utterance and apply a prosody transplantation method in our perception experiment. We apply logistic regression to analyze our categorical data and observe differences in the identification of these two affective categories. Listeners primarily use prosodic contour in the identification of the studied attitudes. Emotions are identified on the basis of both voice quality and prosodic contour. However, their use is not homogeneous across individual emotions. Depending on the stimuli, listeners may use both voice quality and prosodic contour, or privilege just one of them, for the successful identification of emotions. The results of our study are discussed in view of their importance for speech synthesis.
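
For readers unfamiliar with the analysis, a logistic regression over identification outcomes might look like the following sketch; the data, column names, and response probabilities are hypothetical stand-ins for the paper's perception data.

```python
# Sketch: logistic regression over (simulated) identification outcomes.
# `cue` codes which information was available in the stimulus.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
cue = rng.choice(["contour", "quality", "both"], size=n)
# Hypothetical per-condition probabilities of correct identification.
p = np.select([cue == "contour", cue == "quality"], [0.70, 0.55], default=0.80)
trials = pd.DataFrame({"cue": cue, "correct": rng.binomial(1, p)})

model = smf.logit("correct ~ C(cue)", data=trials).fit()
print(model.summary())
```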

Prosodic cues for emotion characterization in real-life spoken dialogs

Proceedings of European Conference on Speech …, 2003

This paper reports on an analysis of prosodic cues for emotion characterization in 100 natural spoken dialogs recorded at a telephone customer service center. The corpus was annotated with task-dependent emotion tags, which were validated by a perceptual test. Two F0 range parameters, one at the sentence level and the other at the subsegment level, emerge as the most salient cues for emotion classification. These parameters can differentiate between negative emotions (irritation/anger, anxiety/fear) and a neutral attitude, and confirm trends illustrated by the perceptual experiment.
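
The two F0 range parameters can be illustrated with a short sketch: one range over the whole utterance, one averaged over short subsegments. The f0 track handling and the chunking below are assumptions, not the authors' exact features.

```python
# Sketch: utterance-level and subsegment-level F0 range from an f0 track.
# Assumes unvoiced frames are coded as 0 in the track.
import numpy as np

def f0_ranges(f0, n_subsegments=10):
    voiced = f0[f0 > 0]
    sentence_range = voiced.max() - voiced.min()
    chunks = np.array_split(voiced, n_subsegments)
    subsegment_range = np.mean([c.max() - c.min() for c in chunks if len(c)])
    return sentence_range, subsegment_range
```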

Basic Analysis on Prosodic Features in Emotional Speech

Speech is a rich source of information, conveying not only what a speaker says but also the speaker's attitude toward the listener and the topic under discussion, as well as the speaker's own current state of mind. Increasing attention has recently been directed to the study of the emotional content of speech signals, and many systems have been proposed to identify the emotional content of a spoken utterance. The focus of this research work is to enhance the man-machine interface by attending to the user's speech emotion. This paper gives the results of a basic analysis of prosodic features and compares the prosodic features of various types and degrees of emotional expression in Tamil speech, based on the auditory impressions of speakers and listeners of both genders. The speech samples consist of "neutral" speech as well as speech with three types of emotions ("anger", "joy", and …).

Prosodic Expressions of Emotions and Attitudes in Communicative Feedback

This study investigates the communication of affective-epistemic states (AES) by means of prosody in vocal verbal feedback. The study was conceived as a pilot study to test certain methodological questions about investigating prosody as a part of multimodal communication, and as part of feedback in particular. We find that our method, with some slight adjustments, seems adequate to answer several interesting questions about the prosodic features of feedback.

The attitudinal effects of prosody, and how they relate to emotion

ISCA Tutorial and Research Workshop (ITRW) on …, 2000

The aim of this paper is to contribute to a theoretical framework for the study of affective intonation. I draw a distinction between 'attitude' and 'emotion', suggesting that only the latter is likely to be reflected directly in the speech signal, while 'attitude' is reflected indirectly, and can only be explained by a process of linguistic analysis. The term 'attitude', as applied to intonation and prosody, is a problematic one. It has been used differently in different fields, such as social psychology and linguistics, and is not made any clearer by the proliferation of 'attitudinal' labels in the intonation literature. I suggest that while there are clearly prosodic signals in speech which contribute to the impression of 'attitude', this perceived meaning should be treated as a pragmatic implicature or a pragmatic inference. This means that it can only be explained by taking into account contextual features, such as speaker-hearer relationship, and the text itself. The same intonational feature can be attitudinally neutral, or signal positive and negative attitudes depending on a complex interaction between prosody, text and context.

Vocal cues to speaker affect: Testing two models

The Journal of the Acoustical Society of America, 1984

We identified certain assumptions implicit in two divergent approaches to studying vocal affect signaling. The "covariance" model assumes that nonverbal cues function independently of verbal content, and that relevant acoustic parameters covary with the strength of the affect conveyed. The "configuration" model assumes that both verbal and nonverbal cues exhibit categorical linguistic structure, and that different affective messages are conveyed by different configurations of category variables. We tested these assumptions in a series of two judgment experiments in which subjects rated recorded utterances, written transcripts, and three different acoustically masked versions of the utterances. Comparison of the different conditions showed that voice quality and F0 level can convey affective information independently of the verbal context. However, judgments of the unaltered recordings also showed that intonational categories (contour types) conveyed affective information only in interaction with grammatical features of the text. It appears necessary to distinguish between linguistic features of intonation and other (paralinguistic) nonverbal cues and to design research methods appropriate to the type of cues under study.

Emotional and linguistic perception of prosody. Reception of prosody

Folia Phoniatrica et Logopaedica: Official Organ of the International Association of Logopedics and Phoniatrics (IALP)

The objective of the study was to find out whether there is a connection between the perception of linguistic intonation contours and emotional intonation. Twenty-four subjects were asked to identify and discriminate emotional prosody listening to subtests 8A and 8B of the Tübinger Affect Battery as well as to 36 utterances that differed in linguistic intonation contour and were first presented normally and then low-pass-filtered. The subjects were divided into an older and a younger group in order to detect a possible age effect. The results showed that the ability to recognize and identify emotional prosody did not decline with age. These results are in contrast to the linguistic intonation contours, for which performance typically declined with age. Also, the low-pass-filtered utterances are more difficult to identify if the intonation contour is not salient, as in imperatives. Finally, the results do not show a gender difference. In sum the results indicate that emotional prosod...

Affect Expression: Global and Local Control of Voice Source Parameters

Speech Prosody 2022, 2022

This paper explores how the acoustic characteristics of the voice serve to signal affect. It considers the proposition that the cueing of affect relies on variations in voice source parameters (including f0) that involve both global, uniform shifts across an utterance and local, within-utterance changes at prosodically relevant points. To test this, a perception test was conducted with stimuli in which modifications were made to the voice source parameters of a synthesised baseline utterance, to target angry and sad renditions. The baseline utterance was generated with the ABAIR Irish TTS system, for one male and one female voice. The voice parameter manipulations drew on earlier production and perception experiments, and involved three stimulus series: those with global, local, and a combination of global and local adjustments. Sixty-five listeners judged each stimulus as one of the following: angry, interested, no emotion, relaxed or sad, and indicated how strongly the affect was perceived. Results broadly support the initial proposition, in that the most effective signalling of both angry and sad affect tended to involve the stimuli which combined global and local adjustments. However, stimuli targeting angry were often judged as interested, indicating that negative valence is not consistently cued by the manipulations in these stimuli.
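
The contrast between global and local manipulations can be made concrete with a toy f0 example. The contour, the accented region, and the scaling factors below are invented for illustration; the study manipulated several voice source parameters, not f0 alone.

```python
# Sketch: global vs. local f0 manipulation of a synthetic baseline contour.
import numpy as np

f0 = 120 + 20 * np.sin(np.linspace(0, np.pi, 200))  # baseline contour (Hz)
accent = slice(80, 120)                             # assumed accented region

global_raised = f0 * 1.15        # uniform shift across the whole utterance

local_raised = f0.copy()         # change confined to the accented stretch
local_raised[accent] *= 1.30

combined = f0 * 1.15             # global shift plus an extra local boost
combined[accent] *= 1.30
```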

Emotion dynamics in movie dialogues

PLOS ONE

Emotion dynamics is a framework for measuring how an individual’s emotions change over time. It is a powerful tool for understanding how we behave and interact with the world. In this paper, we introduce a framework to track emotion dynamics through one’s utterances. Specifically, we introduce a number of utterance emotion dynamics (UED) metrics inspired by work in psychology. We use this approach to trace the emotional arcs of movie characters. We analyze thousands of such character arcs to test hypotheses that inform our broader understanding of stories. Notably, we show that there is a tendency for characters to use increasingly more negative words and to become increasingly emotionally discordant with each other until about 90% of the narrative length. UED also has applications in behavior studies, the social sciences, and public health.
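
One simple UED-style measure is word valence averaged over a rolling window of narrative time. The sketch below uses a tiny hypothetical lexicon and text; the paper's metrics are considerably richer.

```python
# Sketch: a rolling-window valence arc over the words of a dialogue.
# The lexicon and the input text are hypothetical.
import numpy as np

VALENCE = {"love": 0.9, "happy": 0.8, "fine": 0.3,
           "sad": -0.6, "hate": -0.8, "awful": -0.9}

def valence_arc(words, window=3):
    scores = [VALENCE.get(w, 0.0) for w in words]   # unknown words count as neutral
    return np.convolve(scores, np.ones(window) / window, mode="valid")

print(valence_arc("i love this fine day but it turned sad and awful".split()))
```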

Validation of affective and neutral sentence content for prosodic testing

Behavior Research Methods, 2008

Conducting a study of emotional prosody often requires that one have a valid set of stimuli for assessing perceived emotion in vocal intonation. In this study, we created a list of sentences with both affective and neutral content, and then validated them against rater opinion. Participants read sentences with content that implied happiness, sadness, anger, fear, or neutrality and rated how well they could imagine each sentence being expressed in each emotion. Coefficients of variation and intraclass correlations were calculated to narrow the list to affective sentences that had high agreement and neutral sentences that had low agreement. We found that raters could easily identify most emotional content and did not ascribe any unique emotion to most neutral content. We also found differences between the intensity of male and female ratings. The final list of sentences is available on the Internet (www.med.upenn.edu/bbl/) and can be recorded for use as stimuli for prosodic studies.
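
The agreement screening can be sketched in a few lines: compute the coefficient of variation of ratings per sentence and keep sentences below an agreement threshold. The ratings matrix and the threshold here are hypothetical.

```python
# Sketch: screening sentences by rater agreement via the coefficient of variation.
# Rows are raters, columns are sentences; the 0.25 cutoff is an assumption.
import numpy as np

ratings = np.array([[6, 2, 5],
                    [7, 3, 5],
                    [6, 2, 4],
                    [7, 1, 5]], dtype=float)

cv = ratings.std(axis=0, ddof=1) / ratings.mean(axis=0)  # low CV = high agreement
keep = cv < 0.25
print(cv.round(2), keep)
```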

Voice-to-Affect Mapping: Inferences on Language Voice Baseline Settings

2017

Modulations of the voice convey affect, and the precise mapping of voice-to-affect may vary across languages. However, affect-related modulations occur relative to the baseline affect-neutral voice, which tends to differ from language to language. Little is known about the characteristic long-term voice settings of different languages, and how they influence the use of voice quality to signal affect. In this paper, data from a voice-to-affect perception test involving Russian, English, Spanish and Japanese subjects are reexamined to glean insights concerning likely baseline settings in these languages. The test used synthetic stimuli with different voice qualities (modelled on a male voice), with or without extreme f0 contours as might be associated with affect. Cross-language differences in affect ratings for modal and tense voice suggest that the baseline in Spanish and Japanese is inherently tenser than in Russian and English, and that, as a corollary, tense voice serves as a more potent cue to high-activation affects in the latter languages. A relatively tenser baseline in Japanese and Spanish is further suggested by the fact that tense voice can be associated with intimate, a low-activation state, just as readily as with the high-activation state interested.

The effects of emotion of voice in synthesized and recorded speech

… and intelligent II: the tangled knot …, 2001

This study examines whether emotion conveyed in recorded and synthesized voices affects perceptions of the emotional valence, suitability, likability, and credibility of content, as well as whether recorded and synthesized speech influence perceptions differently. Participants heard two news stories, two movie descriptions and two health stories in a 2 (type of speech: recorded vs. synthesized) by 2 (consistency of voice emotion and content emotion: matched vs. mismatched) balanced, between-subjects experiment. A happy voice, whether synthesized or recorded, made content seem happier and more suitable for extroverts, and a sad (synthesized or recorded) voice made content seem less happy and less interesting for extroverts. Participants reported liking content more when voice emotion and content emotion were matched, but rated information as more credible when voice emotion and content emotion were mismatched. Implications for design are discussed.

Prosodic predictors of upcoming positive or negative content in spoken messages

The Journal of the Acoustical Society of America, 2010

This article examines potential prosodic predictors of emotional speech in utterances perceived as conveying that good or bad news is about to be delivered. Speakers were asked to call an experimental confederate to inform her about whether or not she had been given a job she had applied for. A perception study was then performed in which initial fragments of the recorded utterances, not containing any explicit lexical cues to emotional content, were presented to listeners who had to rate whether good or bad news would follow the utterance. The utterances were then examined to discover acoustic and prosodic features that distinguished between good and bad news. It was found that speakers in the production study were not simply reflecting their own positive or negative mood during the experiment, but rather appeared to be influenced by the valence of the positive or negative message they were preparing to deliver. Positive and negative utterances appeared to be judged differently with respect to a number of perceived attributes of the speakers' voices (like sounding hesitant or nervous). These attributes correlated with a number of automatically obtained acoustic features.
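
The final correlation step might look like the following sketch, pairing a perceived voice attribute with an automatically extracted acoustic feature; the attribute, the feature, and all values are invented for illustration.

```python
# Sketch: correlating a perceived attribute with an acoustic feature.
import numpy as np
from scipy.stats import pearsonr

rated_hesitant = np.array([2.1, 3.4, 1.8, 4.0, 2.9, 3.7])     # mean listener ratings
pause_ratio = np.array([0.10, 0.22, 0.08, 0.31, 0.18, 0.25])  # hypothetical feature

r, p = pearsonr(rated_hesitant, pause_ratio)
print(f"r = {r:.2f}, p = {p:.3f}")
```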

Tracing vocal emotion expression through the speech chain: do listeners perceive what speakers feel?

ISCA Workshop on Plasticity in …, 2005

This study examines whether vocal cues can be used to reliably infer speaker happiness. Two hundred speakers were asked to perform a simple referential communication task and to rate their current emotional state. A range of vocal cues was traced through the speech chain using path analysis. The results indicate that the reported happiness of the speakers and the perceived happiness of the listeners were not related. The only vocal cue that mediated between reported and perceived happiness was F1 and, for the female speakers, pitch range. The findings suggest that listeners tend to over-interpret vocal cues, as most of them are not likely to be related to speakers' internal states.
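
The mediation logic behind the path analysis can be sketched with two regressions: state to cue (path a) and cue to percept controlling for state (path b), with the indirect effect as their product. All data below are simulated, and the single-cue model is a simplification of the paper's full path analysis over many cues.

```python
# Sketch: does a vocal cue (F1) mediate between reported and perceived happiness?
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
reported = rng.normal(size=n)
f1 = 0.4 * reported + rng.normal(size=n)     # cue partly driven by speaker state
perceived = 0.5 * f1 + rng.normal(size=n)    # listeners respond to the cue

a = sm.OLS(f1, sm.add_constant(reported)).fit()                                # path a
b = sm.OLS(perceived, sm.add_constant(np.column_stack([f1, reported]))).fit()  # path b
print("indirect effect:", a.params[1] * b.params[1])
```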