Voice quality and f0 cues for affect expression: implications for synthesis

The role of voice quality in communicating emotion, mood and attitude

Speech Communication, 2003

This paper explores the role of voice quality in the communication of emotions, moods and attitudes. Listeners' reactions to an utterance synthesised with seven different voice qualities were elicited in terms of pairs of opposing affective attributes. The voice qualities included harsh voice, tense voice, modal voice, breathy voice, whispery voice, creaky voice and lax-creaky voice. These were synthesised using a formant synthesiser, and the voice source parameter settings were guided by prior analytic studies as well as auditory judgements. Results offer support for some past observations on the association of voice quality and affect, and suggest a number of refinements in some cases. Listeners' ratings further suggest that these qualities are considerably more effective in signalling milder affective states than the strong emotions. It is clear that there is no one-to-one mapping between voice quality and affect: rather a given quality tends to be associated with a cluster of affective attributes.

Voice Quality and Loudness in Affect Perception

2008

Different voice qualities tend to vary in terms of their intrinsic loudness. Perceptual experiments have shown that voice quality variation can be strongly associated with the affective colouring of an utterance. The question addressed in this paper concerns the role that the intrinsic loudness variation might play in this voice quality-to-affect mapping. To test the hypothesis that the intrinsic loudness …

Affect Expression: Global and Local Control of Voice Source Parameters

Speech Prosody 2022, 2022

This paper explores how the acoustic characteristics of the voice signal express affect. It considers the proposition that the cueing of affect relies on variations in voice source parameters (including f0) that involve both global, uniform shifts across an utterance, and local, within-utterance changes at prosodically relevant points. To test this, a perception test was conducted with stimuli in which modifications were made to the voice source parameters of a synthesised baseline utterance, targeting angry and sad renditions. The baseline utterance was generated with the ABAIR Irish TTS system, for one male and one female voice. The voice parameter manipulations drew on earlier production and perception experiments, and involved three stimulus series: those with global, local, and combined global and local adjustments. 65 listeners judged the stimuli as one of the following: angry, interested, no emotion, relaxed and sad, and indicated how strongly any affect was perceived. Results broadly support the initial proposition, in that the most effective signalling of both angry and sad affect tended to involve those stimuli which combined global and local adjustments. However, stimuli targeting angry were often judged as interested, indicating that negative valence is not consistently cued by the manipulations in these stimuli.

The role of voice quality and prosodic contour in affective speech perception

Speech Communication, 2012

We explore the use of voice quality and prosodic contour in the identification of emotions and attitudes in French. For this purpose, we develop a corpus of affective speech based on one lexically neutral utterance and apply a prosody transplantation method in our perception experiment. We apply logistic regression to analyse our categorical data and observe differences in the identification of these two affective categories. Listeners primarily use prosodic contour in the identification of the studied attitudes. Emotions are identified on the basis of both voice quality and prosodic contour. However, their use is not homogeneous across individual emotions. Depending on the stimuli, listeners may use both voice quality and prosodic contour, or privilege just one of them, for the successful identification of emotions. The results of our study are discussed in view of their importance for speech synthesis.

Voice quality in affect cueing: does loudness matter?

Frontiers in Psychology, 2013

In emotional speech research, it has been suggested that loudness, along with other prosodic features, may be an important cue in communicating high activation affects. In earlier studies, we found different voice quality stimuli to be consistently associated with certain affective states. In these stimuli, as in typical human productions, the different voice qualities entailed differences in loudness. To examine the extent to which the loudness differences among these voice qualities might influence the affective coloring they impart, two experiments were conducted with the synthesized stimuli, in which loudness was systematically manipulated. Experiment 1 used stimuli with distinct voice quality features including intrinsic loudness variations and stimuli where voice quality (modal voice) was kept constant, but loudness was modified to match the non-modal qualities. If loudness is the principal determinant in affect cueing for different voice qualities, there should be little or no difference in the responses to the two sets of stimuli. In Experiment 2, the stimuli included distinct voice quality features but all had equal loudness to test the hypothesis that equalizing the perceived loudness of different voice quality stimuli will have relatively little impact on affective ratings. The results suggest that loudness variation on its own is relatively ineffective whereas variation in voice quality is essential to the expression of affect. In Experiment 1, stimuli incorporating distinct voice quality features consistently obtained higher ratings than the modal voice stimuli with varied loudness. In Experiment 2, non-modal voice quality stimuli proved potent in affect cueing even with loudness differences equalized. Although loudness per se does not seem to be the major determinant of perceived affect, it can contribute positively to affect cueing: when combined with a tense or modal voice quality, increased loudness can enhance signaling of high activation states.

The effects of emotions on voice quality

1999

Two studies are presented, in which emotional vocal recordings were made using a computer emotion induction task and an imagination technique. Concurrent recordings were made of a variety of physiological parameters, including electroglottograph, respiration, electrocardiogram, and surface electromyogram (muscle tension). Acoustic parameters pertaining to voice quality, including F0 floor, F0 range, jitter and spectral energy distribution, were analysed and compared.

Cross-language differences in how voice quality and f0 contours map to affect

Journal of the Acoustical Society of America, 2018

The relationship between prosody and perceived affect involves multiple variables. This paper explores the interplay of three: voice quality, f0 contour, and the hearer's language background. Perception tests were conducted with speakers of Irish English, Russian, Spanish and Japanese using three types of synthetic stimuli: (1) stimuli varied in voice quality, (2) stimuli of uniform (modal) voice quality incorporating affect-related f0 contours, and (3) stimuli combining specific non-modal voice qualities with the affect-related f0 contours of (2). The participants rated the stimuli for the presence/strength of affective colouring on six bipolar scales, e.g., happy-sad. The results suggest that stimuli incorporating non-modal voice qualities, with or without f0 variation, are generally more effective in affect cueing than stimuli varying only in f0. Along with similarities in the affective responses across these languages, many points of divergence were found, both in the range and strength of affective responses overall and in specific stimulus-to-affect associations. The f0 contour may play a more important role, and tense voice a lesser role, in affect signalling in Japanese and Spanish than in Irish English and Russian. The greatest cross-language differences emerged for the affects intimate, formal, stressed and relaxed.

The effects of emotion of voice in synthesized and recorded speech

… and intelligent II: the tangled knot …, 2001

This study examines whether emotion conveyed in recorded and synthesized voices affects perceptions of emotional valence of content, perceptions of suitability of content, liking of content, and credibility of content as well as whether recorded or synthesized speech influences perceptions differently. Participants heard two news stories, two movie descriptions and two health stories in a 2 (type of speech: recorded vs. synthesized) by 2 (consistency of voice emotion and content emotion: matched vs. mismatched) balanced, between subjects experiment. A happy voice, whether synthesized or recorded, made content seem happier and more suitable for extroverts, and a sad (synthesized or recorded) voice made content seem less happy and less interesting for extroverts. Participants reported liking content more when voice emotion and content emotion were matched, but rated information as more credible when voice emotion and content emotion were mismatched. Implications for design are discussed.