Voice quality and f0 cues for affect expression: implications for synthesis

The role of voice quality in communicating emotion, mood and attitude

Speech Communication, 2003

This paper explores the role of voice quality in the communication of emotions, moods and attitudes. Listeners' reactions to an utterance synthesised with seven different voice qualities were elicited in terms of pairs of opposing affective attributes. The voice qualities included harsh voice, tense voice, modal voice, breathy voice, whispery voice, creaky voice and lax-creaky voice. These were synthesised using a formant synthesiser, and the voice source parameter settings were guided by prior analytic studies as well as auditory judgements. Results offer support for some past observations on the association of voice quality and affect, and suggest a number of refinements in some cases. Listeners' ratings further suggest that these qualities are considerably more effective in signalling milder affective states than the strong emotions. It is clear that there is no one-to-one mapping between voice quality and affect: rather a given quality tends to be associated with a cluster of affective attributes.

Voice Quality and Loudness in Affect Perception

2008

Different voice qualities tend to vary in terms of their intrinsic loudness. Perceptual experiments have shown that voice quality variation can be strongly associated with the affective colouring of an utterance. The question addressed in this paper concerns the role that the intrinsic loudness variation might play in this voice quality-to-affect mapping. To test the hypothesis that the intrinsic loudness …

Affect Expression: Global and Local Control of Voice Source Parameters

Speech Prosody 2022, 2022

This paper explores how the acoustic characteristics of the voice signal express affect. It considers the proposition that the cueing of affect relies on variations in voice source parameters (including f0) that involve both global, uniform shifts across an utterance, and local, within-utterance changes at prosodically relevant points. To test this, a perception test was conducted with stimuli in which modifications were made to the voice source parameters of a synthesised baseline utterance, targeting angry and sad renditions. The baseline utterance was generated with the ABAIR Irish TTS system, for one male and one female voice. The voice parameter manipulations drew on earlier production and perception experiments, and involved three stimulus series: those with global, local, and combined global and local adjustments. 65 listeners judged the stimuli as one of the following: angry, interested, no emotion, relaxed and sad, and indicated how strongly any affect was perceived. Results broadly support the initial proposition, in that the most effective signalling of both angry and sad affect tended to involve those stimuli which combined global and local adjustments. However, stimuli targeting angry were often judged as interested, indicating that negative valence is not consistently cued by the manipulations in these stimuli.

The role of voice quality and prosodic contour in affective speech perception

Speech Communication, 2012

We explore the use of voice quality and prosodic contour in the identification of emotions and attitudes in French. For this purpose, we develop a corpus of affective speech based on one lexically neutral utterance and apply a prosody transplantation method in our perception experiment. We apply logistic regression to analyse our categorical data and observe differences in the identification of these two affective categories. Listeners primarily use prosodic contour in the identification of the studied attitudes. Emotions are identified on the basis of both voice quality and prosodic contour. However, their use is not homogeneous across individual emotions. Depending on the stimuli, listeners may use both voice quality and prosodic contour, or privilege just one of them, for the successful identification of emotions. The results of our study are discussed in view of their importance for speech synthesis.

Voice quality in affect cueing: does loudness matter?

Frontiers in Psychology, 2013

In emotional speech research, it has been suggested that loudness, along with other prosodic features, may be an important cue in communicating high activation affects. In earlier studies, we found different voice quality stimuli to be consistently associated with certain affective states. In these stimuli, as in typical human productions, the different voice qualities entailed differences in loudness. To examine the extent to which the loudness differences among these voice qualities might influence the affective coloring they impart, two experiments were conducted with the synthesized stimuli, in which loudness was systematically manipulated. Experiment 1 used stimuli with distinct voice quality features including intrinsic loudness variations and stimuli where voice quality (modal voice) was kept constant, but loudness was modified to match the non-modal qualities. If loudness is the principal determinant in affect cueing for different voice qualities, there should be little or no difference in the responses to the two sets of stimuli. In Experiment 2, the stimuli included distinct voice quality features but all had equal loudness to test the hypothesis that equalizing the perceived loudness of different voice quality stimuli will have relatively little impact on affective ratings. The results suggest that loudness variation on its own is relatively ineffective whereas variation in voice quality is essential to the expression of affect. In Experiment 1, stimuli incorporating distinct voice quality features consistently obtained higher ratings than the modal voice stimuli with varied loudness. In Experiment 2, non-modal voice quality stimuli proved potent in affect cueing even with loudness differences equalized. Although loudness per se does not seem to be the major determinant of perceived affect, it can contribute positively to affect cueing: when combined with a tense or modal voice quality, increased loudness can enhance signaling of high activation states.

The effects of emotions on voice quality

1999

Two studies are presented, in which emotional vocal recordings were made using a computer emotion induction task and an imagination technique. Concurrent recordings were made of a variety of physiological parameters, including electroglottograph, respiration, electrocardiogram, and surface electromyogram (muscle tension). Acoustic parameters pertaining to voice quality, including F0 floor, F0 range, jitter and spectral energy distribution, were analysed and compared.

Cross-language differences in how voice quality and f0 contours map to affect

Journal of the Acoustical Society of America, 2018

The relationship between prosody and perceived affect involves multiple variables. This paper explores the interplay of three: voice quality, f0 contour, and the hearer's language background. Perception tests were conducted with speakers of Irish English, Russian, Spanish and Japanese using three types of synthetic stimuli: (1) stimuli varied in voice quality, (2) stimuli of uniform (modal) voice quality incorporating affect-related f0 contours, and (3) stimuli combining specific non-modal voice qualities with the affect-related f0 contours of (2). The participants rated the stimuli for the presence/strength of affective colouring on six bipolar scales, e.g., happy-sad. The results suggest that stimuli incorporating non-modal voice qualities, with or without f0 variation, are generally more effective in affect cueing than stimuli varying only in f0. Along with similarities in the affective responses across these languages, many points of divergence were found, both in the range and strength of affective responses overall and in specific stimulus-to-affect associations. The f0 contour may play a more important role, and tense voice a lesser role, in affect signalling in Japanese and Spanish than in Irish English and Russian. The greatest cross-language differences emerged for the affects intimate, formal, stressed and relaxed.

The effects of emotion of voice in synthesized and recorded speech

… and intelligent II: the tangled knot …, 2001

This study examines whether emotion conveyed in recorded and synthesized voices affects perceptions of emotional valence of content, perceptions of suitability of content, liking of content, and credibility of content as well as whether recorded or synthesized speech influences perceptions differently. Participants heard two news stories, two movie descriptions and two health stories in a 2 (type of speech: recorded vs. synthesized) by 2 (consistency of voice emotion and content emotion: matched vs. mismatched) balanced, between subjects experiment. A happy voice, whether synthesized or recorded, made content seem happier and more suitable for extroverts, and a sad (synthesized or recorded) voice made content seem less happy and less interesting for extroverts. Participants reported liking content more when voice emotion and content emotion were matched, but rated information as more credible when voice emotion and content emotion were mismatched. Implications for design are discussed.