Acoustic correlates of emotion dimensions in view of speech synthesis (original) (raw)
Related papers
Verification of acoustical correlates of emotional speech using formant-synthesis
… of the ISCA Workshop on Speech and …, 2000
This paper explores the perceptual relevance of acoustical correlates of emotional speech by means of speech synthesis. Besides, the research aims at the development of »emotion− rules« which enable an optimized speech synthesis system to generate emotional speech. Two investigations using this synthesizer are described: 1) the systematic variation of selec− ted acoustical features to gain a preliminary impression regar− ding the importance of certain acoustical features for emotion− al expression, and 2) the specific manipulation of a stimulus spoken under emotionally neutral condition to investigate fur− ther the effect of certain features and the overall ability of the synthesizer to generate recognizable emotional expression. It is shown that this approach is indeed capable of generating emotional speech that is recognized almost as well as utteran− ces realized by actors.
Synthesis of Speech with Emotions
Proc. International Conference on Communication, Computers and Devices
This paper describes the methodology proposed by us for synthesizing speech with emotion. Our work starts with the pitch synchronous analysis of single phoneme utterances with natural emotion to obtain the linear prediction (LP) parameters. For synthesizing speech with emotion, we modify the pitch contour of a normal utterance of a single phoneme. We subsequently filter this signal using the LP parameters. The proposed technique can be used to improve the naturalness of voice in a text-to-speech system.
Analytical and perceptual study on the role of acoustic features in realizing emotional speech
6th International Conference on Spoken Language Processing (ICSLP 2000)
Investigation was conducted on how prosodic features of emotional speech changed depending on emotion levels. The analysis results on fundamental frequency (F 0) contours and speech rates implied that humans have several ways to express emotions and use them rather randomly. Investigation was also conducted on what acoustic features were important to express emotions. Perceptual experiments using synthetic speech with copied acoustic features of target speech indicated importance of the segmental features other than the prosodic features. Especially, a high importance was observed in the case of happiness.
Speech Communication, 2010
We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests to evaluate speech quality, emotion identification rates and emotional strength were used for the six emotions which we recordedhappiness, sadness, anger, surprise, fear, disgust. For the HMM-based method, we evaluated spectral and source components separately and identified which components contribute to which emotion.
Emotional speech synthesis: Applications, history and possible future
Proc. ESSV, 2009
Emotional speech synthesis is an important part of the puzzle on the long way to human-like artificial human-machine interaction. During the way, lots of stations like emotional audio messages or believable characters in gaming will be reached. This paper discusses technical aspects of emotional speech synthesis, shows practical applications based on a higher level framework and highlights new developments concerning the realization of affective speech with non-uniform unit selection based synthesis and voice transformation techniques.
Emotional Speech Datasets for English Speech Synthesis Purpose : A Review
In this paper, we review the datasets of emotional speech publicly available and their usability for state of the art speech synthesis. This is conditioned by several characteristics of these datasets: the quality of the recordings, the quantity of the data and the emotional content captured contained in the data. We then present a dataset that was recorded based on the observation of the needs in this area. It contains data for male and female actors in English and a male actor in French. The database covers 5 emotion classes so it could be suitable to build synthesis and voice transformation systems with the potential to control the emotional dimension.
Speech synthesis and emotions: a compromise between flexibility and believability
2008
The synthesis of emotional speech is still an open question. The principal issue is how to introduce expressivity without compromising the naturalness of the synthetic speech provided by the state-of-the-art technology. In this paper two concatenative synthesis systems are described and some approaches to address this topic are proposed. For example, considering the intrinsic expressivity of certain speech acts, by exploiting the correlation between affective states and communicative functions, has proven an effective solution. This implies a different approach in the design of the speech databases as well as in the labelling and selection of the "expressive" units. In fact, beyond phonetic and prosodic criteria, linguistic and pragmatic aspects should also be considered. The management of units of different type (neutral vs expressive) is also an important issue.
Emotion Identification for Evaluation of Synthesized Emotional Speech
2018
In this paper, we propose to evaluate the quality of emotional speech synthesis by means of an automatic emotion identification system. We test this approach using five different parametric speech synthesis systems, ranging from plain non-emotional synthesis to full re-synthesis of pre-recorded speech. We compare the results achieved with the automatic system to those of human perception tests. While preliminary, our results indicate that automatic emotion identification can be used to assess the quality of emotional speech synthesis, potentially replacing time consuming and expensive human perception tests
Interdependencies among Voice Source Parameters in Emotional Speech
IEEE Transactions on Affective Computing, 2011
Emotions have strong effects on the voice production mechanisms and consequently on voice characteristics. The magnitude of these effects, measured using voice source parameters, and the interdependencies among parameters have not been examined. To better understand these relationships, voice characteristics were analyzed in 10 actors' productions of a sustained/a/ vowel in five emotions. Twelve acoustic parameters were studied and grouped according to their physiological backgrounds, three related to subglottal pressure, five related to the transglottal airflow waveform derived from inverse filtering the audio signal, and four related to vocal fold vibration. Each emotion appeared to possess a specific combination of acoustic parameters reflecting a specific mixture of physiologic voice control parameters. Features related to subglottal pressure showed strong within-group and betweengroup correlations, demonstrating the importance of accounting for vocal loudness in voice analyses. Multiple discriminant analysis revealed that a parameter selection that was based, in a principled fashion, on production processes could yield rather satisfactory discrimination outcomes (87.1 percent based on 12 parameters and 78 percent based on three parameters). The results of this study suggest that systems to automatically detect emotions use a hypothesis-driven approach to selecting parameters that directly reflect the physiological parameters underlying voice and speech production.
Emotion extractor: A methodology to implement prosody features in speech synthesis
Electronic Computer Technology …, 2010
This paper presents the methodology to extract emotion from the text at real time and add the expression to the documents contents during speech synthesis. To understand the existence of emotions self assessment test was carried out on set of documents and preliminary rules were formulated for three basic emotions: Pleasure, Arousal and Dominance. These rules are used in an automated procedure that assigns emotional state values to document contents. These values are then used by speech synthesizer to add emotions to speech. The system is language independent and content free.