Expression of basic emotions in Estonian parametric text-to-speech synthesis
Related papers
In the 21st century, Estonian speech synthesis has been developed using the more widespread methods and freeware development systems (MBROLA, Festival, eSpeak, HTS). The applications have hitherto been developed mainly with the needs of the visually impaired in view (audio systems for reading electronic texts, voicing of subtitles, creation of audiobooks). The major challenges currently facing Estonian specialists are the naturalness of the output speech and expressive speech synthesis. The article is concerned with the statistical modelling of the prosody of synthesized speech and the relations of prosody with other language levels as well as with extralinguistic features. Analysis of the emotion-bound acoustic parameters (pauses, speech rate, formants, intensity and pitch) enables one to model emotions for speech synthesis. In addition, speech synthesis interfaces are discussed. By means of such interfaces users could control the process of speech synthesis, monitor text-to s...
Speech Communication, 2010
We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests was used to evaluate speech quality, emotion identification rates and emotional strength for the six emotions we recorded: happiness, sadness, anger, surprise, fear and disgust. For the HMM-based method, we evaluated the spectral and source components separately and identified which components contribute to which emotion.
Synthesis of Speech with Emotions
Proc. International Conference on Communication, Computers and Devices
This paper describes the methodology proposed by us for synthesizing speech with emotion. Our work starts with the pitch synchronous analysis of single phoneme utterances with natural emotion to obtain the linear prediction (LP) parameters. For synthesizing speech with emotion, we modify the pitch contour of a normal utterance of a single phoneme. We subsequently filter this signal using the LP parameters. The proposed technique can be used to improve the naturalness of voice in a text-to-speech system.
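The analysis-resynthesis loop the abstract describes can be sketched in a few lines. The following is a minimal, self-contained illustration (not the authors' implementation): it estimates LP coefficients for a frame by the autocorrelation method, replaces the excitation with an impulse train at a modified pitch, and re-filters through the all-pole LP model. The sample rate, filter order, pitch values and the synthetic "vowel" are all assumptions made for the sketch.

```python
import numpy as np

FS = 16000  # sample rate in Hz (assumed for this sketch)

def lp_coefficients(frame, order=8):
    """Autocorrelation-method LP analysis: solve the normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))  # coefficients of A(z), the inverse filter

def impulse_train(f0, n_samples, fs=FS):
    """Pulse excitation at pitch f0 -- a crude glottal source model."""
    exc = np.zeros(n_samples)
    exc[::int(round(fs / f0))] = 1.0
    return exc

def all_pole_filter(a, x):
    """Direct-form IIR synthesis filter 1/A(z), kept dependency-free."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n >= k:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def resynthesize(frame, new_f0, order=8, fs=FS):
    """Re-excite the frame's LP filter with a modified pitch contour."""
    a = lp_coefficients(frame, order)
    return all_pole_filter(a, impulse_train(new_f0, len(frame), fs))

# A synthetic 'neutral' vowel: a 120 Hz source through a known stable resonator.
neutral = all_pole_filter(np.array([1.0, -1.5, 0.9]), impulse_train(120.0, 4096))

# Raise F0 by ~30% as an illustrative (not empirically derived) emotional change.
happy = resynthesize(neutral, 156.0)
```

In a real system the pitch contour would be time-varying and derived from pitch-synchronous analysis of emotional recordings, rather than the single constant F0 used here.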
Development of an emotional speech synthesiser in Spanish
1999
Currently, an essential point in speech synthesis is addressing the variability of human speech. One of the main sources of this diversity is the emotional state of the speaker. Most recent work in this area has focused on the prosodic aspects of speech and on rule-based formant-synthesis experiments. Even when adopting an improved voice source, we cannot achieve a smiling happy voice or the menacing quality of cold anger. For this reason, we have performed two experiments aimed at developing a concatenative emotional synthesiser: a synthesiser that can copy the quality of an emotional voice without an explicit mathematical model.
Speech synthesis and emotions: a compromise between flexibility and believability
2008
The synthesis of emotional speech is still an open question. The principal issue is how to introduce expressivity without compromising the naturalness of the synthetic speech provided by state-of-the-art technology. In this paper two concatenative synthesis systems are described and some approaches to addressing this topic are proposed. For example, exploiting the intrinsic expressivity of certain speech acts, through the correlation between affective states and communicative functions, has proven to be an effective solution. This implies a different approach to the design of the speech databases as well as to the labelling and selection of the "expressive" units. In fact, beyond phonetic and prosodic criteria, linguistic and pragmatic aspects should also be considered. The management of units of different types (neutral vs. expressive) is also an important issue.
Affective speech synthesis is important for various applications such as storytelling, speech-based user interfaces and computer games. However, several studies have revealed that Text-To-Speech (TTS) systems tend not to convey suitable emotional expressivity in their outputs. Thanks to the recent convergence of several analytical studies of affect and human speech, this problem can now be tackled from a new angle, with an appropriate prosodic parameterization at its core, based on intelligent detection of the affective clues in the input text. This, allied with recent findings in affective speech analysis, allows a suitable assignment of pitch accents and of other prosodic parameters and signal properties related to F0, matching the optimal parameterization for the emotion detected in the input text. Such an approach allows the input text to be enriched with meta-information that efficiently assists the TTS system. Furthermore, the output of the TTS system is post-processed to enhance its affective content. Several preliminary tests confirm the validity of our approach and encourage us to continue its exploration.
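The front-end step this abstract describes — detecting affective clues in the input text and attaching prosodic meta-information for the TTS back end — can be sketched as below. The emotion lexicon and the prosodic target values are purely illustrative placeholders, not the parameterization used in the paper.

```python
# Hypothetical emotion lexicon: words treated as affective clues (illustrative).
EMOTION_KEYWORDS = {
    "happiness": {"wonderful", "great", "delighted"},
    "sadness": {"lost", "alone", "grief"},
    "anger": {"furious", "outrage", "hate"},
}

# Per-emotion prosodic targets relative to neutral: F0 scale, speech-rate scale,
# energy offset in dB. Values are invented for the sketch.
PROSODY_RULES = {
    "neutral":   {"f0_scale": 1.00, "rate": 1.00, "energy_db": 0.0},
    "happiness": {"f0_scale": 1.25, "rate": 1.10, "energy_db": 2.0},
    "sadness":   {"f0_scale": 0.85, "rate": 0.85, "energy_db": -3.0},
    "anger":     {"f0_scale": 1.15, "rate": 1.20, "energy_db": 4.0},
}

def detect_emotion(text):
    """Very crude affective-clue detection: count lexicon hits per emotion."""
    words = set(text.lower().split())
    scores = {emo: len(words & kws) for emo, kws in EMOTION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"

def enrich(text):
    """Attach prosodic meta-information for a downstream TTS back end."""
    emo = detect_emotion(text)
    return {"text": text, "emotion": emo, "prosody": PROSODY_RULES[emo]}
```

A real system would use a statistical affect classifier and assign pitch accents per word or phrase; the point here is only the enrichment of the input text with machine-readable prosodic targets.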
Emotional Text to Speech Synthesis: A Review
IJARCCE, 2017
Several attempts have been made to add emotional effects to synthesized speech, and several prototypes and fully operational systems have been built on different synthesis techniques. But for Indian languages, there is still a lack of fully operational text-to-speech synthesis systems with emotional effects. This paper gives an overview of what has been done in this field for some of the Indian languages and highlights the issues faced during development.
Acoustic correlates of emotion dimensions in view of speech synthesis
2001
In a database of emotional speech, dimensional descriptions of emotional states have been correlated with acoustic variables. Many stable correlations have been found. The predictions made by linear regression largely agree with the literature. The numerical form of the description and the choice of acoustic variables studied are particularly well suited for future implementation in a speech synthesis system, possibly allowing for the expression of gradual emotional states.
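The regression setup described above can be illustrated with a toy example: fit a linear map from emotion-dimension ratings (here, activation and evaluation) to an acoustic variable, then query it at arbitrary points in the emotion space to obtain gradual targets. All numbers below are invented for illustration and are not the study's data.

```python
import numpy as np

# Toy ratings per utterance on two emotion dimensions (activation, evaluation),
# and a measured acoustic variable (mean F0 in Hz). Illustrative values only.
dims = np.array([
    [0.9,  0.8],   # excited, positive
    [0.8, -0.7],   # agitated, negative
    [-0.6, -0.8],  # subdued, negative
    [-0.5,  0.6],  # calm, positive
    [0.1,  0.0],   # near neutral
])
mean_f0 = np.array([210.0, 195.0, 140.0, 150.0, 170.0])

# Least-squares fit of mean_f0 ~ b0 + b1*activation + b2*evaluation.
X = np.column_stack([np.ones(len(dims)), dims])
coef, *_ = np.linalg.lstsq(X, mean_f0, rcond=None)

def predict_f0(activation, evaluation):
    """Predict a gradual F0 target for any point in the emotion space."""
    return float(coef @ [1.0, activation, evaluation])
```

Because the description is numerical rather than categorical, a synthesiser can interpolate: a point halfway between neutral and excited yields an intermediate F0 target, which is exactly what "gradual emotional states" requires.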
2013
This paper describes the methodology used for validating the results obtained in a study of the acoustic modelling of emotional expression in Castilian Spanish. We have obtained a set of rules describing the behaviour of the most significant speech parameters related to emotional expression. The results of the study have been validated using synthetic speech generated according to the rules obtained for each emotion.