Emotional Speech Datasets for English Speech Synthesis Purposes: A Review

The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems

ArXiv, 2018

In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes. It contains data from male and female actors in English and a male actor in French. The database covers five emotion classes, making it suitable for building synthesis and voice-transformation systems with the potential to control the emotional dimension in a continuous way. We demonstrate the data's usefulness by building a simple MLP system that converts neutral speech to an angry speaking style, and we evaluate it via a CMOS perception test. Even though the system is very simple, the test shows the effectiveness of the data, which is promising for future work.
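The neutral-to-angry conversion described above can be sketched as a frame-level regression problem. The following is a minimal illustration, assuming hypothetical acoustic features (e.g. F0, energy, spectral tilt) and synthetic paired data, not the paper's actual database or architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired frames: neutral acoustic features and their "angry"
# counterparts. Real data would come from parallel neutral/angry recordings.
X = rng.normal(size=(200, 3))                  # neutral frames
Y = X * np.array([1.3, 1.5, 0.8]) + 0.2        # synthetic "angry" targets

# One-hidden-layer MLP trained with plain gradient descent on MSE.
W1 = rng.normal(scale=0.1, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 3)); b2 = np.zeros(3)

lr = 0.05
losses = []
for _ in range(500):
    H = np.tanh(X @ W1 + b1)                   # hidden activations
    P = H @ W2 + b2                            # predicted angry features
    err = P - Y
    losses.append(float((err ** 2).mean()))
    # Backpropagation for the two layers.
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)           # tanh derivative
    dW1 = X.T @ dH / len(X); db1 = dH.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(losses[0] > losses[-1])  # loss should drop as the mapping is learned
```

In practice the learned mapping would be applied frame by frame to held-out neutral utterances before resynthesis, and the result judged by listeners in a CMOS test.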

Emotional speech synthesis: Applications, history and possible future

Proc. ESSV, 2009

Emotional speech synthesis is an important piece of the puzzle on the long road to human-like artificial human-machine interaction. Along the way, many milestones, such as emotional audio messages or believable characters in games, will be reached. This paper discusses technical aspects of emotional speech synthesis, shows practical applications based on a higher-level framework, and highlights new developments in the realization of affective speech using non-uniform unit-selection synthesis and voice-transformation techniques.

Voice quality interpolation for emotional text-to-speech synthesis

2005

Synthesizing desired emotions with concatenative algorithms relies on the collection of large databases. This paper focuses on the development and assessment of a simple algorithm that interpolates the intended vocal effort in existing databases in order to create new databases with intermediate levels of vocal effort. Three diphone databases in German with soft, modal, and loud voice qualities are processed with a spectral interpolation algorithm. A listening test is performed to evaluate the intended vocal effort in the original databases as well as the interpolated ones. The results show that the interpolation algorithm can create the intended intermediate levels of vocal effort from the original databases, independent of the subjects' language background.
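The core idea of interpolating between voice qualities can be illustrated by blending two aligned magnitude spectra in the log domain. This is a minimal sketch with toy spectra; the paper's actual algorithm operates on full diphone databases, and the spectral shapes below are assumptions:

```python
import numpy as np

def interpolate_spectra(mag_soft, mag_loud, alpha):
    """Interpolate two aligned magnitude spectra in the log domain.

    alpha = 0 reproduces the soft spectrum, alpha = 1 the loud one;
    intermediate values approximate intermediate vocal effort.
    """
    log_interp = (1 - alpha) * np.log(mag_soft) + alpha * np.log(mag_loud)
    return np.exp(log_interp)

# Toy spectra: the loud version has a flatter spectral tilt (more
# high-frequency energy), a typical correlate of higher vocal effort.
freqs = np.linspace(1, 8000, 256)
soft = 1.0 / freqs            # steep spectral tilt
loud = 1.0 / np.sqrt(freqs)   # shallower tilt

mid = interpolate_spectra(soft, loud, 0.5)
# The interpolated spectrum lies between the endpoints at every bin.
print(np.all((mid >= np.minimum(soft, loud)) & (mid <= np.maximum(soft, loud))))
```

Interpolating in the log domain (a geometric mean for alpha = 0.5) keeps the result strictly between the two source spectra at every frequency bin, which matches the goal of producing a perceptually intermediate vocal effort.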

Synthesis of Speech with Emotions

Proc. International Conference on Communication, Computers and Devices

This paper describes our proposed methodology for synthesizing speech with emotion. Our work starts with pitch-synchronous analysis of single-phoneme utterances with natural emotion to obtain the linear prediction (LP) parameters. To synthesize speech with emotion, we modify the pitch contour of a normal utterance of a single phoneme and then filter this signal using the LP parameters. The proposed technique can be used to improve the naturalness of voice in a text-to-speech system.
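The analysis-resynthesis pipeline described above can be sketched in a few lines: estimate an all-pole LP model of a frame, then drive that filter with an impulse train at a modified pitch. This is a simplified sketch (autocorrelation-method LP, a toy damped-resonance "phoneme", and a crude impulse-train excitation), not the paper's exact pitch-synchronous procedure:

```python
import numpy as np

def lp_coeffs(frame, order=10):
    """Autocorrelation-method LP coefficients in (1, -a1, ..., -ap) form."""
    r = np.correlate(frame, frame, "full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])     # normal equations
    return np.concatenate(([1.0], -a))

def synthesize(a, f0, sr=16000, dur=0.1):
    """Excite the all-pole LP filter 1/A(z) with an impulse train at pitch f0."""
    n = int(sr * dur)
    excitation = np.zeros(n)
    excitation[::int(sr // f0)] = 1.0          # pitch pulses
    out = np.zeros(n)
    for t in range(n):                         # direct-form IIR recursion
        acc = excitation[t]
        for k in range(1, len(a)):
            if t - k >= 0:
                acc -= a[k] * out[t - k]
        out[t] = acc
    return out

# Toy "phoneme": a damped resonance standing in for a vowel frame.
sr = 16000
t = np.arange(0, 0.03, 1 / sr)
frame = np.sin(2 * np.pi * 700 * t) * np.exp(-40 * t)

a = lp_coeffs(frame)
neutral = synthesize(a, f0=120, sr=sr)   # normal pitch
raised = synthesize(a, f0=220, sr=sr)    # raised pitch, same spectral envelope
```

Because only the excitation's pulse rate changes between the two calls, the spectral envelope (and hence the phoneme identity) is preserved while the pitch, a major emotional cue, is altered.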

Speech synthesis and emotions: a compromise between flexibility and believability

2008

The synthesis of emotional speech is still an open question. The principal issue is how to introduce expressivity without compromising the naturalness of the synthetic speech provided by state-of-the-art technology. In this paper, two concatenative synthesis systems are described and several approaches to this topic are proposed. For example, exploiting the intrinsic expressivity of certain speech acts, via the correlation between affective states and communicative functions, has proven an effective solution. This implies a different approach to the design of the speech databases as well as to the labelling and selection of the "expressive" units. In fact, beyond phonetic and prosodic criteria, linguistic and pragmatic aspects should also be considered. The management of units of different types (neutral vs. expressive) is also an important issue.

Development of an emotional speech synthesiser in Spanish

1999

Currently, an essential point in speech synthesis is addressing the variability of human speech. One of the main sources of this diversity is the emotional state of the speaker. Most recent work in this area has focused on the prosodic aspects of speech and on rule-based formant-synthesis experiments. Even when adopting an improved voice source, we cannot achieve a smiling happy voice or the menacing quality of cold anger. For this reason, we have performed two experiments aimed at developing a concatenative emotional synthesiser: a synthesiser that can copy the quality of an emotional voice without an explicit mathematical model.

Emotion Identification for Evaluation of Synthesized Emotional Speech

2018

In this paper, we propose to evaluate the quality of emotional speech synthesis by means of an automatic emotion identification system. We test this approach using five different parametric speech synthesis systems, ranging from plain non-emotional synthesis to full re-synthesis of pre-recorded speech. We compare the results achieved with the automatic system to those of human perception tests. While preliminary, our results indicate that automatic emotion identification can be used to assess the quality of emotional speech synthesis, potentially replacing time-consuming and expensive human perception tests.
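The evaluation idea above reduces to a simple question: what fraction of synthesized samples does a classifier identify as their intended emotion? A minimal sketch, using a nearest-centroid classifier over hypothetical 2-D acoustic features (the features, centroids, and emotions are illustrative assumptions, not the paper's system):

```python
import numpy as np

rng = np.random.default_rng(1)
emotions = ["neutral", "angry", "sad"]

# Hypothetical feature centroids (e.g. mean F0, mean energy) computed
# from natural reference recordings of each emotion.
centroids = {"neutral": np.array([0.0, 0.0]),
             "angry":   np.array([3.0, 3.0]),
             "sad":     np.array([-3.0, -2.0])}

def identify(features):
    """Nearest-centroid emotion identification."""
    return min(centroids, key=lambda e: np.linalg.norm(features - centroids[e]))

def identification_rate(samples):
    """Fraction of synthesized samples identified as their target emotion."""
    hits = sum(identify(f) == target for target, f in samples)
    return hits / len(samples)

# Synthesized test set: features scattered around each target centroid,
# standing in for the output of an emotional synthesis system.
samples = [(e, centroids[e] + rng.normal(scale=0.5, size=2))
           for e in emotions for _ in range(20)]

print(identification_rate(samples))  # a high rate suggests the emotion comes through
```

A synthesis system whose output clusters near the natural centroids scores highly; comparing such automatic rates against human identification rates is the correlation the paper investigates.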

Acoustic correlates of emotion dimensions in view of speech synthesis

2001

In a database of emotional speech, dimensional descriptions of emotional states have been correlated with acoustic variables. Many stable correlations have been found. The predictions made by linear regression widely agree with the literature. The numerical form of the description and the choice of acoustic variables studied are particularly well suited for future implementation in a speech synthesis system, possibly allowing for the expression of gradual emotional states.
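The regression setup described above can be sketched directly: fit a linear model predicting an emotion dimension (say, activation) from acoustic variables. The variables, weights, and data below are synthetic stand-ins chosen only to illustrate the method:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical acoustic variables per utterance: mean F0, F0 range,
# speech rate (standardized). Activation is simulated as a linear
# function plus noise, mirroring the stable correlations reported.
n = 100
acoustic = rng.normal(size=(n, 3))
true_w = np.array([0.8, 0.5, -0.3])
activation = acoustic @ true_w + 0.1 * rng.normal(size=n)

# Least-squares fit: predict the activation dimension from acoustics.
X = np.column_stack([acoustic, np.ones(n)])      # add intercept column
w, *_ = np.linalg.lstsq(X, activation, rcond=None)

pred = X @ w
r = np.corrcoef(pred, activation)[0, 1]
print(r > 0.9)  # strong fit on this synthetic data
```

In a synthesis system, the fitted weights would run in the opposite direction: given a target point on the emotion dimension, they prescribe how far to shift each acoustic variable, which is what enables gradual emotional states.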

Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech

Speech Communication, 2010

We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests to evaluate speech quality, emotion identification rates and emotional strength were used for the six emotions which we recorded: happiness, sadness, anger, surprise, fear, and disgust. For the HMM-based method, we evaluated spectral and source components separately and identified which components contribute to which emotion.