Integrating Rule and Template-based Approaches for Emotional Malay Speech Synthesis
Related papers
2008
This paper presents a hybrid technique that improves the rule-based approach to prosody generation for Malay speech synthesis. It combines parametric prosody manipulation with template-based parametric manipulation to increase the intonation variability of the synthesized output. The prosodic features of neutral synthesized speech are manipulated to express four basic emotions: happiness, anger, sadness and fear. We also present an objective methodology for evaluating whether the synthesized output carries the appropriate prosody, in order to corroborate the results of the subjective perception tests.
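The abstract does not name the objective measures used. One common way to compare a synthesized utterance with a natural emotional recording is to compute the root-mean-square error (RMSE) and the Pearson correlation between their F0 contours. The sketch below assumes both contours are already time-aligned lists of F0 values in Hz sampled at the same frame rate, with 0 marking unvoiced frames; the function name `compare_f0` and that convention are hypothetical.

```python
import math

def compare_f0(synth, ref):
    """Return (rmse_hz, pearson_r) over frames voiced in both contours.

    Assumes equal-length, time-aligned F0 lists in Hz, where 0 marks an
    unvoiced frame (a hypothetical convention for this sketch).
    """
    if len(synth) != len(ref):
        raise ValueError("contours must be time-aligned and of equal length")
    # Keep only frames that are voiced in both contours.
    pairs = [(s, r) for s, r in zip(synth, ref) if s > 0 and r > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two jointly voiced frames")
    n = len(pairs)
    rmse = math.sqrt(sum((s - r) ** 2 for s, r in pairs) / n)
    mean_s = sum(s for s, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in pairs)
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    var_r = sum((r - mean_r) ** 2 for _, r in pairs)
    # Correlation is undefined for a perfectly flat contour.
    r_val = float("nan") if var_s == 0 or var_r == 0 else cov / math.sqrt(var_s * var_r)
    return rmse, r_val

rmse, r = compare_f0([200, 210, 0, 230], [190, 205, 0, 240])
```

A low RMSE means the absolute pitch level matches. A high correlation means the contour shape matches even when the level differs, so the two numbers are usually reported together.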
Prosodic Analysis And Modelling For Malay Emotional Speech Synthesis
Malaysian Journal of Computer Science
This paper discusses an emotional prosody generator for a Malay speech synthesis system. The generator can re-synthesize a selected vocal emotion from neutral synthesized speech and improves naturalness by adopting rule-based prosody conversion techniques. The role of prosodic features in emotional expression, particularly fundamental frequency (F0) and duration, has been widely investigated. This project attempts to improve the naturalness of synthesized emotional Malay speech by establishing an effective mechanism for re-synthesizing emotion. The mechanism is built by analyzing the variation in the F0 contour of continuous emotional Malay speech over a fixed time period. The emotional prosody generator developed in this research uses the principles of parametric prosody manipulation to synthesize four basic emotions: happiness, anger, sadness and fear. Subjective listening tests were conducted to validate the generator's ability to produce the prosody needed for emotional expression. The results show overall recognition rates between 61% and 85%.
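The abstract describes deriving conversion rules by analyzing F0 variation over a fixed time period. Here is a minimal sketch of that idea. It assumes F0 contours sampled every 10 ms and a hypothetical 10-frame (100 ms) window. It computes the mean voiced F0 per window for a neutral and an emotional rendition of the same sentence. The per-window ratios then act as simple rules that a converter could apply to new neutral output. The window length, frame rate and all function names are assumptions, not values from the paper.

```python
def window_means(f0, frames_per_window):
    """Mean voiced F0 (Hz) per fixed-length window; None if the window is unvoiced."""
    means = []
    for start in range(0, len(f0), frames_per_window):
        voiced = [v for v in f0[start:start + frames_per_window] if v > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)
    return means

def derive_ratios(neutral, emotional, frames_per_window=10):
    """Per-window emotional/neutral F0 ratios, used here as hypothetical rules."""
    n_means = window_means(neutral, frames_per_window)
    e_means = window_means(emotional, frames_per_window)
    # Windows unvoiced in either rendition get a neutral ratio of 1.0.
    return [e / n if n and e else 1.0 for n, e in zip(n_means, e_means)]

def apply_ratios(f0, ratios, frames_per_window=10):
    """Scale a neutral contour window by window; unvoiced (0) frames stay 0."""
    if not ratios:
        return list(f0)
    out = []
    for i, v in enumerate(f0):
        # Frames past the last window reuse the final ratio.
        idx = min(i // frames_per_window, len(ratios) - 1)
        out.append(v * ratios[idx])
    return out
```

Window-level ratios keep some of the time-varying shape of the emotional contour, which a single global pitch scale would discard.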
Adding Emotions to Malay Synthesized Speech Using Diphone-based Templates
2005
This paper concerns the addition of an affective component to Fasih, one of the first Malay Text-to-Speech systems developed by MIMOS Berhad. The goal is to introduce a new method of incorporating emotions into Fasih by building a template-driven emotions filter. The templates are diphone-based emotional templates that can portray four emotions: anger, sadness, happiness and fear. A preliminary experiment showed that the recognition rate of the synthesized Malay speech exceeded 60% for anger and sadness.
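The abstract does not give the template format. As an illustration, the sketch below assumes that each emotion template stores a duration scale and a pitch scale per diphone, plus a default pair for diphones the template does not cover. The diphone labels and numeric values are invented placeholders, not data from Fasih or MIMOS.

```python
from dataclasses import dataclass

@dataclass
class Diphone:
    label: str          # e.g. "s-a" (hypothetical labeling scheme)
    duration_ms: float
    f0_hz: float        # mean F0 over the diphone; 0 if unvoiced

# Hypothetical templates: diphone label -> (duration scale, pitch scale).
TEMPLATES = {
    "anger":   {"default": (0.85, 1.30), "a-k": (0.80, 1.40)},
    "sadness": {"default": (1.25, 0.85)},
}

def apply_template(diphones, emotion):
    """Return new diphones with the emotion template's scales applied."""
    template = TEMPLATES[emotion]
    out = []
    for d in diphones:
        # Fall back to the template's default scales for uncovered diphones.
        dur_scale, f0_scale = template.get(d.label, template["default"])
        out.append(Diphone(d.label, d.duration_ms * dur_scale, d.f0_hz * f0_scale))
    return out
```

Keying the scales by diphone lets a template vary prosody locally across the utterance. A rule-based filter, by contrast, typically applies the same scales everywhere.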
Emotion extractor: A methodology to implement prosody features in speech synthesis
Electronic Computer Technology …, 2010
This paper presents a methodology for extracting emotion from text in real time and adding the corresponding expression to a document's contents during speech synthesis. To understand how emotions are present in text, a self-assessment test was carried out on a set of documents. Preliminary rules were then formulated for three basic emotions: Pleasure, Arousal and Dominance. These rules are used in an automated procedure that assigns emotional state values to document contents. The speech synthesizer then uses these values to add emotion to the speech. The system is language-independent and content-free.
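The abstract mentions rules that assign pleasure, arousal and dominance (PAD) values to document contents. Here is a minimal sketch of such rule-based assignment. It assumes a tiny hand-built lexicon with per-word PAD scores in [-1, 1] and averages the scores of the matching words in a sentence. The lexicon entries, the averaging rule and the name `sentence_pad` are hypothetical stand-ins, not the paper's rules.

```python
import re

# Hypothetical lexicon: word -> (pleasure, arousal, dominance), each in [-1, 1].
PAD_LEXICON = {
    "happy": (0.8, 0.5, 0.4),
    "angry": (-0.6, 0.8, 0.3),
    "afraid": (-0.6, 0.6, -0.6),
    "calm": (0.4, -0.6, 0.2),
}

def sentence_pad(sentence):
    """Average PAD values of lexicon words in a sentence; neutral (0, 0, 0) if none match."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [PAD_LEXICON[w] for w in words if w in PAD_LEXICON]
    if not hits:
        return (0.0, 0.0, 0.0)
    n = len(hits)
    return tuple(sum(v[i] for v in hits) / n for i in range(3))
```

A synthesizer could then map the resulting values to prosody settings, for example widening the pitch range as arousal increases. That mapping is likewise an assumption.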
Adding an Emotions Filter to Malay Text-to-Speech System
2007
In this paper we present the findings of our research, which aims to develop an emotions filter that can be added to an existing Malay text-to-speech system to produce output expressing happiness, anger, sadness and fear. The end goal is output that is as natural as possible, contributing to the enhancement of the existing system. The emotions filter was developed by manipulating the pitch and duration of the output using a rule-based approach. The data consisted of emotional sentences produced by a female native speaker of Malay, and the information extracted from their analysis was used to develop the filter. The emotional speech output underwent several acceptance tests. The results showed that the filter is compatible with FASIH and with other TTS systems that use rule-based prosodic manipulation. However, further work is needed to enhance the naturalness of the output.
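The filter is described as rule-based manipulation of pitch and duration. The sketch below assumes the TTS front end emits phonemes, each with a duration in ms and a list of F0 target points in Hz. For each emotion it applies three rules: shift the utterance's mean F0, scale the F0 range around that mean, and scale all durations. The values in `RULES` are placeholders, not the values the authors derived from their speaker's recordings.

```python
from dataclasses import dataclass, field

@dataclass
class Phoneme:
    symbol: str
    duration_ms: float
    f0_targets: list = field(default_factory=list)  # Hz; empty if unvoiced

# Hypothetical rules: (mean F0 scale, F0 range scale, duration scale).
RULES = {
    "happiness": (1.20, 1.40, 0.90),
    "anger":     (1.15, 1.60, 0.85),
    "sadness":   (0.90, 0.60, 1.20),
    "fear":      (1.25, 1.20, 0.95),
}

def emotions_filter(phonemes, emotion):
    """Apply mean-shift, range-scaling and duration rules to one utterance."""
    mean_scale, range_scale, dur_scale = RULES[emotion]
    voiced = [f for p in phonemes for f in p.f0_targets]
    mean_f0 = sum(voiced) / len(voiced) if voiced else 0.0
    new_mean = mean_f0 * mean_scale
    out = []
    for p in phonemes:
        # Expand or compress each target's distance from the utterance mean.
        # No floor is applied, so extreme range scales could push targets too low.
        targets = [new_mean + (f - mean_f0) * range_scale for f in p.f0_targets]
        out.append(Phoneme(p.symbol, p.duration_ms * dur_scale, targets))
    return out
```

Separating mean and range lets one rule set express, for instance, sadness as both lower and flatter pitch.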
Synthesis of Emotional Speech by Prosody Modification of Vowel Segments of Neutral Speech
SSRN Electronic Journal, 2019
Speech can be viewed as a combination of voiced and unvoiced regions. Voiced speech is produced by vibration of the vocal cords, and the vibration pattern differs across emotions. During the production of some consonants the vocal cords do not vibrate, so consonants contribute less to conveying emotion in the speech signal. In this paper, we consider only vowel regions for emotion synthesis and use three prosody parameters: duration, intensity and pitch pattern. Vowel-like regions (VLRs) are identified using vowel onset and offset points, which mark the start and end of each VLR. We observe that when emotional speech is synthesized from neutral speech, the modifications are concentrated in the vowel regions of the utterance. Our experimental results show that emotion synthesis by prosody modification of VLRs alone is significantly better than prosody modification at the syllable level, and it also requires less processing time. With vowel-level prosody modification, the average mean opinion scores (MOS) for angry, happy and fear speech are 3.85, 3.60 and 4.03, respectively. These are higher than the syllable-level scores of 3.56, 3.17 and 3.92 for the same emotions.
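The sketch below illustrates the core idea of modifying prosody only inside vowel-like regions. It assumes vowel onset and offset points have already been detected and that the segments are contiguous and sorted in time. Each segment carries its boundaries, a mean F0 and an intensity. Segments flagged as VLRs receive the emotion's scale factors, while consonant regions pass through unchanged, and the utterance is then re-timed. The segment structure and the factor values are hypothetical; the paper's detection method and modification factors are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float       # for a VLR, the vowel onset point
    end_s: float         # for a VLR, the vowel offset point
    f0_hz: float
    intensity_db: float
    is_vlr: bool

# Hypothetical factors: (duration scale, F0 scale, intensity offset in dB).
FACTORS = {"angry": (0.85, 1.25, 4.0), "happy": (0.95, 1.20, 2.0), "fear": (0.90, 1.30, -1.0)}

def modify_vlrs(segments, emotion):
    """Scale prosody only within VLRs and re-time the contiguous segments."""
    dur_scale, f0_scale, db_offset = FACTORS[emotion]
    out = []
    t = segments[0].start_s if segments else 0.0
    for s in segments:
        dur = s.end_s - s.start_s
        if s.is_vlr:
            new = Segment(t, t + dur * dur_scale, s.f0_hz * f0_scale,
                          s.intensity_db + db_offset, True)
        else:
            # Consonant regions keep their original prosody and length.
            new = Segment(t, t + dur, s.f0_hz, s.intensity_db, False)
        out.append(new)
        t = new.end_s
    return out
```

Restricting the work to VLRs also reduces how many segments must be processed, which is consistent with the processing-time advantage the abstract reports.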
Emotional Text to Speech Synthesis: A Review
IJARCCE, 2017
Several attempts have been made to add emotional effects to synthesized speech, and several prototypes and fully operational systems have been built on different synthesis techniques. For Indian languages, however, fully operational text-to-speech synthesis systems with emotional effects are still lacking. This paper gives an overview of the work done in this field for several Indian languages and highlights the issues faced during development.
Synthesis of Speech with Emotions
Proc. International Conference on Communication, Computers and Devices
This paper describes our proposed methodology for synthesizing speech with emotion. The work starts with pitch-synchronous analysis of single-phoneme utterances spoken with natural emotion to obtain linear prediction (LP) parameters. To synthesize speech with emotion, we modify the pitch contour of a normal utterance of a single phoneme and then filter this signal using the LP parameters. The proposed technique can be used to improve the naturalness of voice in a text-to-speech system.
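The sketch below shows the LP analysis and resynthesis steps in simplified form. First, LP coefficients are estimated with the autocorrelation method (Levinson-Durbin recursion). Next, an impulse-train excitation is built whose period follows a new, modified pitch contour. Finally, that excitation is passed through the all-pole LP filter. The paper uses pitch-synchronous analysis; for brevity this sketch analyses one frame. The sampling rate, LP order and function names are assumptions.

```python
def lpc(frame, order):
    """LP coefficients a[1..order] for x[n] ~ sum_k a[k] x[n-k], via Levinson-Durbin."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k  # remaining prediction-error energy
    return a[1:], err

def pulse_train(f0_contour, fs, hop):
    """Impulse train whose period follows f0_contour (one F0 value per `hop` samples)."""
    out = [0.0] * (len(f0_contour) * hop)
    t = 0.0
    while int(t) // hop < len(f0_contour):
        out[int(t)] = 1.0
        t += fs / f0_contour[int(t) // hop]  # assumes voiced frames (F0 > 0)
    return out

def synthesize(excitation, a, gain=1.0):
    """All-pole LP synthesis filter: y[n] = gain*e[n] + sum_k a[k] y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y

# Usage sketch: analyse a toy "natural" frame, then resynthesise it with F0 raised by 25%.
fs, hop = 8000, 80
frame = synthesize(pulse_train([120.0] * 10, fs, hop), [1.3, -0.8])
coeffs, _ = lpc(frame, order=2)
emotional = synthesize(pulse_train([150.0] * 10, fs, hop), coeffs)
```

Because the LP filter models the vocal-tract envelope separately from the excitation, the pitch can be changed without altering the phoneme's spectral identity. That separation is what makes this source-filter approach suitable for the task.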
Emotional analysis for Malayalam Text to Speech Synthesis Systems
Including emotional aspects in speech can improve the naturalness of a speech synthesis system. Emotions such as sadness, anger and happiness are manifested in speech through prosodic elements such as duration, pitch and intensity. The prosodic values corresponding to different emotions are analyzed at the word and phoneme levels using the speech analysis and manipulation tool PRAAT. This paper presents an emotional analysis of prosodic features such as duration, pitch and intensity in Malayalam speech. The analysis shows that duration is generally lowest for anger and highest for sadness, whereas intensity is highest for anger and lowest for sadness. A new prosodic feature called rise time/fall time, which can capture both durational and intensity variation, is introduced. The pitch contour, which is flat for neutral speech, shows significant variation across emotions. The detailed analysis considering the duration of different phonemes reveals that the duration var...
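The comparison described above rests on measuring the duration and intensity of each word or phoneme. As a rough stand-in for those PRAAT measurements, the sketch below computes a segment's duration from its sample count and its RMS level in dB. It reports dB relative to full scale rather than Praat's calibrated intensity, so absolute values will differ from Praat's readings. The function names and the toy input signals are hypothetical.

```python
import math

def duration_s(samples, fs):
    """Segment duration in seconds."""
    return len(samples) / fs

def rms_db(samples, floor=1e-10):
    """RMS level in dB relative to full scale (samples assumed in [-1, 1])."""
    if not samples:
        raise ValueError("empty segment")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, floor))  # floor avoids log10(0) on silence

def compare_emotions(segments, fs):
    """Map emotion label -> (duration_s, rms_db) for the same word or phoneme."""
    return {label: (duration_s(s, fs), rms_db(s)) for label, s in segments.items()}

# Toy signals only, chosen for illustration; not measurements from the paper.
stats = compare_emotions({"anger": [0.5] * 800, "sadness": [0.1] * 1600}, fs=8000)
```

Because both quantities are computed per segment, the same routine serves word-level and phoneme-level comparisons.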
The Creation of Emotional Effects For An Arabic Speech Synthesis System
Emotional effects in a speech synthesis system are an important requirement for many applications that need expressive synthesis styles. In this work we describe the effort to build a unit-selection-based Arabic speech synthesis voice with emotional effects. Three emotional states were covered: normal, sad and question. Pitch-marking enhancements were carried out during Arabic voice building for more accurate unit concatenation. Expressive information was employed in the proposed target cost to produce more natural and emotive synthetic speech. The system was evaluated for the naturalness and emotiveness of the produced speech, and the evaluations show a significant increase in both scores.
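The abstract states that expressive information was added to the target cost. Below is a minimal sketch of unit selection built on that idea, assuming each candidate unit has a pitch value and an emotion label. The target cost penalizes pitch mismatch and adds a fixed penalty when the unit's emotion differs from the requested one. The join cost penalizes pitch discontinuity between consecutive units. A Viterbi search then picks the cheapest sequence. The features, weights and emotion penalty are hypothetical; the paper's actual cost function is not given in the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str
    pitch_hz: float
    emotion: str  # e.g. "normal", "sad", "question"

W_PITCH, W_EMOTION, W_JOIN = 0.01, 1.0, 0.02  # hypothetical weights

def target_cost(unit, target_pitch, target_emotion):
    """Pitch mismatch plus a fixed penalty for the wrong emotional style."""
    mismatch = 0.0 if unit.emotion == target_emotion else 1.0
    return W_PITCH * abs(unit.pitch_hz - target_pitch) + W_EMOTION * mismatch

def join_cost(prev, cur):
    """Penalty for a pitch jump at the concatenation point."""
    return W_JOIN * abs(prev.pitch_hz - cur.pitch_hz)

def select_units(candidates, targets, emotion):
    """Viterbi search; candidates[i] lists units for target i, targets[i] is its pitch."""
    # best[i][j] = (cumulative cost, backpointer) for unit candidates[i][j]
    best = [[(target_cost(u, targets[0], emotion), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(u, targets[i], emotion)
            row.append(min((best[i - 1][k][0] + join_cost(p, u) + tc, k)
                           for k, p in enumerate(candidates[i - 1])))
        best.append(row)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]
```

The emotion weight trades off style against smoothness. If it is set too low, a neutral unit with a better pitch match can win over a correctly styled one.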