Computer speech synthesis: its status and prospects
Related papers
International journal of engineering research and technology, 2013
Attempts to control the voice quality of synthesized speech have existed for more than a decade now. Several prototypes and fully operating systems have been built based on different synthesis techniques. This article reviews recent research advances in speech synthesis R&D, with a focus on one of the key approaches, the statistical parametric approach to speech synthesis based on hidden Markov models (HMMs), so as to provide a technological perspective. In this approach, the spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. This paper aims to give an overview of what has been done in this field and to summarize and compare the characteristics of the various synthesis techniques used. It is expected that this study will contribute to the field of speech synthesis and help identify the research topics and applications at the forefront of this exciting and challenging field.
A Comparative Study of Different Text-to-Speech Synthesis Techniques
Speech synthesis is the artificial production of human speech. Attempts to control the voice quality of synthesized speech have existed for more than a decade now. Several prototypes and fully operating systems have also been built based on different synthesis techniques. This article reviews recent advances in the research and development of speech synthesis, with a focus on one of the key approaches, the statistical parametric approach to speech synthesis based on hidden Markov models (HMMs), so as to provide a technological perspective. In this approach, the spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. This paper aims to give an overview of what has been done in this field and to summarize and compare the characteristics of the various speech synthesis techniques used.
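As a rough illustration of the generation step this abstract describes, the sketch below expands per-state mean parameter vectors by their state durations into a frame-level trajectory. It is a simplification under stated assumptions, not the method of the paper: the function name `generate_trajectory` and the input layout are hypothetical, and real HMM-based systems additionally use dynamic (delta) features to smooth the trajectory and a vocoder to turn parameters into a waveform.

```python
from typing import List, Sequence, Tuple


def generate_trajectory(
    states: Sequence[Tuple[Sequence[float], int]]
) -> List[List[float]]:
    """Expand HMM state means into a piecewise-constant frame sequence.

    `states` is a hypothetical layout: one (mean_vector, duration_in_frames)
    pair per HMM state, in left-to-right order.
    """
    frames: List[List[float]] = []
    for mean, n_frames in states:
        if n_frames < 0:
            raise ValueError("state duration must be non-negative")
        # Repeat the state's mean vector once per frame spent in that state.
        frames.extend(list(mean) for _ in range(n_frames))
    return frames


if __name__ == "__main__":
    # Two states with 2-dimensional means, lasting 2 frames and 1 frame.
    for frame in generate_trajectory([([1.0, 2.0], 2), ([3.0, 4.0], 1)]):
        print(frame)
```

The piecewise-constant output makes the roles of the spectrum/excitation means and the duration model visible; it deliberately leaves out the smoothing that makes practical systems sound continuous.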
Toward Spontaneous Speech Synthesis—Utilizing Language Model Information in TTS
IEEE Transactions on Speech and Audio Processing, 2004
State-of-the-art speech synthesis systems achieve a high overall quality. However, synthesized speech still lacks naturalness. To produce more natural and colloquial synthetic speech, our research focuses on integrating effects present in spontaneous speech. Conventional speech synthesis systems do not consider the probability of a word in its context. Recent investigations on corpora of natural speech showed that words that are very likely to occur in a given context are pronounced less accurately and faster than improbable ones. In this paper, three approaches are introduced to model this effect found in spontaneous speech. The first algorithm changes the speaking rate directly by shortening or lengthening the syllables of a word depending on the language model probability of that word. Since probable words are not only pronounced faster but also less accurately, this approach was extended by selecting appropriate pronunciation variants of a word according to the language model probability. This second algorithm changes the local speaking rate indirectly by controlling the grapheme-phoneme conversion. In a third stage, a pronunciation sequence model was used to select the appropriate variants according to their sequence probability. In listening experiments, test participants were asked to rate the synthesized speech in the categories "colloquial impression" and "naturalness". Our approaches achieved a significant improvement in the category "colloquial impression". However, no significantly higher naturalness could be observed. The observed effects will be discussed in detail.
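The first algorithm in this abstract, scaling syllable durations by a word's language model probability, can be sketched as follows. The abstract does not give the actual mapping, so everything concrete here is an assumption made for illustration: the log-linear rule, the function names, and the parameters `ref_prob`, `strength`, `lo` and `hi` are all hypothetical.

```python
import math
from typing import List, Sequence


def duration_scale(word_prob: float, ref_prob: float = 0.01,
                   strength: float = 0.1, lo: float = 0.8,
                   hi: float = 1.2) -> float:
    """Return a multiplicative duration factor for one word.

    Assumed rule: words more probable than `ref_prob` get a factor
    below 1 (spoken faster), less probable words a factor above 1.
    """
    if not 0.0 < word_prob <= 1.0:
        raise ValueError("word_prob must be in (0, 1]")
    # Surprisal of the word relative to the reference probability, in bits.
    delta_bits = math.log2(ref_prob) - math.log2(word_prob)
    factor = 1.0 + strength * delta_bits
    # Clamp so that durations never change by more than the allowed range.
    return max(lo, min(hi, factor))


def rescale_syllables(durations_ms: Sequence[float],
                      word_prob: float) -> List[float]:
    """Apply the word-level factor to every syllable duration of the word."""
    factor = duration_scale(word_prob)
    return [d * factor for d in durations_ms]


if __name__ == "__main__":
    # A highly predictable word is shortened, a rare one is lengthened.
    print(rescale_syllables([120.0, 180.0], word_prob=0.5))
    print(rescale_syllables([120.0, 180.0], word_prob=1e-6))
```

Clamping the factor is a design choice of this sketch: it keeps very rare or very frequent words from producing implausibly long or short syllables.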
An overview of text-to-speech synthesis techniques
2010
The goal of this paper is to provide a short but comprehensive overview of text-to-speech synthesis by highlighting its natural language processing (NLP) and digital signal processing (DSP) components. First, the front end, or NLP component, comprising text analysis, phonetic analysis, and prosodic analysis, is introduced; then two rule-based synthesis techniques (formant synthesis and articulatory synthesis) are explained. After that, concatenative synthesis is explored. Compared to rule-based synthesis, concatenative synthesis is simpler since there is no need to determine speech production rules. However, concatenative synthesis introduces the challenges of prosodic modification to speech units and resolving discontinuities at unit boundaries. Prosodic modification results in artifacts that make the speech sound unnatural. Unit selection synthesis, which is a kind of concatenative synthesis, solves this problem by storing numerous instances of each unit with varying prosodies. The unit that best matches the target prosody is selected and concatenated. Finally, hidden Markov model (HMM) synthesis is introduced.
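The unit selection idea in this abstract, picking from many stored instances the one whose prosody best matches the target, can be sketched as below. This is a minimal illustration under assumptions, not the paper's algorithm: the `Unit` fields, the cost formula and its weights are hypothetical, and only a target cost is modeled. Practical unit selection systems also score how well neighboring units join, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """One stored instance of a speech unit (hypothetical fields)."""
    label: str          # e.g. a phone or diphone name
    f0_hz: float        # mean pitch of this recording
    duration_ms: float  # length of this recording


def target_cost(unit: Unit, f0_hz: float, duration_ms: float,
                w_f0: float = 1.0, w_dur: float = 1.0) -> float:
    """Weighted relative mismatch between a unit and the target prosody."""
    return (w_f0 * abs(unit.f0_hz - f0_hz) / f0_hz
            + w_dur * abs(unit.duration_ms - duration_ms) / duration_ms)


def select_units(inventory: Sequence[Unit],
                 targets: Sequence[Tuple[str, float, float]]) -> List[Unit]:
    """For each (label, f0_hz, duration_ms) target, pick the closest instance."""
    chosen: List[Unit] = []
    for label, f0, dur in targets:
        candidates = [u for u in inventory if u.label == label]
        if not candidates:
            raise KeyError(f"no stored instance of unit {label!r}")
        chosen.append(min(candidates, key=lambda u: target_cost(u, f0, dur)))
    return chosen


if __name__ == "__main__":
    inventory = [Unit("a", 100.0, 80.0), Unit("a", 200.0, 120.0),
                 Unit("b", 150.0, 100.0)]
    for unit in select_units(inventory, [("a", 190.0, 110.0),
                                         ("b", 150.0, 100.0)]):
        print(unit)
```

Because the best-matching instance is used as recorded, little or no prosodic modification is needed afterwards, which is the advantage the abstract attributes to unit selection.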
Speech synthesis at the Institute of Phonetics
Annual Report of the Institute of Phonetics, University of Copenhagen
This paper gives a brief description of the more important research involving speech synthesis at the Institute of Phonetics since the Institute was founded in 1966. Further, it provides a status report for ongoing research in the shape of a more detailed account of our current activities and future plans.
The impact of speech recognition on speech synthesis
2002
Speech synthesis has changed dramatically in the past few years to have a corpus-based focus, borrowing heavily from advances in automatic speech recognition. In this paper, we survey technology in speech recognition systems and how it translates (or doesn't translate) to speech synthesis systems. We further speculate on future areas where ASR may impact synthesis and vice versa.
Speech synthesis, speech simulation and speech science
Speech synthesis research has been transformed in recent years through the exploitation of speech corpora, both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology and the new techniques it brings call into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. The paper suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech. But at the same time, it leaves speech scientists free to return to their proper goal of building a computational model of human speech production.