Enhancement Influence of the Aperiodicity Coefficients in Speech Synthesis

Modification of the aperiodic component of speech signals for synthesis

1996

Gael Richard, Christophe R. d'Alessandro. Modeling the excitation component of speech signals is a challenging problem for speech synthesis. Recently, several works have been devoted to periodic/ ...

Analysis/synthesis and modification of the speech aperiodic component

Speech Communication, 1996

The general framework of this paper is speech analysis and synthesis. The speech signal may be separated into two components: (1) a periodic component, which includes the quasi-periodic or voiced sounds produced by regular vocal cord vibrations; (2) an aperiodic component, which includes the non-periodic part of voiced sounds (e.g. fricative noise in /v/) or sound emitted without any vocal cord vibration (e.g. unvoiced fricatives or plosives). This work is intended to contribute to a precise modelling of this second component, and particularly of modulated noises. Firstly, a synthesis method inspired by the "shot noise effect" is introduced. This technique uses random point processes which define the times of arrival of spectral events, each represented by a Formant Wave Form (FWF). Based on the theoretical framework provided by the Rice representation and random modulation theory, an analysis/synthesis scheme is proposed. Perception tests show that this method makes it possible to synthesize very natural speech signals. The proposed representation also enables new types of voice quality modification (time scaling, vocal effort, breathiness of a voice, etc.).
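The shot-noise idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a homogeneous Poisson process for the arrival times, a single damped sinusoid as a simplified stand-in for a Formant Wave Form, and a random polarity per event. All function names and parameter values are hypothetical.

```python
import math
import random


def formant_waveform(freq_hz, bandwidth_hz, duration_s, sample_rate):
    """Damped sinusoid used as a simplified stand-in for one FWF (assumption)."""
    n = round(duration_s * sample_rate)
    decay = math.pi * bandwidth_hz  # envelope decay rate in 1/s
    return [
        math.exp(-decay * k / sample_rate)
        * math.sin(2.0 * math.pi * freq_hz * k / sample_rate)
        for k in range(n)
    ]


def poisson_arrival_times(rate_hz, total_s, rng):
    """Arrival times of a homogeneous Poisson process (exponential gaps)."""
    times = []
    t = rng.expovariate(rate_hz)
    while t < total_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times


def shot_noise(rate_hz, total_s, sample_rate, freq_hz, bandwidth_hz, seed=0):
    """Sum one grain per random arrival time (overlap-add)."""
    rng = random.Random(seed)
    grain = formant_waveform(freq_hz, bandwidth_hz, 0.01, sample_rate)
    out = [0.0] * round(total_s * sample_rate)
    for t in poisson_arrival_times(rate_hz, total_s, rng):
        start = int(t * sample_rate)
        sign = rng.choice((-1.0, 1.0))  # random polarity (assumption)
        for i, g in enumerate(grain):
            if start + i < len(out):
                out[start + i] += sign * g
    return out


# Hypothetical settings: 2000 events/s, noise centred near 3 kHz.
signal = shot_noise(rate_hz=2000.0, total_s=0.1, sample_rate=16000,
                    freq_hz=3000.0, bandwidth_hz=400.0)
```

Making the event rate vary over time (for example, in step with a glottal cycle) would be one way to obtain a modulated noise, but that step is left out of this sketch.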

Speech synthesis at the Institute of Phonetics

Annual Report of the Institute of Phonetics University of Copenhagen

This paper gives a brief description of the more important research involving speech synthesis at the Institute of Phonetics since the Institute was founded in 1966. It also provides a status report on ongoing research, in the form of a more detailed account of our current activities and future plans.

Prediction of voice aperiodicity based on spectral representations in HMM speech synthesis

2011

In hidden Markov model-based speech synthesis, speech is typically parameterized using source-filter decomposition. A widely used analysis/synthesis framework, STRAIGHT, decomposes the speech waveform into a framewise spectral envelope and a mixed mode excitation signal. Including an aperiodicity measure in the model also enables synthesis of signals that are not purely voiced or unvoiced.
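As a rough illustration of mixed mode excitation, the sketch below blends a pulse train with white noise using a single aperiodicity value between 0 and 1. This is a simplification made for illustration only: it is not STRAIGHT's algorithm, which is not described in the abstract, and the power-preserving square-root weighting, the function name and the parameters are all assumptions.

```python
import math
import random


def mixed_excitation(f0_hz, aperiodicity, n_samples, sample_rate, seed=0):
    """Blend a pulse train and Gaussian noise with one aperiodicity weight.

    aperiodicity = 0.0 gives a pure pulse train, 1.0 gives pure noise.
    Square-root gains keep the expected power near 1 (assumption).
    """
    rng = random.Random(seed)
    period = sample_rate / f0_hz  # pitch period in samples
    pulse_gain = math.sqrt(1.0 - aperiodicity)
    noise_gain = math.sqrt(aperiodicity)
    out = []
    next_pulse = 0.0
    for n in range(n_samples):
        pulse = 0.0
        if n >= next_pulse:
            pulse = math.sqrt(period)  # pulse train then has unit mean power
            next_pulse += period
        noise = rng.gauss(0.0, 1.0)
        out.append(pulse_gain * pulse + noise_gain * noise)
    return out


# Hypothetical frame: 100 Hz pitch, 30 % of the power assigned to noise.
frame = mixed_excitation(f0_hz=100.0, aperiodicity=0.3,
                         n_samples=1600, sample_rate=16000)
```

In a full vocoder the excitation would then be shaped by the spectral envelope, and the aperiodicity would usually vary by frame and frequency band. Neither step is shown here.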

Simulation of Human Speech Production Applied to the Study and Synthesis of European Portuguese

EURASIP Journal on …, 2005

A new articulatory synthesizer (SAPWindows), with a modular and flexible design, is described. A comprehensive acoustic model and a new interactive glottal source were implemented. Perceptual tests and simulations made possible by the synthesizer contributed to deepening our knowledge of one of the most important characteristics of European Portuguese, the nasal vowels. First attempts at incorporating models of frication into the articulatory synthesizer are presented, demonstrating the potential of performing fricative synthesis based on broad articulatory configurations. Synthesis of nonsense words and Portuguese words with vowels and nasal consonants is also shown. Despite not being capable of competing with mainstream concatenative speech synthesis, the anthropomorphic approach to speech synthesis, known as articulatory synthesis, proved to be a valuable tool for phonetics research and teaching. This was particularly true for the European Portuguese nasal vowels.

Speech synthesis, speech simulation and speech science

Speech synthesis research has been transformed in recent years through the exploitation of speech corpora, both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology, and the new techniques it brings, call into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. The paper suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech. But at the same time, it leaves speech scientists free to return to their proper goal of building a computational model of human speech production.

A model of segmental duration for speech synthesis in French

Speech Communication, 1987

Abstract. This paper presents a set of rules to predict phoneme durations for synthesis applications in French. The rules use a speaker-independent Intrinsic Duration for each phoneme and a lengthening/shortening coefficient reflecting the effects of context and speaking style. The model can thus yield different sets of phoneme durations as produced by different speakers. The validity of the model was tested on 2 speakers. For the test corpora, the mean differences between predicted and measured durations were less than 18 ms.
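The structure of the duration rules described above can be sketched as an intrinsic duration scaled by a coefficient. The abstract does not give the rule set, so the multiplicative form, the intrinsic values and the coefficient below are hypothetical, and reading "mean difference" as a mean absolute difference is also an assumption.

```python
# Hypothetical speaker-independent intrinsic durations, in milliseconds.
INTRINSIC_MS = {"a": 90.0, "t": 70.0, "s": 100.0}


def predict_duration_ms(phoneme, coefficient):
    """Scale the intrinsic duration by a lengthening/shortening coefficient.

    coefficient > 1 lengthens and < 1 shortens. In the paper it reflects
    context and speaking style; here it is simply passed in by the caller.
    """
    return INTRINSIC_MS[phoneme] * coefficient


def mean_abs_difference(predicted_ms, measured_ms):
    """Mean absolute difference between predicted and measured durations."""
    pairs = list(zip(predicted_ms, measured_ms))
    return sum(abs(p - m) for p, m in pairs) / len(pairs)


# Hypothetical usage: lengthen /a/ by 50 %.
lengthened = predict_duration_ms("a", 1.5)
```

An evaluation like the one reported would compare such predictions with measured durations over a test corpus, for instance with `mean_abs_difference`.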

On the relative importance of different prosodic factors for improving speech synthesis

We present results of perceptual experiments geared toward assessing the relative importance of several prosodic factors in synthetic speech, showing that naturalness, relative to a target speaking style, can be significantly improved through both symbolic label prediction and better F0 and duration generation. Our experiments utilized a novel perceptual experiment paradigm, where we supply each test subject with two reference utterances in order to obtain reliable absolute scores that indicate magnitude of improvement. The approach gives ratings that are comparable across experiments. Results also show a strong interaction between detailed F0 and duration controls.