Matthias Jilka | University of Augsburg
Papers by Matthias Jilka
This paper looks at improving unit selection text-to-speech (TTS) quality by optimizing the agreement between front-end and speech database. We focused, in particular, on two classes of problems causing degradation in synthesis quality: 1) realization of /d/ and /t/ sounds and 2) confusions of unstressed vowels, especially with schwas. We investigated two approaches to tackling these problems. First, ...
This paper presents a comparative study on the temporal alignment of pitch peaks of H*L accents in Polish and German. Speech material used in the study came from the unit selection synthesis corpora of the Polish voice module of the BOSS system and the IMS German Festival TTS system. The major factors investigated were concerned with the influence of syllable ...
This study attempts to determine whether natural prosody variations and different methods of applying prosodic patterns are relevant to listeners' perceptions of synthetic speech quality. The prosodic patterns of five test sentences including yes/no questions, wh-questions, declaratives, and continuation rises as produced by six female native speakers of four varieties of English were imposed on the same US English ...
NeuroImage, 2009
246 SA-PM The signature of the human syntactic architecture, M Musso, V Glauche, A Horn,
Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002
The quality of speech synthesis has come a long way since Homer Dudley's "Voder" in 1939. In fact, with the widespread use of unit-selection synthesizers, the naturalness of the synthesized speech is now high enough to pass the Turing test for short utterances, such as voice prompts. Therefore, it seems valid to ask the question "what are the next challenges for TTS research?" This paper tries to identify unsolved issues, the solution of which would greatly enhance the state of the art in TTS.
This paper aims to demonstrate the use of a unit selection corpus, the IMS German Festival synthesis system (1), in carrying out a comprehensive investigation of factors influencing specific aspects of the phonetic realization of tonal categories. The study restricts itself to the alignment of peaks in H*L pitch accents in German. First results confirm not only well-known effects of ...
This paper describes the use of a unit selection corpus in carrying out an investigation of factors influencing specific aspects of the phonetic realization of tonal categories, concentrating on the alignment of peaks in H*L pitch accents in German. Three major linguistic parameters potentially influencing peak alignment are investigated. Two of them (syllable structure, nuclear pitch accents) are established influences ...
This study continues an approach that uses a unit selection corpus in order to investigate aspects of the phonetic realization of tonal categories. The focus lies on the peak position of German H*L pitch accents, specifically on the question of whether it is influenced by vowel quality. It is confirmed that vowel backness does not affect peak alignment at all. The distinction between tense and lax vowels initially promises to be relevant, as the H*L peaks seemingly occur significantly earlier in lax vowels. The effect is, however, demonstrated to be caused by the far greater number of lax vowels in the closed syllables found in the corpus. Finally, the feature of vowel height is revealed to be a significant factor (peaks are aligned latest in high vowels, earliest in low vowels). Various parameters (e.g., syllable structure, position in the phrase) are examined for interactions, but cannot account for the effect. While vowel height correlates with vowel duration, vowel duration itself...
Speech Communication, 1999
This study presents an approach to the generation of American English intonation based on prescriptive rules that define the respective features of certain tone labels that in turn represent linguistically relevant F0 configurations. In accordance with the principles of the Tone Sequence Model, the F0 contour is analyzed as a series of discrete target values that are connected by means of transitional functions. The target values are associated either with stressed syllables (pitch accents) or the margins of the phrase (phrasal tones). The targets' exact position is represented relative to pitch range and time. All tone labels are examined according to these parameters and the results are then converted into a set of rules that allows the generation of an F0 contour. Tones and Break Indices (ToBI), a system for transcribing the intonation patterns of American English, provides an inventory of tone labels and a set of example utterances available for analysis. Utterances from ToBI and the Boston Radio News Corpus were used for the evaluation of the generation rules: root mean squared error (RMSE) and correlation between generated and original contour were determined, and in a perception test native speakers assessed the quality of the resynthesized contours which, in general, were judged to sound natural and show few differences to the corresponding originals. © 1999 Elsevier Science B.V. All rights reserved.
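The evaluation described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the sample values, and in particular the choice of straight-line transitions between targets are assumptions made only for illustration (the abstract does not say which transitional functions were used). The sketch builds an F0 contour from discrete (time, F0) targets and compares it with a reference contour using RMSE and Pearson correlation.

```python
import math


def linear_contour(targets, times):
    """Sample an F0 contour from (time, f0) targets.

    Assumption: targets are joined by straight lines; the paper's actual
    transitional functions are not specified in the abstract.
    """
    targets = sorted(targets)
    contour = []
    for t in times:
        if t <= targets[0][0]:
            # Before the first target: hold its value.
            contour.append(targets[0][1])
        elif t >= targets[-1][0]:
            # After the last target: hold its value.
            contour.append(targets[-1][1])
        else:
            # Find the enclosing pair of targets and interpolate linearly.
            for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
                if t0 <= t <= t1:
                    contour.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
                    break
    return contour


def rmse(generated, original):
    """Root mean squared error between two equally sampled contours."""
    if len(generated) != len(original) or not generated:
        raise ValueError("contours must be non-empty and of equal length")
    squared = sum((g - o) ** 2 for g, o in zip(generated, original))
    return math.sqrt(squared / len(generated))


def pearson(generated, original):
    """Pearson correlation between two equally sampled contours."""
    if len(generated) != len(original) or len(generated) < 2:
        raise ValueError("contours must have equal length of at least 2")
    n = len(generated)
    mean_g = sum(generated) / n
    mean_o = sum(original) / n
    cov = sum((g - mean_g) * (o - mean_o) for g, o in zip(generated, original))
    var_g = sum((g - mean_g) ** 2 for g in generated)
    var_o = sum((o - mean_o) ** 2 for o in original)
    return cov / math.sqrt(var_g * var_o)


# Hypothetical targets in (seconds, Hz) and a made-up reference contour.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
generated = linear_contour([(0.0, 120.0), (0.5, 180.0), (1.0, 100.0)], times)
original = [118.0, 155.0, 176.0, 142.0, 104.0]
print(rmse(generated, original), pearson(generated, original))
```

Both measures assume the two contours are sampled at the same time points; a low RMSE together with a high correlation indicates that the generated contour is close to the original in both absolute level and shape.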
The Journal of the Acoustical Society of America, 2009
The Journal of the Acoustical Society of America, 1999
The Journal of the Acoustical Society of America, 2003
Several generally accepted intonational features of questions in American English have not been the subject of much empirical study: namely that wh-questions end in L-L% phrasal accents, and that their intonational contours are identical to those of declarative sentences, while yes/no questions end in H-H% phrasal accents. The study addresses the following questions about question intonation: How frequently do yes/no ...
The Journal of the Acoustical Society of America, 2004
In a previous study exploring American English question intonation, we found that some speakers deviated considerably from expected question prosody. In this study, we focus on listener-rated acceptability of the various prosodic patterns observed for yes/no and wh questions. A variety of intonational patterns realized in both question utterances recorded from five female and three male professional speakers and in questions synthesized from several TTS voices of both genders was presented to listeners. Subjects judged the acceptability of each utterance in the context of a dialogue between a travel agent and customer. We hypothesized that question utterances with the expected intonational features (phrase-final fall in wh questions, phrase-final rise in yes/no questions) would be rated as more acceptable than question utterances with deviating intonational features, and that this result would hold for both natural and synthetic speech conditions. In addition, following our previous results, we hypothesized that the unexpected intonation pattern of phrase-final falls for yes/no questions would be more acceptable for lower-pitched than for higher-pitched voices. We also varied the prominence of the interrogative pronouns in synthetic wh questions in order to see whether simulating their high intonational prominence in natural wh questions improved the acceptability of synthetic wh questions.
Seventh International Conference on Spoken …, 2002
... in which two separate vowels are pronounced. Also, small-scale errors in the letter-to-phoneme conversion rules have to be avoided, such as, e.g., the pronunciation of the letter sequence "sph" at the beginning of a morpheme (e.g. "Sphinx") as /sf/ and not /Sph/ due to lack of an ...
Proceedings of the …, 2007
INTRODUCING A COMPREHENSIVE APPROACH TO ASSESSING ... In fact, a special difficulty of pronunciation acquisition as opposed to that of other aspects of grammar is ... The testing procedure is expected to reliably express the talent levels of the speakers, facilitating the ...
ifla.uni-stuttgart.de
This paper gives an overview of an ongoing Ph.D. project in cooperation with the DFG-supported project "Language talent and brain activity" at the Universities of Stuttgart and Tübingen. Considering the socio-psychological background of communication accommodation theory and the ...