Analysis and synthesis of Cantonese F0 contours based on the command-response model
Related papers
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005
Cantonese is a well-known Chinese dialect with a quite complex tone system. We have successfully applied the command-response model to represent F0 contours of Cantonese speech by defining a set of appropriate tone command patterns. It provides an efficient means to describe Cantonese F0 contours with high accuracy. In this paper, both qualitative and quantitative descriptions of the command patterns of nine tones are presented. The various cues for identifying each tone are investigated, based on which a set of rules is derived to synthesize F0 contours of Cantonese. Perceptual experiments are also conducted to test the validity of the rules for each tone and to evaluate the naturalness of synthetic speech based on those rules.
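As a rough illustration of how the command-response model turns commands into an F0 contour, the following minimal sketch adds phrase and tone components to a baseline in the logarithmic domain. The constants `ALPHA`, `BETA`, and `GAMMA`, the function names, and the command values in the usage note are illustrative assumptions. They are not taken from the paper, and the Cantonese tone command patterns themselves are not reproduced here.

```python
import math

# Assumed, commonly quoted constants; not values reported in the abstract.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the tone control mechanism (1/s)
GAMMA = 0.9   # ceiling of the tone component response


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def tone_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the tone control mechanism, limited to gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, tone_cmds):
    """Return F0 in Hz at each time in `times`.

    fb          : baseline frequency in Hz
    phrase_cmds : list of (onset_s, magnitude) impulses
    tone_cmds   : list of (onset_s, offset_s, amplitude) steps;
                  amplitude may be positive or negative
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in tone_cmds:
            log_f0 += aa * (tone_response(t - t1) - tone_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour
```

For example, `f0_contour([0.25], 100.0, [(0.0, 0.5)], [(0.1, 0.3, -0.4)])` evaluates one point inside a negative tone command, which pulls F0 below the phrase-only contour at that instant. A positive amplitude raises it instead.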
Analysis of F0 contours of Cantonese utterances based on the command-response model
As a major Chinese dialect, Cantonese is well known for its complex tone system. This paper applies the command-response model to represent the F0 contours of Cantonese speech. Analysis-by-synthesis is conducted on both utterances of carrier sentences and utterances with less constrained structures, from which a set of appropriate tone command patterns is derived. By intrinsically incorporating the effects of tone coarticulation, word accentuation and phrase intonation, the model provides a high accuracy of approximation to the F0 contours of Cantonese, and hence serves as a much better method to quantitatively describe the continuous F0 contours than the traditional tone letter scale notation system. The constraints in timing and amplitude of tone commands are also investigated, which can be used for synthesis of F0 contours.
2004
The model for the process of F0 contour generation, first proposed by Fujisaki and his coworkers, has been successfully applied to Mandarin, which is a typical tone language with a distinct feature that both positive and negative tone commands are required. However, the inverse problem, viz., automatic derivation of the model parameters from an observed F0 contour, is more difficult for Mandarin than for non-tone languages, because the polarity of tone commands cannot be inferred directly from the F0 contour itself. In this paper, an efficient method is proposed to solve the problem by using the information on syllable timing and tone labels. With the same framework as that proposed for Japanese and English, the method presented here for Mandarin is focused on the first-order estimation of tone command parameters. A set of intra-syllable and inter-syllable rules is constructed to recognize the tone command patterns within each syllable. The experiment shows that the method works effectively and gives results comparable to those obtained by manual analysis.
IEICE Transactions on Information and Systems, 2004
The model for the process of F0 contour generation, first proposed by Fujisaki and his coworkers, has been successfully applied to Standard Chinese, which is a typical tone language with a distinct feature that both positive and negative tone commands are required. However, the inverse problem, viz., automatic derivation of the model parameters from an observed F0 contour of speech, cannot be solved analytically. Moreover, the extraction of model parameters for Standard Chinese is more difficult than for Japanese and English, because the polarity of tone commands cannot be inferred directly from the F0 contour itself. In this paper, an efficient method is proposed to solve the problem by using information on syllable timing and tone labels. With the same framework as for the successive approximation method proposed for Japanese and English, the method presented here for Standard Chinese is focused on the first-order estimation of tone command parameters. A set of intra-syllable and inter-syllable rules is constructed to recognize the tone command patterns within each syllable. The experiment shows that the method works effectively and gives results comparable to those obtained by manual analysis.
Speech Communication, 2012
A 2-step scheme was developed in our method for synthesizing sentence fundamental frequency (F0) contours of Mandarin speech. The method is based on representing a sentence logarithmic F0 contour as a superposition of tone components on phrase components, as in the generation process model (F0 model). The tone components are realized by concatenating tone nucleus F0 patterns generated by a corpus-based method, while the phrase components are generated by rules under the F0 model framework. In the 2-step scheme, the phrase components are generated first, and their information is added to the inputs for the prediction of tone nucleus F0 patterns. Results of listening tests on synthetic speech with the synthesized F0 contours verified the validity of the developed scheme. For comparison, we also generated F0 contours without decomposing them into tone and phrase components, as most existing methods do. Although the results did not show a clear advantage of the proposed method in terms of the naturalness of synthetic speech, its advantage in flexibility was clear: manipulating the phrase components in the proposed method gave better focus control.
Analysis of Shanghainese F0 contours based on the command-response model
As one of the major Chinese dialects, Shanghainese is well known for its complex tone sandhi system. This paper applies the command-response model to represent F0 contours of Shanghainese speech. Analysis-by-synthesis is conducted both on carrier sentences with monosyllabic target words and on isolated polysyllabic words, from which a set of appropriate tone command patterns is derived for words of different lengths and different initial citation tones. By incorporating the effects of tone coarticulation, word accentuation and phrase intonation, the model gives high accuracy of approximations to F0 contours of Shanghainese utterances, and hence provides a more efficient means to quantitatively represent F0 contours and to describe the tone sandhi system of Shanghainese than the traditional 5-level tone code system.
Speech Communication, 2005
While the tonal characteristics of Chinese syllables have been qualitatively described in traditional phonetics, quantitative analysis requires a mathematical model. This paper presents such a model for the fundamental frequency contours of Standard Chinese, based on an extension of a model that has already been shown to be applicable to non-tone languages including Japanese, English, and others. The model allows one to interpret a given fundamental frequency contour in terms of tone commands and phrase commands, and to analyze various tonal phenomena in quantitative terms. The paper then describes the results of analysis of fundamental frequency contours of a number of utterances, revealing systematic relationships between the timing of the tone commands and the final of each syllable. The results are used to derive constraints for tone and phrase command generation in speech synthesis. The validity of introducing these constraints in speech synthesis of Standard Chinese is confirmed by perceptual tests on naturalness of prosody as well as on intelligibility of tones, using speech synthesized with and without these constraints.
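For orientation, the model is usually written in the literature in the following form. The symbols used here ($F_b$, $A_{p_i}$, $A_{t_j}$, $T_{0i}$, $T_{1j}$, $T_{2j}$, $\alpha$, $\beta$, $\gamma$) follow common notation and are assumptions of this sketch, since the abstract itself does not give the equations.

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p_i}\, G_p(t - T_{0i})
  + \sum_{j=1}^{J} A_{t_j} \bigl[ G_t(t - T_{1j}) - G_t(t - T_{2j}) \bigr]

G_p(t) =
  \begin{cases}
    \alpha^2 t\, e^{-\alpha t}, & t \ge 0 \\
    0, & t < 0
  \end{cases}
\qquad
G_t(t) =
  \begin{cases}
    \min\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \,\bigr], & t \ge 0 \\
    0, & t < 0
  \end{cases}
```

Here $F_b$ is the baseline frequency, each phrase command is an impulse of magnitude $A_{p_i}$ at time $T_{0i}$, and each tone command is a step of amplitude $A_{t_j}$ lasting from $T_{1j}$ to $T_{2j}$. In a tone language such as Standard Chinese, $A_{t_j}$ can be positive or negative.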
Archives of Acoustics, 2007
A method for generating sentence F0 contours of Standard Chinese speech is developed. It is based on superposing tone components on phrase components in logarithmic frequency. While tone components are language specific, phrase components are assumed to be more language universal. Taking this into account, the method treats the two kinds of components differently. The tone components are generated by concatenating F0 patterns of tone nuclei, which are predicted by a corpus-based scheme, while the phrase components are generated by rules. Experiments on F0 contour generation were conducted using 100 news utterances by a female speaker. The first experiments were conducted on the generation of tone components, with the phrase components of the original utterances used unchanged. The results showed that the method could generate F0 contours close to those of the target speech. Speech was synthesized by replacing the original F0 contours with the generated ones using TD-PSOLA. Listening experiments on the quality of the synthetic speech gave a high average score of 4.5 on a 5-point scale. The second experiments were on the generated phrase components, with the tone components extracted from the original utterances. Although the synthetic speech with generated F0 contours sounded mostly natural, there were occasional "degraded sounds" caused by a mismatch between the phrase and the tone components. To cope with the mismatch, a two-step method was developed, in which information on the phrase contours is used for the prediction of tone components. The validity of the method was shown through perceptual experiments on synthesized speech.
Although the command-response model for the process of F0 contour generation has been successfully applied to many languages, the inverse problem, viz., automatic derivation of the model parameters from an observed F0 contour, is more challenging, especially for tone languages, which have tone commands of both polarities. Since the polarities of tone commands cannot be inferred directly from the F0 contour itself, information on tone identity and timing needs to be incorporated. The current study gives a general approach for the first-order estimation of tone command parameters for tone languages, taking Mandarin and Cantonese as two examples. After a rule-based recognition of the tone command patterns within each syllable, the timing and amplitude of the tone commands are deduced. The experiments show that the method gives good analysis results for both dialects.
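The idea that a tone label fixes the polarity pattern before any amplitude is measured can be sketched as follows. This is a toy illustration and not the rule set of the study: the `POLARITY` table is an assumed simplification for the four Mandarin lexical tones, the function name and input layout are hypothetical, and the reference level is treated as a constant, whereas in the model it would be the baseline plus the phrase components.

```python
import math

# Assumed polarity patterns per Mandarin tone label (illustrative only).
POLARITY = {1: (+1,), 2: (-1, +1), 3: (-1,), 4: (+1, -1)}


def first_order_tone_commands(syllables, reference_hz):
    """Return a list of (onset_s, offset_s, amplitude) tone commands.

    syllables    : list of dicts with 'start', 'end' (seconds), 'tone' (1-4)
                   and 'f0' (non-empty list of evenly spaced F0 samples in Hz)
    reference_hz : constant reference level in Hz (a simplifying assumption)
    """
    commands = []
    for syl in syllables:
        pattern = POLARITY[syl["tone"]]   # polarity comes from the label
        n = len(pattern)
        dur = (syl["end"] - syl["start"]) / n
        chunk = len(syl["f0"]) // n
        for k, sign in enumerate(pattern):
            # Use the k-th slice of the syllable, or all samples if too few.
            seg = syl["f0"][k * chunk:(k + 1) * chunk] or syl["f0"]
            # Largest log-F0 excursion from the reference in the given direction.
            excursion = max(sign * math.log(f / reference_hz) for f in seg)
            amplitude = sign * max(0.0, excursion)
            onset = syl["start"] + k * dur
            commands.append((onset, onset + dur, amplitude))
    return commands
```

For a single falling-tone syllable such as `{"start": 0.0, "end": 0.2, "tone": 4, "f0": [140, 140, 90, 90]}` with a reference of 110 Hz, the sketch returns one positive command followed by one negative command, each covering half of the syllable.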
Automatic Extraction of Tone Command Parameters for the Model of
The model for the process of F0 contour generation, first proposed by Fujisaki and his coworkers, has been successfully applied to Standard Chinese, which is a typical tone language with a distinct feature that both positive and negative tone commands are required. However, the inverse problem, viz., automatic derivation of the model parameters from an observed F0 contour of speech, cannot be solved analytically. Moreover, the extraction of model parameters for Standard Chinese is more difficult than for Japanese and English, because the polarity of tone commands cannot be inferred directly from the F0 contour itself. In this paper, an efficient method is proposed to solve the problem by using information on syllable timing and tone labels. With the same framework as for the successive approximation method proposed for Japanese and English, the method presented here for Standard Chinese is focused on the first-order estimation of tone command parameters. A set of intra-syllable and inter-syllable rules is constructed to recognize the tone command patterns within each syllable. The experiment shows that the method works effectively and gives results comparable to those obtained by manual analysis.