Explicit modeling of vowel coarticulation in continuous speech recognition

International Conference on Acoustics, Speech, and Signal Processing, 2003

Abstract

An ongoing study of all sixteen American English vowels is reported, using subsets of the DARPA acoustic-phonetic database. Formants are obtained and normalized to each talker's formant range, as estimated from a single sentence. The resulting formant tracks are smoothed using splines and sampled at nine equally spaced points in time within vowel-centered triphone regions. Triphones containing semivowels are clustered separately. The formant values are then k-means clustered using subsets of the sampled values. Additional supervised training is done using other parameters, including duration. The resulting clusters serve as a classifier based on a modified Euclidean distance from the cluster centers. This yields approximately 80% first-choice recognition for vowels on the outer edges of the vowel quadrilateral. Stressed vowels were found to have spectra that were statistically no more stable than those of unstressed vowels.
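The classification pipeline summarized in the abstract can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' implementation: the abstract does not specify the normalization formula, the spline type, how clusters are mapped to vowels, or how the Euclidean distance is modified, so each of those choices below is an assumption. Talker normalization is assumed to be min-max scaling, an interpolating natural cubic spline stands in for the smoothing spline, clusters are named by majority vote over labeled training tokens, and the "modified" distance is assumed to be a per-dimension weighted Euclidean distance. All function names (`normalize_track`, `natural_cubic_spline`, `sample_points`, `nearest`, `kmeans`, `label_clusters`, `classify`) are hypothetical.

```python
import bisect
import math
import random

# Hypothetical sketch. Assumptions (not from the paper): min-max talker
# normalization, an interpolating natural cubic spline instead of a smoothing
# spline, plain Lloyd's k-means, majority-vote cluster labels, and
# per-dimension weights as the "modified" Euclidean distance.


def normalize_track(track, fmin, fmax):
    """Scale raw formant values (Hz) to [0, 1] using one talker's range (assumed min-max)."""
    return [(f - fmin) / (fmax - fmin) for f in track]


def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through the knots (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends: m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for j in range(1, n - 1):
            w = a[j] / b[j - 1]
            b[j] -= w * c[j - 1]
            d[j] -= w * d[j - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for j in range(n - 3, -1, -1):
            sol[j] = (d[j] - c[j] * sol[j + 1]) / b[j]
        m[1:n] = sol

    def evaluate(x):
        # Locate the knot interval containing x (clamped to the outer intervals).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return (m[i] * t0 ** 3 / (6 * h[i]) + m[i + 1] * t1 ** 3 / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * t1)

    return evaluate


def sample_points(times, values, start, end, k=9):
    """Sample the spline-fitted track at k equally spaced times in [start, end]."""
    spline = natural_cubic_spline(times, values)
    step = (end - start) / (k - 1)
    return [spline(start + j * step) for j in range(k)]


def nearest(p, centers, weights=None):
    """Index of the closest center under a per-dimension weighted Euclidean distance."""
    weights = weights or [1.0] * len(p)

    def dist(c):
        return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, c, weights)))

    return min(range(len(centers)), key=lambda j: dist(centers[j]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means with centers initialized from k random data points."""
    centers = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as its group mean; keep the old center if a group is empty.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def label_clusters(points, labels, centers):
    """Name each cluster by the most frequent vowel label among its members."""
    votes = [{} for _ in centers]
    for p, lab in zip(points, labels):
        v = votes[nearest(p, centers)]
        v[lab] = v.get(lab, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]


def classify(p, centers, names, weights=None):
    """Return the vowel label of the nearest cluster center."""
    return names[nearest(p, centers, weights)]
```

In use, each vowel token would become a feature vector, for example normalized F1 and F2 at some chosen subset of the nine sample points (an illustrative choice, not one stated in the abstract). `kmeans` clusters the training vectors, `label_clusters` attaches vowel labels, and `classify` assigns a new token the label of its nearest center. Duration and the other supervised features mentioned in the abstract are not modeled here.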

Jim Hieronymus hasn't uploaded this paper.