Three-Dimensional Modeling of Tongue During Speech Using MRI Data

Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images

Journal of Phonetics, 2002

In this study, previous articulatory midsagittal models of tongue and lips are extended to full three-dimensional models. The geometry of these vocal organs is measured on one subject uttering a corpus of sustained articulations in French. The 3D data are obtained from magnetic resonance imaging of the tongue, and from front and profile video images of the subject's face marked with small beads. The degrees of freedom of the articulators, i.e., the uncorrelated linear components needed to represent the 3D coordinates of these articulators, are extracted by linear component analysis from these data. In addition to a common jaw height parameter, the tongue is controlled by four parameters while the lips and face are also driven by four parameters. These parameters are for the most part extracted from the midsagittal contours, and are clearly interpretable in phonetic/biomechanical terms. This implies that most 3D features such as tongue groove or lateral channels can be controlled by articulatory parameters defined for the midsagittal model. Similarly, the 3D geometry of the lips is determined by parameters such as lip protrusion or aperture, that can be measured from a profile view of the face.
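The extraction of uncorrelated linear components described in this abstract can be illustrated with a minimal sketch. It assumes that ordinary regression followed by PCA stands in for the paper's linear component analysis, and that jaw height is available as a measured predictor; the function name, array shapes, and synthetic data are all hypothetical and not taken from the paper.

```python
import numpy as np

def guided_components(coords, jaw_height, n_components):
    """Regress out a measured predictor, then run PCA on the residual.

    coords: (n_articulations, n_coordinates) flattened 3D vertex coordinates.
    jaw_height: (n_articulations,) measured predictor (hypothetical input).
    """
    mean = coords.mean(axis=0)
    centered = coords - mean
    jh = jaw_height - jaw_height.mean()
    # Least-squares loading of every coordinate on the jaw predictor.
    jaw_loading = centered.T @ jh / (jh @ jh)
    residual = centered - np.outer(jh, jaw_loading)
    # PCA of the residual via SVD; rows of vt are orthonormal components.
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of the total (pre-regression) variance carried by each component.
    explained = s[:n_components] ** 2 / np.sum(centered ** 2)
    return mean, jaw_loading, vt[:n_components], scores, explained

# Synthetic stand-in data: one jaw-driven pattern plus two hidden patterns.
rng = np.random.default_rng(0)
n_art, n_coord = 40, 30
jaw = rng.normal(size=n_art)
latent = rng.normal(size=(n_art, 2))
coords = (np.outer(jaw, rng.normal(size=n_coord))
          + latent @ rng.normal(size=(2, n_coord))
          + 0.01 * rng.normal(size=(n_art, n_coord)))
mean, jaw_loading, components, scores, explained = guided_components(coords, jaw, 2)
```

Because the jaw contribution is removed before the decomposition, the resulting component scores are uncorrelated with the jaw predictor by construction, which mirrors the idea of a common jaw parameter plus additional independent parameters.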

Analysis of tongue configuration in multi-speaker, multi-volume MRI data

Proc. Int. Sem. Spch. …, 2008

MRI data of German vowels and consonants was acquired for 9 speakers. In this paper tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured the dimension low back to high front; Factor 2 that from mid front to high back. These factors are compared with earlier models based on PARAFAC. These analyses were based on midsagittal contours; the paper concludes by illustrating from coronal and axial sections how non-midline information could be incorporated into this approach.
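For readers unfamiliar with PARAFAC, the following is a minimal sketch of a three-mode decomposition fitted by plain alternating least squares. It is not the procedure used in the paper; the array layout (speakers × vowels × contour points), the sizes other than the nine speakers, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product; row (i * len(b) + j) holds a[i] * b[j]."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def parafac_als(x, rank, n_iter=500, seed=0):
    """Fit x[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    n_i, n_j, n_k = x.shape
    a = rng.normal(size=(n_i, rank))
    b = rng.normal(size=(n_j, rank))
    c = rng.normal(size=(n_k, rank))
    # Unfold the array along each mode (C-order reshape, last index fastest).
    x0 = x.reshape(n_i, -1)
    x1 = np.moveaxis(x, 1, 0).reshape(n_j, -1)
    x2 = np.moveaxis(x, 2, 0).reshape(n_k, -1)
    for _ in range(n_iter):
        # Each step solves a linear least-squares problem for one factor matrix.
        a = x0 @ np.linalg.pinv(khatri_rao(b, c)).T
        b = x1 @ np.linalg.pinv(khatri_rao(a, c)).T
        c = x2 @ np.linalg.pinv(khatri_rao(a, b)).T
    return a, b, c

# Synthetic two-factor data: 9 speakers, 7 vowels, 20 contour coordinates.
rng = np.random.default_rng(1)
true_a, true_b, true_c = (rng.normal(size=(n, 2)) for n in (9, 7, 20))
x = np.einsum("ir,jr,kr->ijk", true_a, true_b, true_c)
a, b, c = parafac_als(x, rank=2)
x_hat = np.einsum("ir,jr,kr->ijk", a, b, c)
rel_error = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

In this layout, the rows of `a` would play the role of speaker weights, `b` the vowel loadings, and `c` the factor shapes along the contour. Real data are noisy and ALS can converge slowly or to degenerate solutions, so a dedicated tensor library and multiple restarts would normally be preferred.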

Analysis of Tongue Shape and Motion in Speech Production using Statistical Modeling

2009

The mechanisms of speech production are complex and have attracted attention from researchers in both the medical and computer vision fields. Studying the articulators is difficult because they have many degrees of freedom during speech; the tongue in particular is hard to control and observe. In this work, the shape of the tongue during the articulation of the oral vowels of European Portuguese is characterized automatically by applying statistical modeling to MR images. A point distribution model is built from a set of images collected during artificially sustained articulations of European Portuguese sounds; the model captures the main characteristics of tongue motion. It allows a clearer understanding of the dynamic speech events involved in sustained articulations. The tongue shape model may also be useful for speech rehabilitation, specifically for recognizing compensatory movements of the articulators during speech production.
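A point distribution model of the kind mentioned here can be sketched as a mean landmark shape plus a few modes of variation. The sketch below assumes the contours are already aligned and sampled at corresponding landmarks; the function names and the synthetic arc-shaped contours are hypothetical and do not come from the paper.

```python
import numpy as np

def build_pdm(shapes, n_modes):
    """shapes: (n_shapes, n_points, 2) landmark contours, assumed already aligned."""
    flat = shapes.reshape(shapes.shape[0], -1)
    mean = flat.mean(axis=0)
    cov = np.cov(flat - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; keep the largest n_modes.
    order = np.argsort(eigvals)[::-1][:n_modes]
    return mean, eigvecs[:, order], eigvals[order]

def synthesize(mean, modes, b):
    """Generate a shape from mode weights b: shape = mean + modes @ b."""
    return (mean + modes @ b).reshape(-1, 2)

# Synthetic contours: arcs whose height varies from shape to shape.
rng = np.random.default_rng(0)
t = np.linspace(0, np.pi, 12)
heights = rng.uniform(0.5, 1.5, size=25)
shapes = np.stack([np.column_stack([np.cos(t), h * np.sin(t)]) for h in heights])
mean, modes, variances = build_pdm(shapes, n_modes=1)
# Shape two standard deviations along the first mode of variation.
plus = synthesize(mean, modes, np.array([2 * np.sqrt(variances[0])]))
```

Varying a weight within a few standard deviations of its mode is the usual way to inspect what each mode represents; here the single mode can only change the arc height, since that is the only variation in the synthetic data.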

An MRI analysis of the extrinsic tongue muscles during vowel production

Speech Communication, 2007

Functions of the extrinsic tongue muscles in vowel production were examined by measurements of muscle length and tongue tissue deformation using MRI (magnetic resonance imaging). Results from the analysis of Japanese vowel data suggested: (1) Contraction and relaxation of the three subdivisions of the genioglossus (GG) play a dominant role in forming tongue shapes for vowels. (2) The extralingual part of the styloglossus (SG), which was previously thought to cause a high-back tongue position by pulling its insertion point in the tongue, was found to be nearly constant across all vowels both in length and orientation. (3) The tongue shape for back vowels is mainly achieved by internal deformation of the tongue tissue, and the medial tissue of the tongue showed lateral expansion in front vowels, and medial compression in back vowels.

Tongue surface dynamics during speech and swallowing

2000

This investigation characterizes tongue surface dynamics that underlie phonemic variation and that distinguish speech from swallowing. Vertical displacements of pellets affixed to the tongue were extracted from the x‐ray microbeam database [Westbury, J. X‐ray Microbeam Speech Production Database Users Handbook, Version 1 (1994)], which contains articulatory kinematic data from 57 typical speakers.

Tongue-surface movement patterns during speech and swallowing

The Journal of the Acoustical Society of America, 2003

The tongue has been frequently characterized as being composed of several functionally independent articulators. The question of functional regionality within the tongue was examined by quantifying the strength of coupling among four different tongue locations across a large number of consonantal contexts and participants. Tongue behavior during swallowing was also described. Vertical displacements of pellets affixed to the tongue were extracted from the x-ray microbeam database. Forty-six participants recited 20 vowel-consonant-vowel (VCV) combinations and swallowed 10 cc of water. Tongue-surface movement patterns were quantitatively described by computing the covariance between the vertical time-histories of all possible pellet pairs. Phonemic differentiation in vertical tongue motions was observed as coupling varied predictably across pellet pairs with place of articulation. Moreover, tongue displacements for speech and swallowing clustered into distinct groups based on their coupling profiles. Functional independence of anterior tongue regions was evidenced by a wide range of movement coupling relations between anterior tongue pellets. The strengths and weaknesses of the covariance-based analysis for characterizing tongue movement are considered.
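The pairwise covariance measure described in this abstract can be illustrated with a short sketch. The function name and the synthetic sinusoidal trajectories are hypothetical; only the idea of computing the covariance between the vertical time histories of every pair among four pellets comes from the text.

```python
import itertools
import numpy as np

def pellet_coupling(vertical):
    """vertical: (n_pellets, n_samples) vertical positions over time.

    Returns the covariance for every unordered pellet pair.
    """
    cov = np.cov(vertical)  # rows are variables, columns are time samples
    pairs = itertools.combinations(range(vertical.shape[0]), 2)
    return {(i, j): cov[i, j] for i, j in pairs}

# Synthetic trajectories for four pellets over one movement cycle.
t = np.linspace(0, 1, 200)
vertical = np.stack([
    np.sin(2 * np.pi * t),        # pellet 0
    0.8 * np.sin(2 * np.pi * t),  # pellet 1 moves with pellet 0
    0.5 * np.cos(2 * np.pi * t),  # pellet 2 is out of phase with pellet 0
    -np.sin(2 * np.pi * t),       # pellet 3 moves opposite to pellet 0
])
coupling = pellet_coupling(vertical)
```

Four pellets give six pairs. A positive covariance indicates that two pellets rise and fall together, a negative one that they move in opposition, and a value near zero that their vertical motions are largely unrelated over the analyzed interval.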

Human vocal tract analysis by in vivo 3D MRI during phonation: a complete system for imaging, quantitative modeling, and speech synthesis

Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention, 2008

We present a complete system for image-based 3D vocal tract analysis ranging from MR image acquisition during phonation, semi-automatic image processing, quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparison between recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, aged 22-34 years, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, ST 4 mm, 23 slices, acq. time 21 s). The volunteers performed a prolonged (≥ 21 s) emission of sounds of the German phonemic inventory. Simultaneous audio tape recording was obtained to control correct utterance. Scans were made in axial, coronal, and sagittal planes each. Computer-aided quantitative 3D evaluation included (i) automated registration of the phoneme-specific data acquired in different slice orientations, (ii) semi-automated segmentation of oropharyngeal structures, (iii) computation of a curvilinear vocal tract midline in 3D ...