Three-Dimensional Modeling of Tongue During Speech Using MRI Data

Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images

Journal of Phonetics, 2002

In this study, previous articulatory midsagittal models of the tongue and lips are extended to full three-dimensional models. The geometry of these vocal organs is measured on one subject uttering a corpus of sustained articulations in French. The 3D data are obtained from magnetic resonance imaging of the tongue, and from front and profile video images of the subject's face marked with small beads. The degrees of freedom of the articulators, i.e., the uncorrelated linear components needed to represent the 3D coordinates of these articulators, are extracted from these data by linear component analysis. In addition to a common jaw height parameter, the tongue is controlled by four parameters, and the lips and face are driven by four further parameters. These parameters are for the most part extracted from the midsagittal contours and are clearly interpretable in phonetic/biomechanical terms. This implies that most 3D features, such as the tongue groove or lateral channels, can be controlled by articulatory parameters defined for the midsagittal model. Similarly, the 3D geometry of the lips is determined by parameters such as lip protrusion or aperture, which can be measured from a profile view of the face.
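As a rough illustration of the linear component analysis described above, the Python sketch below first regresses the flattened 3D coordinates on a jaw-height measurement (the common parameter) and then extracts the leading principal direction of what remains. It is a simplified stand-in, not the authors' procedure: the function names, the single-predictor regression, the power-iteration PCA and the synthetic data are all assumptions made for illustration.

```python
import math

def center(rows):
    """Subtract the per-column mean (rows = frames, columns = flattened x, y, z coordinates)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means

def regress_out(rows, predictor):
    """Remove from each centred column the part linearly predicted by one measured parameter.

    Returns the residual rows and the per-coordinate loadings (least-squares slopes)."""
    p_mean = sum(predictor) / len(predictor)
    p = [v - p_mean for v in predictor]
    pp = sum(v * v for v in p)
    loadings = [sum(pi * x for pi, x in zip(p, col)) / pp for col in zip(*rows)]
    residual = [[x - pi * l for x, l in zip(row, loadings)] for row, pi in zip(rows, p)]
    return residual, loadings

def first_component(rows, iters=200):
    """Leading principal direction of centred data, found by power iteration on X^T X."""
    dim = len(rows[0])
    v = [1.0] * dim
    for _ in range(iters):
        scores = [sum(r_k * v_k for r_k, v_k in zip(row, v)) for row in rows]       # X v
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Synthetic example (invented): 4 frames, 3 coordinates driven by jaw height plus one hidden factor.
jaw = [0.0, 1.0, 2.0, 3.0]
tongue = [1.0, -1.0, -1.0, 1.0]
a, b = [0.5, -1.0, 2.0], [1.0, 2.0, 0.0]
frames = [[j * ai + s * bi + 10.0 for ai, bi in zip(a, b)] for j, s in zip(jaw, tongue)]
X, mean_frame = center(frames)
residual, jaw_loadings = regress_out(X, jaw)
component = first_component(residual)  # approximately b / |b|
```

Regressing on the jaw first, then decomposing the residual, keeps the jaw parameter interpretable while the remaining components stay uncorrelated with it.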

Analysis of tongue configuration in multi-speaker, multi-volume MRI data

Proc. Int. Sem. Spch. …, 2008

MRI data of German vowels and consonants were acquired for 9 speakers. In this paper, tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured a dimension running from low back to high front; Factor 2, one running from mid front to high back. These factors are compared with earlier models based on PARAFAC. These analyses were based on midsagittal contours; the paper concludes by illustrating, with coronal and axial sections, how non-midline information could be incorporated into this approach.
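The trilinear model behind PARAFAC, and the "variance explained" figure quoted above, can be sketched in a few lines of Python. This is an illustration only: fitting the factors (usually done by alternating least squares) is not shown, the assignment of modes to contour points, vowels and speakers is an assumption, and all numbers are invented.

```python
def parafac_reconstruct(A, B, C):
    """Trilinear PARAFAC model: X[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]

def variance_explained(X, X_hat):
    """1 - SS_residual / SS_total, with sums of squares taken about zero
    (PARAFAC input is normally centred beforehand)."""
    ss_res = ss_tot = 0.0
    for plane, plane_hat in zip(X, X_hat):
        for row, row_hat in zip(plane, plane_hat):
            for x, h in zip(row, row_hat):
                ss_res += (x - h) ** 2
                ss_tot += x ** 2
    return 1.0 - ss_res / ss_tot

# Toy loadings (invented, arithmetic only): 3 contour points, 2 vowels, 2 speakers, 2 factors.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # point loadings
B = [[1.0, 2.0], [2.0, 1.0]]               # vowel scores
C = [[1.0, 1.0], [0.5, 1.5]]               # speaker weights
X = parafac_reconstruct(A, B, C)
one_factor = parafac_reconstruct([r[:1] for r in A], [r[:1] for r in B], [r[:1] for r in C])
ratio = variance_explained(X, one_factor)
```

Each factor is shared by all speakers and differs only in a speaker weight, which is what lets PARAFAC separate a small number of speaker-general tongue dimensions.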

Analysis of Tongue Shape and Motion in Speech Production using Statistical Modeling

2009

The mechanisms of speech production are complex and have attracted attention from researchers in both the medical and computer vision fields. Studying the articulators is difficult because they have many degrees of freedom during speech; this is especially true of the tongue, which makes its control and observation problematic. In this work, the shape of the tongue during the articulation of the oral vowels of European Portuguese is characterized automatically using statistical modeling of MR images. A point distribution model, built from a set of images collected during artificially sustained articulations of European Portuguese sounds, extracts the main characteristics of the tongue's motion. The model allows a clearer understanding of the dynamic speech events involved in sustained articulations. The tongue shape model can also be useful for speech rehabilitation, specifically to recognize compensatory movements of the articulators during speech production.
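A point distribution model is normally built from landmark shapes that have first been aligned to remove differences in position, orientation and size; the principal components of the aligned landmarks then give the modes of variation. The sketch below shows only that alignment step for 2D contours, using Python complex numbers for the points. It assumes corresponding landmarks have already been placed on every image, and it does a single pass against a fixed reference rather than a full iterative generalized Procrustes analysis; the function names are hypothetical.

```python
def centre(shape):
    """Translate a 2D landmark shape (list of complex numbers x + iy) so its centroid is 0."""
    c = sum(shape) / len(shape)
    return [p - c for p in shape]

def align_to(shape, reference):
    """Least-squares similarity alignment (translation, rotation, scale) of shape onto reference.

    After centring both shapes, the complex factor a minimising sum |a*z - w|^2
    is (sum conj(z) * w) / (sum |z|^2); |a| is the scale and arg(a) the rotation."""
    z, w = centre(shape), centre(reference)
    a = sum(zi.conjugate() * wi for zi, wi in zip(z, w)) / sum(abs(zi) ** 2 for zi in z)
    return [a * zi for zi in z]

def mean_shape(shapes, reference):
    """Average of all shapes after aligning each one to a common reference (single pass)."""
    aligned = [align_to(s, reference) for s in shapes]
    return [sum(points) / len(aligned) for points in zip(*aligned)]

# Invented example: a square, and the same square rotated 90 degrees, doubled and shifted.
reference = [0j, 1 + 0j, 1 + 1j, 1j]
moved = [2j * p + (3 + 4j) for p in reference]
aligned = align_to(moved, reference)
```

Once all tongue contours are aligned this way, a PCA of the stacked landmark coordinates yields the mean shape and the principal modes of the model.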

An MRI analysis of the extrinsic tongue muscles during vowel production

Speech Communication, 2007

Functions of the extrinsic tongue muscles in vowel production were examined by measurements of muscle length and tongue tissue deformation using MRI (magnetic resonance imaging). Results from the analysis of Japanese vowel data suggested: (1) Contraction and relaxation of the three subdivisions of the genioglossus (GG) play a dominant role in forming tongue shapes for vowels. (2) The extralingual part of the styloglossus (SG), which was previously thought to cause a high-back tongue position by pulling its insertion point in the tongue, was found to be nearly constant across all vowels both in length and orientation. (3) The tongue shape for back vowels is mainly achieved by internal deformation of the tongue tissue, and the medial tissue of the tongue showed lateral expansion in front vowels, and medial compression in back vowels.
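The abstract does not say how muscle length was measured. One simple possibility, assumed here purely for illustration, is to digitize points along a muscle in the MR images, take the length of the polyline through them, and express each vowel relative to a reference configuration. The function names and coordinates below are hypothetical.

```python
import math

def path_length(points):
    """Length of a polyline through digitised points, e.g. a muscle traced across MRI slices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def percent_change(length, reference_length):
    """Signed change relative to a reference configuration (e.g. a neutral vowel), in percent."""
    return 100.0 * (length - reference_length) / reference_length

# Invented 2D trace (mm) of one muscle subdivision for a single vowel.
trace = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
length_mm = path_length(trace)
```

A negative percent change would indicate shortening relative to the reference, consistent with contraction; a value near zero across vowels corresponds to the "nearly constant" styloglossus result reported above.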

Functional segments in tongue movement

Clinical Linguistics & Phonetics, 2004

The tongue is a deformable object, and moves by compressing or expanding local functional segments. For any single phoneme, these functional tongue segments may move in similar or opposite directions, and may reach their maximum target positions synchronously or asynchronously. This paper discusses the independence of five proposed segments in the production of speech. Three studies used ultrasound and tagged cine-MRI to explore the independence of the tongue segments. High correlations between tongue segments would suggest passive biomechanical constraints, and low correlations would suggest active independent control. Both physiological and higher-level linguistic constraints were seen in the correlation patterns. Physiological constraints were supported by high correlations between adjacent segments (positive) and between distant segments (negative). Linguistic constraints were supported by segmental correlations that changed with the phonemic content of the task.
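The reasoning above rests on pairwise correlations between segment trajectories. A minimal sketch, assuming each segment is summarized as one displacement value per frame; the segment names and values are invented and do not come from the studies.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long displacement time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def segment_correlations(tracks):
    """Correlation for every pair of named segments, keyed by (name_a, name_b) in sorted order."""
    names = sorted(tracks)
    return {(a, b): pearson(tracks[a], tracks[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

# Hypothetical vertical displacements (mm) of three segments over five frames.
tracks = {
    "tip":    [0.0, 1.0, 2.0, 1.0, 0.0],
    "blade":  [0.0, 0.8, 1.9, 1.1, 0.1],
    "dorsum": [1.0, 0.5, -0.2, 0.4, 1.1],
}
corr = segment_correlations(tracks)
```

In this toy data the adjacent tip and blade correlate strongly and positively while tip and dorsum correlate strongly and negatively, the pattern the abstract attributes to physiological constraints.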

Application of MRI and biomedical engineering in speech production study

Computer Methods in …, 2009

Speech production has long been a subject of interest at both the morphological and acoustic levels. This knowledge is useful for a better understanding of all the mechanisms involved and for the construction of articulatory models. Magnetic resonance imaging (MRI) is a powerful technique that allows the study of the whole vocal tract with good soft-tissue contrast and resolution, and it permits the calculation of area functions, leading to a better understanding of this mechanism. Our aim is therefore to demonstrate the value and application of MRI in speech production research and its relationship with engineering, namely biomedical engineering. After the vocal tract contours were extracted, the data were processed for three-dimensional reconstruction, culminating in models of some European Portuguese sounds. MRI provides useful morphological data about the position and shape of the different speech articulators, and biomedical engineering provides the computational tools for their analysis.
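An area function gives the cross-sectional area of the vocal tract as a function of distance along it. Assuming airway contours have already been extracted in planes roughly perpendicular to the tract midline, each area can be computed with the shoelace formula, as in this Python sketch. The names and data are hypothetical; the abstract does not describe the paper's own processing chain.

```python
def polygon_area(contour):
    """Shoelace formula: area enclosed by a closed 2D contour [(x, y), ...] in one cross-sectional plane."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(contour, contour[1:] + contour[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def area_function(sections):
    """Turn (distance_from_glottis, contour) pairs into (distance, area) pairs ordered along the tract."""
    return sorted((d, polygon_area(c)) for d, c in sections)

# Invented sections: distances in cm, contour coordinates in cm, so areas come out in cm^2.
sections = [
    (2.0, [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]),
    (1.0, [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]),
]
af = area_function(sections)
```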

Segmentation of tongue shapes during vowel production in magnetic resonance images based on statistical modelling

Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, 2018

Quantification of the anatomic and functional aspects of the tongue is pertinent to analysing the mechanisms involved in speech production. Speech requires dynamic and complex articulation of the vocal tract organs, and the tongue is one of the main articulators during speech production. Magnetic resonance imaging has been widely used in speech-related studies. Moreover, segmentation of such images of the speech organs is required to extract reliable statistical data. However, standard solutions for analysing a large set of articulatory images have not yet been established. Therefore, this article presents an approach to segment the tongue in two-dimensional magnetic resonance images and statistically model the segmented tongue shapes. The proposed approach assesses the articulator morphology based on an active shape model, which captures the shape variability of the tongue during speech production. To validate this new approach, a dataset of mid-sagittal magnetic resonance images acquir...
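An active shape model keeps a candidate segmentation plausible by expressing it as x ≈ x̄ + P b, where x̄ is the mean shape, the columns of P are the principal modes of variation, and b are the shape parameters, and by limiting each b_i to a few standard deviations (commonly ±3√λ_i, with λ_i the variance of mode i). The Python sketch below shows that projection-and-clamp step for flattened landmark vectors with orthonormal modes. It is a generic illustration of the ASM idea, not the article's implementation; the function names and toy values are hypothetical.

```python
import math

def project(shape, mean, modes):
    """Shape parameters b = P^T (x - mean); each mode is an orthonormal flat vector."""
    d = [x - m for x, m in zip(shape, mean)]
    return [sum(p * di for p, di in zip(mode, d)) for mode in modes]

def constrain(b, eigenvalues, k=3.0):
    """Clamp each b_i to [-k*sqrt(lambda_i), +k*sqrt(lambda_i)]."""
    return [max(-k * math.sqrt(lam), min(k * math.sqrt(lam), bi))
            for bi, lam in zip(b, eigenvalues)]

def reconstruct(mean, modes, b):
    """Plausible shape x = mean + P b."""
    return [m + sum(bi * mode[j] for bi, mode in zip(b, modes))
            for j, m in enumerate(mean)]

# Toy model: 2 landmarks (x1, y1, x2, y2), 2 modes with variances 1 and 4 (invented).
mean = [0.0, 0.0, 0.0, 0.0]
modes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
eigenvalues = [1.0, 4.0]
candidate = [5.0, -2.0, 7.0, 7.0]   # e.g. landmarks proposed by an image search step
b = constrain(project(candidate, mean, modes), eigenvalues)
plausible = reconstruct(mean, modes, b)
```

In a full ASM this step alternates with a local image search along the contour normals until the shape stops changing.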

Measurement of temporal changes in vocal tract area function from 3D cine-MRI data

The Journal of the Acoustical Society of America, 2006

A 3D cine-MRI technique was developed based on a synchronized sampling method [Masaki et al., J. Acoust. Soc. Jpn. E 20, 375-379 (1999)] to measure the temporal changes in the vocal tract area function during a short utterance /aiueo/ in Japanese. A time series of head-neck volumes was obtained from 640 repetitions of the utterance produced by a male speaker, and area functions were extracted from it frame by frame. A region-based analysis showed that the volumes of the front and back cavities tend to change reciprocally and that the areas near the larynx and the posterior edge of the hard palate were almost constant throughout the utterance. The lower four formants were calculated from all the area functions and compared with those of natural speech sounds. The mean absolute percent error between calculated and measured formants across all frames was 4.5%. Comparing the vocal tract shapes for the five vowels with those from the static MRI method suggested a problem with MRI observation of the vocal tract: because of the effect of gravity, data from static MRI tend to deviate from the natural vocal tract geometry.
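The 4.5% figure above is a mean absolute percent error between calculated and measured formants. The arithmetic is shown below with invented values; pooling every frame's F1-F4 values into one flat list before averaging is an assumption about how the mean was taken.

```python
def mean_abs_percent_error(calculated, measured):
    """Mean of |calculated - measured| / measured, in percent, over paired values.

    Pass every frame's formant values concatenated into two flat lists (Hz)."""
    pairs = list(zip(calculated, measured))
    return 100.0 * sum(abs(c - m) / m for c, m in pairs) / len(pairs)

# Invented values: two frames, F1 and F2 only, in Hz.
calc = [700.0, 1150.0, 310.0, 2150.0]
meas = [680.0, 1200.0, 300.0, 2250.0]
error = mean_abs_percent_error(calc, meas)
```

Because each error is divided by its own measured value, a given deviation in hertz weighs more on F1 than on F4.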

Image quality in non-gated versus gated reconstruction of tongue motion using magnetic resonance imaging: a comparison using automated image processing

International Journal of Computer Assisted Radiology and Surgery, 2008

The development of improved coil and gradient technology has increased the range of applications for which MRI is beneficial. Imaging the vocal tract and measuring articulatory motion with magnetic resonance (MR) is one of the applications that has benefited from this development. Initial studies were based on static image acquisition. It was later concluded that static MRI is not representative of running speech and resembles hyper-articulated speech; therefore, techniques for acquiring real-time (non-gated) data were developed, providing a more realistic view of articulator movements during speech. Gated MRI is a well-established technique for examining the structure and function of the heart. Two main types of sequences are used: one to examine the motion of the heart during systole and diastole (functional or 'cine' imaging), and one to examine the structure of the heart walls, valves, and associated anatomy. Modifications to the technique involve 'tagging', or highlighting, tracts of muscle to explore movement, and these have been used successfully for motion analysis. Cine MRI, in particular, has been used to determine airway volume in sleep apnoea in both adults and children. Both non-gated and gated cine MRI techniques have been used successfully to capture vocal tract characteristics and tongue movement during speech. Diffusion tensor (DT) static MRI has also recently been used successfully to image the human tongue, and it shows promise for imaging tongue motion because it reveals the direction of the muscle fibres.

Successful vocal tract and tongue imaging has been reported using both gated and non-gated techniques. Non-gated MRI requires a compromise between temporal and spatial resolution; cine MRI, on the other hand, depends heavily on the timing and motion of the articulators being accurately repeatable. While most literature on MRI of the vocal tract addresses the technical challenges common to both techniques, i.e. the synchronization of audio and image acquisition and the high-intensity noise produced by the scanner, the present study aims to systematically compare non-gated and gated cine MRI acquisition and to assess the advantages and disadvantages of each for imaging tongue motion during speech. The same subjects, text and experimental conditions were used with both approaches. Another feature of this work is the use of longer utterances. While most previous research used monosyllabic or disyllabic utterances, we study 4- and 6-syllable utterances. These longer utterances are presumably less easily reproducible and so put the gated MRI sequences to a more stringent test. In this paper, we investigate how gated cine MRI sequences compare with non-gated MRI sequences in terms of image quality for imaging tongue motion associated with speech.
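The trade-off described above can be made concrete with two small Python functions. The first sketches the core of gated (cine) reconstruction: data from many repetitions are pooled by their delay after a synchronization trigger, which gives a sharp movie only if every repetition has the same timing. The second gives the acquisition time of one real-time frame in the simplest Cartesian case (one k-space line per repetition time TR, no acceleration), showing why more phase-encoding lines cost temporal resolution. Both are simplified illustrations with hypothetical names, not the study's sequences.

```python
from collections import defaultdict

def bin_by_delay(acquisitions, frame_ms):
    """Gated (cine) reconstruction step: pool data from many repetitions by delay after the trigger.

    acquisitions: iterable of (repetition, t_ms_after_trigger, sample) tuples.
    Frame f receives every sample acquired in [f*frame_ms, (f+1)*frame_ms) after the
    trigger, whatever repetition it came from."""
    frames = defaultdict(list)
    for _repetition, t_ms, sample in acquisitions:
        frames[int(t_ms // frame_ms)].append(sample)
    return dict(sorted(frames.items()))

def realtime_frame_ms(phase_encode_lines, tr_ms):
    """Non-gated Cartesian imaging, one k-space line per TR, no acceleration:
    more lines (finer spatial resolution) means a longer frame (coarser temporal resolution)."""
    return phase_encode_lines * tr_ms

# Invented acquisitions from two repetitions, binned into 50 ms cine frames.
acq = [(0, 5.0, "a"), (0, 55.0, "b"), (1, 10.0, "c"), (1, 60.0, "d")]
cine = bin_by_delay(acq, 50.0)
```

If the second repetition were spoken slightly faster, samples from different articulatory positions would land in the same bin, which is why longer, less reproducible utterances are a harder test for gated acquisition.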