Three-Dimensional Modeling of Tongue During Speech Using MRI Data
Journal of Phonetics, 2002
In this study, previous articulatory midsagittal models of tongue and lips are extended to full three-dimensional models. The geometry of these vocal organs is measured on one subject uttering a corpus of sustained articulations in French. The 3D data are obtained from magnetic resonance imaging of the tongue, and from front and profile video images of the subject's face marked with small beads. The degrees of freedom of the articulators, i.e., the uncorrelated linear components needed to represent the 3D coordinates of these articulators, are extracted by linear component analysis from these data. In addition to a common jaw height parameter, the tongue is controlled by four parameters while the lips and face are also driven by four parameters. These parameters are for the most part extracted from the midsagittal contours, and are clearly interpretable in phonetic/biomechanical terms. This implies that most 3D features such as tongue groove or lateral channels can be controlled by articulatory parameters defined for the midsagittal model. Similarly, the 3D geometry of the lips is determined by parameters such as lip protrusion or aperture, that can be measured from a profile view of the face.
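The parameter extraction described above can be illustrated with a minimal sketch: plain PCA on the flattened 3D coordinates of a tongue mesh, keeping a few uncorrelated components as candidate articulatory parameters. The array layout and function names below are assumptions for illustration only; the original analysis uses a guided, articulatorily informed decomposition rather than plain PCA.

```python
# Minimal sketch (not the authors' exact analysis): PCA on flattened 3D
# tongue-mesh coordinates to extract a few uncorrelated linear components
# ("articulatory parameters"). Array shapes are assumed.
import numpy as np

def extract_linear_components(meshes, n_components=4):
    """meshes: (n_articulations, n_vertices, 3) array of 3D coordinates."""
    X = meshes.reshape(meshes.shape[0], -1)          # one row per articulation
    X_mean = X.mean(axis=0)
    Xc = X - X_mean                                  # centre the data
    # SVD of the centred data matrix gives the principal components
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # parameter values per articulation
    basis = Vt[:n_components]                        # deformation modes
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, basis, X_mean, explained

# Reconstructing a tongue shape from a chosen set of parameter values:
# shape = (X_mean + params @ basis).reshape(n_vertices, 3)
```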
Analysis of tongue configuration in multi-speaker, multi-volume MRI data
Proc. Int. Sem. Spch. …, 2008
MRI data of German vowels and consonants were acquired for 9 speakers. In this paper, tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured the dimension from low back to high front; Factor 2 captured that from mid front to high back. These factors are compared with earlier models based on PARAFAC. These analyses were based on midsagittal contours; the paper concludes by illustrating, from coronal and axial sections, how non-midline information could be incorporated into this approach.
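As a rough illustration of the three-mode analysis, the sketch below runs a rank-2 PARAFAC decomposition on a speakers x vowels x contour-points tensor using the tensorly library. The data layout, centering, and variable names are assumptions, not the paper's actual pipeline.

```python
# Illustrative sketch of a two-factor PARAFAC decomposition of a
# speakers x vowels x contour-coordinates tensor (placeholder data).
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# contours: shape (n_speakers, n_vowels, n_contour_points) -- assumed layout
contours = np.random.rand(9, 7, 60)
tensor = tl.tensor(contours - contours.mean())       # simple global centering

weights, factors = parafac(tensor, rank=2, normalize_factors=True)
speaker_weights, vowel_loadings, contour_shapes = factors

# Variance explained by the rank-2 reconstruction
recon = tl.cp_to_tensor((weights, factors))
explained = 1 - tl.norm(tensor - recon)**2 / tl.norm(tensor)**2
print(f"variance explained: {explained:.2%}")
```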
Analysis of Tongue Shape and Motion in Speech Production using Statistical Modeling
2009
The mechanisms of speech production are complex and have attracted attention from researchers in both the medical and computer vision fields. Within the speech production mechanism, the study of the articulators is a complex issue, since they have many degrees of freedom during this process; this is particularly true of the tongue, which is difficult to control and observe. In this work, the tongue's shape during the articulation of the oral vowels of European Portuguese is characterized automatically by applying statistical modeling to MR images. A point distribution model, which captures the main characteristics of tongue motion, is built from a set of images collected during artificially sustained articulations of European Portuguese sounds. The model built in this work allows the dynamic speech events involved in sustained articulations to be understood more clearly. The tongue shape model can also be useful for speech rehabilitation purposes, specifically for recognizing compensatory movements of the articulators during speech production.
An MRI analysis of the extrinsic tongue muscles during vowel production
Speech Communication, 2007
Functions of the extrinsic tongue muscles in vowel production were examined by measurements of muscle length and tongue tissue deformation using MRI (magnetic resonance imaging). Results from the analysis of Japanese vowel data suggested: (1) Contraction and relaxation of the three subdivisions of the genioglossus (GG) play a dominant role in forming tongue shapes for vowels. (2) The extralingual part of the styloglossus (SG), which was previously thought to cause a high-back tongue position by pulling its insertion point in the tongue, was found to be nearly constant across all vowels both in length and orientation. (3) The tongue shape for back vowels is mainly achieved by internal deformation of the tongue tissue, and the medial tissue of the tongue showed lateral expansion in front vowels, and medial compression in back vowels.
Tongue surface dynamics during speech and swallowing
2000
This investigation characterizes tongue surface dynamics that underlie phonemic variation and that distinguish speech from swallowing. Vertical displacements of pellets affixed to the tongue were extracted from the x‐ray microbeam database [Westbury, J. X‐ray Microbeam Speech Production Database Users Handbook, Version 1 (1994)], which contains articulatory kinematic data from 57 typical speakers.
Tongue-surface movement patterns during speech and swallowing
The Journal of the Acoustical Society of America, 2003
The tongue has been frequently characterized as being composed of several functionally independent articulators. The question of functional regionality within the tongue was examined by quantifying the strength of coupling among four different tongue locations across a large number of consonantal contexts and participants. Tongue behavior during swallowing was also described. Vertical displacements of pellets affixed to the tongue were extracted from the x-ray microbeam database. Forty-six participants recited 20 vowel-consonant-vowel (VCV) combinations and swallowed 10 ccs of water. Tongue-surface movement patterns were quantitatively described by computing the covariance between the vertical time-histories of all possible pellet pairs. Phonemic differentiation in vertical tongue motions was observed as coupling varied predictably across pellet pairs with place of articulation. Moreover, tongue displacements for speech and swallowing clustered into distinct groups based on their coupling profiles. Functional independence of anterior tongue regions was evidenced by a wide range of movement coupling relations between anterior tongue pellets. The strengths and weaknesses of the covariance-based analysis for characterizing tongue movement are considered.
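The coupling measure described above can be sketched as the covariance between the vertical time-histories of each pellet pair; the array shapes and names below are assumptions.

```python
# Minimal sketch of the covariance-based coupling measure: covariance
# between the vertical time-histories of each pair of tongue pellets
# for one token.
import numpy as np
from itertools import combinations

def pellet_coupling(vertical_displacements):
    """vertical_displacements: (n_pellets, n_time_samples) array."""
    coupling = {}
    for i, j in combinations(range(vertical_displacements.shape[0]), 2):
        # np.cov returns the 2x2 covariance matrix; the off-diagonal term
        # quantifies how strongly the two pellets move together vertically
        coupling[(i, j)] = np.cov(vertical_displacements[i],
                                  vertical_displacements[j])[0, 1]
    return coupling
```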
Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2008
We present a complete system for image-based 3D vocal tract analysis ranging from MR image acquisition during phonation, semi-automatic image processing, quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparison between recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, aged 22-34 years, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, ST 4 mm, 23 slices, acq. time 21 s). The volunteers performed a prolonged (≥21 s) emission of sounds of the German phonemic inventory. Simultaneous audio tape recording was obtained to verify correct utterance. Scans were made in axial, coronal, and sagittal planes each. Computer-aided quantitative 3D evaluation included (i) automated registration of the phoneme-specific data acquired in different slice orientations, (ii) semi-automated segmentation of oropharyngeal structures, (iii) computation of a curvilinear vocal tract midline in 3D ...
Modeling and animating the human tongue during speech production
Computer Animation '94, …, 1994
A geometric and kinematic model for describing the global shape and the predominant motions of the human tongue, to be applied in computer animation, is discussed. The model consists of a spatial configuration of moving points that form the vertices of a mesh of nine 3-D triangles. These triangles are interpreted as charge centres (the so-called skeleton) for a potential field, and the surface of the tongue is modelled as an equipotential surface of this field. In turn, this surface is approximated by a triangular mesh prior to rendering. As for the motion of the skeleton, precautions are taken to achieve (approximate) volume conservation; the computation of the triangular mesh describing the surface of the tongue implements penetration avoidance with respect to the palate. Further, the motions of the skeleton derive from a formal speech model which also controls the motion of the lips, to arrive at a visually plausible, speech-synchronous mouth model.
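A hedged sketch of the equipotential-surface idea: superpose a simple potential from the skeleton vertices on a 3D grid and triangulate one iso-surface. The 1/r point-charge kernel and the iso-level are assumptions; the paper uses triangular charge elements, and its exact field definition may differ.

```python
# Hedged sketch of the "charge skeleton" idea: evaluate a scalar potential
# from skeleton points on a 3D grid and extract one equipotential surface
# as a triangle mesh. The 1/r kernel is an assumption.
import numpy as np
from skimage.measure import marching_cubes

def equipotential_surface(charges, grid_size=64, iso_level=None):
    """charges: (n_points, 3) skeleton vertex positions in [0, 1]^3."""
    axis = np.linspace(0.0, 1.0, grid_size)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    field = np.zeros_like(X)
    for cx, cy, cz in charges:
        r = np.sqrt((X - cx)**2 + (Y - cy)**2 + (Z - cz)**2) + 1e-6
        field += 1.0 / r                          # superpose point-charge potentials
    if iso_level is None:
        iso_level = np.percentile(field, 75)      # pick a level inside the field's range
    # Triangulate the iso-surface (the modelled tongue surface)
    verts, faces, _, _ = marching_cubes(field, level=iso_level)
    return verts / (grid_size - 1), faces         # back to [0, 1]^3 coordinates
```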
In this work we have constructed biomechanical tongue models derived from MRI data in order to investigate the effects of differing locations of vocal tract bending on variability in motor command space and overall articulatory variability for vowel targets. Acoustic models predict negligible effects of the bend of the vocal tract if its length is held constant. However, the location of this bend crucially determines the relation between vertical and horizontal dimensions of the tract and thus the relative freedom of tongue movement within these dimensions. We predict that articulatory variability will be greater along those dimensions with more degrees of freedom as determined by vocal tract configuration imposed by bend location, and present simulation results that in general support this position.
A control model of human tongue movements in speech
Formal Aspects of Computing, 1997
Tongue movements during speech production have been investigated by means of a simple yet realistic biomechanical model, based on a finite element modeling of soft tissues, in the framework of the equilibrium point hypothesis (\(\lambda\)-model) of motor control. In particular, the model has been applied to the estimation of the “central” control commands issued to the muscles, for a data set of mid-sagittal digitized tracings of vocal tract shape, recorded by means of low-intensity X-ray cineradiographies during speech. In spite of the highly non-linear mapping between the shape of the oral cavity and its acoustic consequences, the organization of control commands preserves the peculiar spatial organization of vowel phonemes in acoustic space. A factor analysis of control commands, which have been decomposed into independent or “orthogonal” muscle groups, has shown that, in spite of the great mobility of the tongue and the highly complex arrangement of tongue muscles, its movements can be explained in terms of the activation of a small number of independent muscle groups, each corresponding to an elementary or “primitive” movement. These results are consistent with the hypothesis that the tongue is controlled by a small number of independent “articulators”, for which a precise biomechanical substrate is provided. The influence of jaw and hyoid movements on tongue equilibrium has also been evaluated, suggesting that the bony structures cannot be considered as a moving frame of reference; indeed, there may be a substantial interaction between them and the tongue that may only be accounted for by a “global” model. The reported results also define a simple control model for the tongue and, in analogy with similar modelling studies, they suggest that, because of the peculiar geometrical arrangement of tongue muscles, the central nervous system (CNS) may not need a detailed representation of tongue mechanics but rather may make use of a relatively small number of muscle synergies that are invariant over the whole space of tongue configurations.
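The decomposition of control commands into independent muscle groups can be illustrated, very loosely, with an off-the-shelf factor analysis of a commands matrix; the data layout below is an assumption, and the sketch does not reproduce the biomechanical model itself.

```python
# Illustrative sketch (not the authors' model): factor analysis of a matrix
# of per-utterance muscle control commands to recover a small number of
# independent muscle groupings.
import numpy as np
from sklearn.decomposition import FactorAnalysis

# commands: (n_utterances, n_muscles) matrix of estimated control commands
commands = np.random.rand(40, 10)             # placeholder data

fa = FactorAnalysis(n_components=3)
scores = fa.fit_transform(commands)           # activation of each muscle group per utterance
loadings = fa.components_                     # (n_factors, n_muscles) muscle groupings
```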
On-line visualization of speech organs using MRI: A 3D approach to speech articulation modeling
2008
A three-dimensional on-line visualization of the vocal tract during speech production was performed based on MRI data obtained from a female speaker producing the six Russian vowels. These images were collected using an original 3D MRI-scanning method in which the speaker triggered the start of each acquisition via a special remote-control device. A stroboscopic method of data acquisition was used to reconstruct real articulatory processes for each speech stimulus. The data show clear real-time movements of the lips, tongue, lower jaw (mandible), velum, and facial surfaces. The animated data collected in this way could be used to improve the teaching and learning of foreign languages (in our case, Russian), as well as for speech synthesis based on a physiologically relevant articulatory model. Sample movies and data analysis strategies are presented.
2000
A linear three-dimensional articulatory model of tongue, lips and face is presented. The model is based on a linear component analysis of the 3D coordinates defining the geometry of the different organs, obtained from Magnetic Resonance Imaging of the tongue, and from front and profile video images of the subject's face marked with small beads. In addition to a common jaw height parameter, the tongue is controlled by five parameters while the lip and face are driven by four parameters, that can be interpreted in phonetic / articulatory terms. This model has been finally integrated into the ICP virtual talking head.
Toward Dynamic Magnetic Resonance Imaging of the Vocal Tract During Speech Production
Journal of Voice, 2011
The most recent and significant magnetic resonance imaging (MRI) improvements allow for the visualization of the vocal tract during speech production, which has proven to be a powerful tool in dynamic speech research. However, a synchronization technique with enhanced temporal resolution is still required. Objectives/Hypothesis and Methods: Throughout this work, a technique for the dynamic study of the vocal tract with MRI, using the heart's signal to synchronize and trigger the imaging acquisition process, is presented and described. The technique in question is then used in the measurement of four speech articulatory parameters in order to assess three different syllables (articulatory gestures) of the European Portuguese language. Study design: Transversal. The acquired MR images are automatically reconstructed so as to result in a variable sequence of images (slices) of different vocal tract shapes in articulatory positions associated with Portuguese speech sounds. The knowledge obtained as a result of the proposed technique represents a direct contribution to the improvement of speech synthesis algorithms, thereby allowing for novel insights in coarticulation studies, in addition to providing more efficient clinical guidelines in the pursuit of more proficient speech rehabilitation processes.
Functional segments in tongue movement
Clinical Linguistics & Phonetics, 2004
The tongue is a deformable object, and moves by compressing or expanding local functional segments. For any single phoneme, these functional tongue segments may move in similar or opposite directions, and may reach target maximum synchronously or not. This paper will discuss the independence of five proposed segments in the production of speech. Three studies used ultrasound and tagged Cine-MRI to explore the independence of the tongue segments. High correlations between tongue segments would suggest passive biomechanical constraints and low correlations would suggest active independent control. Both physiological and higher level linguistic constraints were seen in the correlation patterns. Physiological constraints were supported by high correlations between adjacent segments (positive) and distant segments (negative). Linguistic constraints were supported by segmental correlations that changed with the phonemic content of the task.
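A minimal sketch of the correlation analysis described above: Pearson correlations between displacement time series of the proposed tongue segments, with sign and magnitude patterns interpreted as in the abstract. Segment labels and array shapes are assumptions.

```python
# Minimal sketch: pairwise Pearson correlations between displacement
# time series of tongue segments.
import numpy as np

def segment_correlations(displacements, labels):
    """displacements: (n_segments, n_time_samples); labels: segment names."""
    R = np.corrcoef(displacements)            # n_segments x n_segments correlation matrix
    return {(labels[i], labels[j]): R[i, j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}

# High positive values between neighbouring segments and negative values
# between distant ones would point to passive biomechanical coupling;
# values that change with the phoneme suggest active, independent control.
```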
Application of MRI and biomedical engineering in speech production study
Computer Methods in …, 2009
Speech production has always been a subject of interest at both the morphologic and acoustic levels. This knowledge is useful for a better understanding of all the mechanisms involved, and for the construction of articulatory models. Magnetic resonance imaging (MRI) is a powerful technique that allows the study of the whole vocal tract, with good soft tissue contrast and resolution, and permits the calculation of area functions toward a better understanding of this mechanism. Thus, our aim is to demonstrate the value and application of MRI in the study of speech production and its relationship with engineering, namely with biomedical engineering. After extraction of the vocal tract contours, the data were processed for three-dimensional reconstruction, culminating in the construction of models of some sounds of European Portuguese. MRI provides useful morphologic data about the position and shape of the different speech articulators, and biomedical engineering provides the computational tools for their analysis.
Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, 2018
Quantification of the anatomic and functional aspects of the tongue is pertinent to analyse the mechanisms involved in speech production. Speech requires dynamic and complex articulation of the vocal tract organs, and the tongue is one of the main articulators during speech production. Magnetic resonance imaging has been widely used in speech-related studies. Moreover, the segmentation of such images of speech organs is required to extract reliable statistical data. However, standard solutions to analyse a large set of articulatory images have not yet been established. Therefore, this article presents an approach to segment the tongue in two-dimensional magnetic resonance images and statistically model the segmented tongue shapes. The proposed approach assesses the articulator morphology based on an active shape model, which captures the shape variability of the tongue during speech production. To validate this new approach, a dataset of mid-sagittal magnetic resonance images acquir...
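A hedged sketch of an active-shape-style model of the mid-sagittal tongue contour: rigidly align the landmark sets with a simple Procrustes step and take the principal modes of variation. The landmark format is an assumption, and the paper's training details are not reproduced.

```python
# Hedged sketch of a point-distribution / active-shape-style model of
# 2D tongue contours: rigid alignment followed by PCA of the landmarks.
import numpy as np

def align_to_mean(shapes, n_iter=5):
    """shapes: (n_images, n_landmarks, 2) contours; rigid alignment to the mean."""
    aligned = shapes - shapes.mean(axis=1, keepdims=True)     # remove translation
    for _ in range(n_iter):
        mean = aligned.mean(axis=0)
        for k in range(aligned.shape[0]):
            # Optimal rotation onto the current mean (orthogonal Procrustes)
            U, _, Vt = np.linalg.svd(aligned[k].T @ mean)
            aligned[k] = aligned[k] @ (U @ Vt)
    return aligned

def build_shape_model(shapes, n_modes=3):
    aligned = align_to_mean(np.asarray(shapes, dtype=float))
    X = aligned.reshape(aligned.shape[0], -1)
    mean_shape = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean_shape, full_matrices=False)
    # Return the mean contour, the main modes of variation, and the
    # fraction of shape variance each mode captures
    return mean_shape, Vt[:n_modes], (s**2)[:n_modes] / np.sum(s**2)
```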
Measurement of temporal changes in vocal tract area function from 3D cine-MRI data
The Journal of the Acoustical Society of America, 2006
A 3D cine-MRI technique was developed based on a synchronized sampling method [Masaki et al., J. Acoust. Soc. Jpn. E 20, 375-379 (1999)] to measure the temporal changes in the vocal tract area function during a short utterance /aiueo/ in Japanese. A time series of head-neck volumes was obtained after 640 repetitions of the utterance produced by a male speaker, from which area functions were extracted frame-by-frame. A region-based analysis showed that the volumes of the front and back cavities tend to change reciprocally and that the areas near the larynx and posterior edge of the hard palate were almost constant throughout the utterance. The lower four formants were calculated from all the area functions and compared with those of natural speech sounds. The mean absolute percent error between calculated and measured formants among all the frames was 4.5%. The comparison of vocal tract shapes for the five vowels with those from the static MRI method suggested a problem of MRI observation of the vocal tract: data from static MRI tend to result in a deviation from natural vocal tract geometry because of the gravity effect.
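The error measure quoted above (4.5%) is a mean absolute percent error over all frames and the lowest four formants; a minimal sketch, assuming a (frames x 4) layout for both calculated and measured formants:

```python
# Minimal sketch of the error measure: mean absolute percent error between
# formants computed from the area functions and formants measured from
# natural speech, over all frames and the lowest four formants.
import numpy as np

def mean_abs_percent_error(calculated, measured):
    """calculated, measured: (n_frames, 4) formant frequencies in Hz."""
    return 100.0 * np.mean(np.abs(calculated - measured) / measured)
```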
International Journal of Computer Assisted Radiology and Surgery, 2008
The development of improved coil and gradient technology has increased the range of applications for which MRI is beneficial. The use of magnetic resonance (MR) in imaging the vocal tract and measuring articulatory motion is one of the applications benefiting from this development. Initial studies were based on static image acquisition. It was later concluded that static MRI is not representative of running speech and is more like hyper-articulated speech; therefore, techniques for acquiring real-time (non-gated) data were developed, providing a more realistic view of articulator movements during speech. Gated MRI is a well-established technique for examining the structure and function of the heart. Two main types of sequences are used: one to examine the motion of the heart during systole and diastole (functional or 'cine' imaging), and one to examine the structure of the heart walls, valves, and associated anatomy. Modifications to the technique involve 'tagging', or highlighting tracts of muscle to explore movement, which has been used successfully for motion analysis. Cine MRI, in particular, has been used to determine airway volume in sleep apnoea, both in adults and children. Both non-gated and gated cine MRI techniques have been successfully used to capture vocal tract characteristics and tongue movement during speech. Diffusion tensor (DT) static MRI has also recently been used successfully for imaging the human tongue and shows promise for imaging tongue motion, as it reveals the direction of the muscle fibres. Successful vocal tract and tongue imaging has been reported using both gated and non-gated techniques. Non-gated MRI requires a compromise between temporal and spatial resolution; cine MRI, on the other hand, depends heavily on the articulatory motions being accurately repeatable in timing and form. While most literature on MRI for the vocal tract addresses the technical challenges common to both techniques, i.e., the synchronization of audio and image acquisition as well as the high-intensity noise caused by the scanner, the present study aims to systematically compare non-gated and gated cine MRI acquisition techniques and to assess the advantages and disadvantages of each for imaging tongue motions during speech. The same subjects, text, and experimental conditions were used with both approaches. Another feature of this work is the use of longer utterances. While most previous research used monosyllabic or disyllabic utterances, we study 4- and 6-syllable utterances. These longer utterances are presumably less easily reproducible and so put the gated MRI sequences to a more stringent test. In this paper, we investigate how gated cine MRI sequences compare to non-gated MRI sequences in terms of image quality for imaging tongue motions associated with speech.