Estimation of Vocal Fold Characteristics using a Parametric Source Model

2006, … International Conference on Speech Science and …

We describe a new method for estimating vocal cord dynamics using a parametric model of glottal movement, with the aim of assessing the health of the vocal cords and detecting pathological conditions of the larynx. The underlying model is based on the one proposed by Fujisaki and Ljungqvist (the FL model), modified to reduce the number of parameters. Choosing a parametric model enabled us to assess the impact of the various model characteristics as well as of the number of parameters. In the first step of the process, we estimated the transfer function of the vocal tract and used it to approximate the model-generated speech. The vocal tract estimation rests on the hypothesis that the vocal tract filtering action and the glottal forcing function affect different, non-overlapping frequency bands and can therefore be separated by homomorphic filtering. In the second step, we filtered the acoustic waveform using the estimated vocal tract transfer function H(ω) to estimate the shape of the glottal pulses. The estimates of H(ω) themselves were obtained by an inverse filtering approach that uses the cepstral method in conjunction with liftering (filtering in the cepstral domain). The model parameters were estimated by finding the parameters of the FL model that maximize the correspondence between the observed utterances and the synthetic utterances filtered by H(ω). The optimization used the Nelder-Mead simplex search method, which requires no derivatives, because of the strong nonlinearities, discontinuities, and complex interactions among the model parameters. Prior to estimating the transfer function, the observed speech was filtered to remove the effects of lip radiation.
To evaluate this approach, we used the Kay Elemetrics Disordered Voice database, which comprises over 1,400 voice samples from approximately 700 subjects. It includes sustained phonation and running speech samples from patients with a wide variety of organic, neurological, traumatic, and psychogenic voice disorders, as well as from 53 normal speakers. The estimated parameters of the FL model, and combinations of them, were analyzed to determine the potential of this method to assess the health state of the speaker. We illustrate the applicability of this technique to the problem of discriminating between healthy and pathological speech samples, with classification based on the resulting estimates of the model parameters. The results suggest that the parameter estimates may provide a useful clinical tool for rapid, unobtrusive triage and diagnosis.
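
The homomorphic separation step mentioned in the abstract can be illustrated with a minimal sketch of low-quefrency liftering. This is not the authors' implementation: the function name `cepstral_envelope`, the Hanning window, the cutoff `n_lifter`, and the synthetic test frame are all assumptions made for illustration, and the abstract does not state which values or window were actually used.

```python
import numpy as np


def cepstral_envelope(frame, n_lifter):
    """Return a smoothed log-magnitude spectrum of `frame`.

    Keeps only the first `n_lifter` quefrency bins of the real cepstrum
    (plus their mirror images) and transforms back to the frequency domain.
    """
    if n_lifter < 1:
        raise ValueError("n_lifter must be at least 1")
    n = len(frame)
    windowed = frame * np.hanning(n)           # window choice is an assumption
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real       # real cepstrum
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0                    # low quefrencies
    if n_lifter > 1:
        lifter[n - n_lifter + 1:] = 1.0        # mirrored negative quefrencies
    return np.fft.fft(cepstrum * lifter).real


# Hypothetical voiced-like frame: a 100 Hz impulse train passed through
# a single decaying 700 Hz resonance (toy stand-in for a vocal tract).
fs, f0, n_samples = 8000, 100, 1024
excitation = np.zeros(n_samples)
excitation[:: fs // f0] = 1.0
t = np.arange(200)
resonance = 0.97 ** t * np.cos(2 * np.pi * 700 * t / fs)
frame = np.convolve(excitation, resonance)[:n_samples]

# The pitch period is 80 samples, so a cutoff of 30 excludes the pitch peak.
envelope = cepstral_envelope(frame, n_lifter=30)
```

Under these assumptions, `envelope` approximates log |H(ω)| for the toy resonance, while the discarded high-quefrency part carries the periodic excitation. In practice the cutoff would have to be chosen below the shortest expected pitch period.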