Automatic estimation of voice source parameters
Related papers
EUSIPCO 2008, 2008
This paper presents two approaches to the problem of extracting the parameters of the LF source model directly from the speech waveform. The first approach relies on the glottal formant estimated from the anticausal contribution of speech. Indeed, the ZZT technique has recently shown its ability to deconvolve speech into its causal and anticausal components. The second method is based on the glottal open phase obtained by inverse filtering. The notion of unanalyzable frames and the way to detect and correct them are ...
Estimation of LF glottal source parameters based on an ARX model
Annual Conference of the International Speech Communication Association, 2005
We propose a method to estimate the glottal flow based on the ARX model of speech production and on the LF model of glottal flow. This method splits the analysis into two stages: a low-frequency analysis to estimate the glottal source parameters that mainly have a low-pass effect, and a second step to refine the parameters that also have a high-pass effect. Along with this new analysis scheme, we introduce a new algorithm to efficiently minimize the nonlinear function resulting from the least-squares criterion applied to the ARX model. Results on synthetic and natural speech signals prove the effectiveness of the proposed method.
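The least-squares criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common ARX form s[n] = -sum_k a_k s[n-k] + b0 u[n] + e[n]. For a fixed candidate source u, the coefficients follow from linear least squares, and the source parameter is then found by an outer search. The half-sine pulse, the single `open_quotient` parameter, and the brute-force grid are hypothetical stand-ins: this is neither the LF model nor the paper's minimization algorithm.

```python
import numpy as np


def toy_pulse_train(n_samples, period, open_quotient):
    """Hypothetical stand-in source (NOT the LF model): one half-sine per period."""
    u = np.zeros(n_samples)
    n_open = max(2, int(round(open_quotient * period)))
    pulse = np.sin(np.pi * np.arange(n_open) / n_open)
    for start in range(0, n_samples, period):
        seg = pulse[: n_samples - start]
        u[start:start + len(seg)] = seg
    return u


def synthesize(u, a, b0):
    """Generate s[n] = -sum_k a[k-1] * s[n-k] + b0 * u[n] (noise-free)."""
    s = np.zeros(len(u))
    for n in range(len(u)):
        s[n] = b0 * u[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                s[n] -= ak * s[n - k]
    return s


def arx_fit(s, u, order):
    """Linear least-squares fit of the ARX coefficients for a fixed source u."""
    rows = [np.concatenate((-s[n - order:n][::-1], [u[n]]))
            for n in range(order, len(s))]
    X = np.array(rows)
    y = s[order:]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta
    return theta[:order], theta[order], float(resid @ resid)


# Synthetic test signal with known (assumed) parameters.
period, true_oq = 40, 0.6
s = synthesize(toy_pulse_train(400, period, true_oq), a=[-1.2, 0.8], b0=0.5)

# Outer search: keep the source parameter with the smallest residual energy.
best_err, best_oq, a_hat, b0_hat = None, None, None, None
for oq in [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    a_c, b0_c, err = arx_fit(s, toy_pulse_train(len(s), period, oq), order=2)
    if best_err is None or err < best_err:
        best_err, best_oq, a_hat, b0_hat = err, oq, a_c, b0_c
```

The residual is nonlinear in the source parameter but linear in the filter coefficients, so only the outer search is expensive. This is presumably the cost that a dedicated minimization algorithm would reduce.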
Speech Communication
The aim of this study is to comparatively review and evaluate three variants of the glottal inverse filtering algorithm based on iterative adaptive inverse filtering (IAIF): the Standard algorithm, and two recently proposed variants that use iterative optimal preemphasis (IOP) and a glottal flow model (GFM), respectively. To enable an objective evaluation, a computational physical model of voice production is used to generate time-domain signals pertaining to both the input glottal flow and the output speech pressure, for a wide range of vowels, fundamental frequencies, and voice qualities (involving co-variation of phonation type and loudness). Furthermore, for a fair comparison, the three key parameters of IAIF are selected by an exhaustive search to minimize the root-mean-square error between the estimated and reference glottal flow derivative in each analyzed frame, and performance is assessed with two time-domain and two frequency-domain error measures. A conventional evaluation is also carried out with fixed parameter values determined by cross-validation. Results indicate that IOP tends to yield the lowest errors for nonback vowels (reducing errors by 31% on average compared with Standard), especially for not too high fundamental frequencies and not too pressed voice qualities; GFM becomes competitive for normal phonations when fixed parameter values are used; and in other cases, Standard IAIF is still recommended. In addition, the results suggest that not only the overall spectral tilt (as controlled by IOP and GFM) but also the balance between the levels of different spectral regions can be important for accurate estimation of the glottal flow.
Estimation of amplitude features of the glottal flow by inverse filtering speech pressure signals
Speech Communication, 1998
In this study a new scaling technique is presented which makes it possible to estimate the voice source including its amplitude values by inverse filtering the speech pressure waveform without applying a flow mask. The new technique is based on adjusting the DC-gain of the vocal tract model in inverse filtering to unity. The performance of the new method is tested by analysing correlation between the minimum peak amplitude of the differentiated glottal flow given by the new technique and the sound pressure level of speech. The results show that the new method yields reliable information of the amplitude values of the glottal source without applying a flow mask.
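The DC-gain adjustment described in this abstract can be sketched as follows, assuming an all-pole vocal tract model g/A(z). Its gain at zero frequency is g/A(1), where A(1) is the sum of the coefficients, so dividing the inverse filter A(z) by A(1) corresponds to a vocal tract model with unity DC gain. The coefficients below are hypothetical, and this is an illustration of the idea rather than the paper's implementation.

```python
def unity_dc_inverse_filter(a):
    """Scale inverse-filter coefficients A(z) so that A(1) == 1."""
    dc = sum(a)  # A(z) evaluated at z = 1
    if abs(dc) < 1e-12:
        raise ValueError("A(1) is zero; DC gain cannot be normalised")
    return [c / dc for c in a]


def fir_filter(b, x):
    """Apply the FIR filter b to x with zero initial conditions."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        y.append(acc)
    return y


# Hypothetical second-order model: A(z) = 1 - 1.2 z^-1 + 0.8 z^-2
a = [1.0, -1.2, 0.8]
a_unity = unity_dc_inverse_filter(a)

# A constant input passes through the scaled inverse filter unchanged
# once the filter memory has filled.
dc_out = fir_filter(a_unity, [1.0] * 10)
```

With this scaling, the DC level of the inverse-filtered signal is not altered by the vocal tract model, which is what makes the estimated amplitudes comparable across frames.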
A comparative study of glottal source estimation techniques
Computer Speech & Language, 2012
Source-tract decomposition (or glottal flow estimation) is one of the basic problems of speech processing. For this, several techniques have been proposed in the literature. However, studies comparing different approaches are almost nonexistent. Besides, experiments have been systematically performed either on synthetic speech or on sustained vowels. In this study we compare three of the main representative state-of-the-art methods of glottal flow estimation: closed-phase inverse filtering, iterative and adaptive inverse ...
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical assessment. Nonetheless, evaluation of inverse filtering performance has been challenging due to the practical difficulty in measuring the true glottal signals while speech signals are recorded. Apart from this, it is suspected that the performance of many methods degrades in conditions that are of great interest, such as breathy voice, high pitch, soft/loud voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal signals generated by a physiologically relevant speech synthesizer. The synthesizer provides a realistic simulation of the voice production process, and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. Included in the synthetic data are continuous running speech utterances and sustained vowels, which are produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy) and subglottal pressure levels to simulate the natural variations in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as an error in each of several clinically relevant features extracted from the glottal flow estimate. For two vowel-specific data subsets that were isolated for two open vowels and analyzed with three closed-phase approaches, the resulting waveform errors had mean and standard deviation values below 20% and 10%, respectively, of the true glottal source amplitude. These approaches also showed remarkable stability across different voice qualities and subglottal pressure levels.
Results of data subset analysis suggest that analysis of close rounded vowels is a major challenge in glottal flow estimation.
Glottal source estimation using a sum-of-exponentials model
IEEE Transactions on Signal Processing, 1992
This correspondence describes an algorithm for estimating the glottal source waveform in voiced speech. The glottal source waveform is described using the LF model proposed by Fant et al. The vocal tract filter is modeled as a pole-zero system. The analysis of vowel sounds from several talkers shows that the analysis procedure leads to an accurate estimate of the glottal source.
Estimation of the glottal pulse from speech or singing voice
2010
In brief, human speech results from the convolution of the excitation signal (the glottal pulse) with the impulse response of the vocal tract. This model of voice production is often referred to in the literature as a source-filter model, where the source represents the flow of air leaving the lungs and passing through the glottis (the space between the vocal folds), and the filter represents the resonances of the vocal tract and the lip/nostril radiation. The estimation of the shape of the glottal pulse from the speech signal is of significant importance in many fields and applications, since the most important features of speech related to voice quality, vocal effort, and speech disorders, for example, are mainly due to the voice source. Unfortunately, the glottal flow waveform, which is at the origin of the glottal pulse, is very difficult to measure directly and non-invasively. Several methods for estimating the glottal pulse have been proposed over the last few decades, but there is not yet a complete and automatic algorithm that performs reliably. Most of the developed methods are based on an approach called inverse filtering. The inverse filtering approach represents a deconvolution process, i.e., it seeks to obtain the source signal by applying the inverse of the vocal tract transfer function to the output speech signal. Despite the simplicity of the concept, the inverse filtering procedure is complex because the output signal may include noise and it is not straightforward to accurately model the characteristics of the vocal tract filter. In this dissertation we discuss a new glottal pulse prototype and a robust frequency-domain approach for glottal source estimation that uses a phase-related feature based on the Normalized Relative Delays (NRDs) of the harmonics.
This model is applied to several speech signals (synthetic and real), and the results of the estimation of the glottal pulse are compared with the ones obtained using other state-of-the-art methods.
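The source-filter view and the inverse filtering concept summarized in this abstract can be illustrated with a minimal sketch. It assumes an all-pole vocal tract 1/A(z) whose coefficients are known exactly, and an impulse train as a stand-in excitation; both are hypothetical simplifications. In practice A(z) has to be estimated from noisy speech, which is the difficult part the abstract points to, and the NRD-based method of the dissertation is not shown here.

```python
def all_pole_filter(a, x):
    """Vocal tract stand-in: y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def inverse_filter(a, y):
    """Deconvolution with the known inverse: x[n] = sum_k a[k] * y[n-k]."""
    x = []
    for n in range(len(y)):
        acc = 0.0
        for k in range(len(a)):
            if n - k >= 0:
                acc += a[k] * y[n - k]
        x.append(acc)
    return x


# Hypothetical excitation: one impulse every 20 samples.
source = [1.0 if n % 20 == 0 else 0.0 for n in range(60)]

# Hypothetical stable vocal tract model: A(z) = 1 - 1.2 z^-1 + 0.8 z^-2
a = [1.0, -1.2, 0.8]

speech = all_pole_filter(a, source)     # source convolved with the filter
recovered = inverse_filter(a, speech)   # apply the inverse transfer function
```

Here `recovered` matches `source` up to rounding because the same A(z) is used in both directions; any mismatch between the true and the estimated filter would show up as distortion of the recovered pulse.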