Estimation of the glottal pulse from speech or singing voice

A comparative study of glottal source estimation techniques

Computer Speech & Language, 2012

Source-tract decomposition (or glottal flow estimation) is one of the basic problems of speech processing. For this, several techniques have been proposed in the literature. However, studies comparing different approaches are almost nonexistent. Besides, experiments have been systematically performed either on synthetic speech or on sustained vowels. In this study we compare three of the main representative state-of-the-art methods of glottal flow estimation: closed-phase inverse filtering, iterative and adaptive inverse ...

Glottal source processing: From analysis to applications

Computer Speech & Language, 2014

The great majority of current voice technology applications rely on acoustic features characterizing the vocal tract response, such as the widely used MFCC or LPC parameters. Nonetheless, the airflow passing through the vocal folds, called the glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific and more complex processing operations, which explains why it has generally been avoided. This review gives a general overview of techniques which have been designed for glottal source processing. Starting from the fundamental analysis tools of pitch tracking, glottal closure instant detection, and glottal flow estimation and modelling, this paper then highlights how these solutions can be properly integrated within various voice technology applications.

Combination of Linear Prediction and Phase Decomposition for Glottal Source Analysis on Voiced Speech

ArXiv, 2016

Some glottal analysis approaches based upon linear prediction or the complex cepstrum have proved effective at estimating the glottal source from real speech utterances. We propose a new approach employing both an all-pole odd-order linear prediction to provide a coarse estimation and a phase-decomposition-based causality/anti-causality separation to generate further refinements. The obtained measures show that this method improves performance in terms of reducing source-filter separation in the estimation of glottal flow pulses (GFP). No glottal model fitting is required by this method, so it adapts widely and flexibly while retaining the fidelity of speakers' vocal features at an affordable computational cost. The method is evaluated on real speech utterances to validate it.
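The coarse linear-prediction stage mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the function names `lpc` and `inverse_filter`, the autocorrelation formulation, the order of 2, and the toy two-pole signal are all assumptions chosen for illustration, and windowing and pre-emphasis are left to the caller.

```python
import numpy as np

def lpc(frame, order):
    """All-pole LP coefficients by the autocorrelation method (illustrative)."""
    x = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    # Toeplitz normal equations R a = r[1..order].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    # Inverse filter A(z) = 1 - sum_k a_k z^-k.
    return np.concatenate(([1.0], -a))

def inverse_filter(frame, a):
    """Apply the FIR filter A(z) to the frame to get the LP residual."""
    return np.convolve(frame, a)[: len(frame)]

# Toy signal (assumed): an impulse train driving a stable two-pole resonator.
excitation = np.zeros(400)
excitation[::80] = 1.0
speech = np.zeros(400)
for n in range(400):
    speech[n] = excitation[n]
    if n >= 1:
        speech[n] += 1.6 * speech[n - 1]
    if n >= 2:
        speech[n] -= 0.8 * speech[n - 2]

a = lpc(speech, order=2)
residual = inverse_filter(speech, a)
```

On real speech the residual is only a rough estimate of the source, which is why the abstract describes a further refinement step.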

Glottal source estimation using a sum-of-exponentials model

IEEE Transactions on Signal Processing, 1992

This correspondence describes an algorithm for estimating the glottal source waveform in voiced speech. The glottal source waveform is described using the LF model proposed by Fant et al. The vocal tract filter is modeled as a pole-zero system. The analysis of vowel sounds from several talkers shows that the analysis procedure leads to an accurate estimate of the glottal source.

Estimating the glottal waveform and the vocal-tract filter from a vowel sound signal

2003 IEEE Pacific Rim Conference on Communications Computers and Signal Processing (PACRIM 2003) (Cat. No.03CH37490), 2003

A pitch-synchronous signal processing method for estimating the glottal waveform and the vocal-tract filter is presented. This method is advantageous over existing methods. First, no assumptions about the shape of the glottal wave are made in the estimation, and consequently it can be used for any type of voice. Second, it can obtain more detailed structures of the derivative of the glottal wave than other methods. Third, the influence of the glottal wave on the estimation of the vocal-tract filter is eliminated. The glottal waveform, the derivative of the glottal wave and the vocal-tract filter obtained using this method for two vowel sounds, produced by a female and a male subject, are illustrated. This method has applications in speech pathology, speech synthesis, speaker identification and so on.

Glottal source estimation robustness

Proc. International Conference on Signal Processing and Multimedia Applications (SIGMAP), 2008

This paper addresses the problem of estimating the voice source directly from speech waveforms. A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase. This technique is compared to two other state-of-the-art well-known methods, namely the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms. Decomposition quality is assessed on synthetic signals through two objective measures: the spectral distortion and a glottal formant ...

Voice source parameters estimation by fitting the glottal formant and the inverse filtering open phase

EUSIPCO 2008, 2008

This paper presents two approaches to the problem of extracting the parameters of the LF source model directly from the speech waveform. The first approach relies on the glottal formant estimated from the anticausal contribution of speech. Indeed the ZZT technique has recently shown its ability to deconvolve speech into its causal and anticausal components. The second method is based on the glottal open phase obtained by inverse filtering. The notion of unanalyzable frames and the way to detect and correct them are ...

Glottal Source Estimation Using an Automatic Chirp Decomposition

In a previous work, we showed that the glottal source can be estimated from speech signals by computing the Zeros of the Z-Transform (ZZT). Decomposition was achieved by separating the roots inside (causal contribution) and outside (anticausal contribution) the unit circle. In order to guarantee a correct deconvolution, time alignment on the Glottal Closure Instants (GCIs) was shown to be essential. This paper extends the formalism of ZZT by evaluating the Z-transform on a contour possibly different from the unit circle. A method is proposed for determining automatically this contour by inspecting the root distribution. The derived Zeros of the Chirp Z-Transform (ZCZT)-based technique turns out to be much more robust to GCI location errors.
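The root-separation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the name `zzt_split` and the `radius` parameter are hypothetical, the GCI-synchronous windowing the abstract calls essential is assumed to have been done already, and the automatic choice of contour is not attempted.

```python
import numpy as np

def zzt_split(frame, radius=1.0):
    """Split the zeros of a frame's Z-transform by their modulus.

    radius=1.0 separates on the unit circle; another value stands in
    (as an assumption) for a circular contour of a different radius.
    """
    x = np.asarray(frame, dtype=float)
    # X(z) = sum_n x[n] z^-n has the same non-zero roots as the polynomial
    # whose coefficients are the samples x[0], ..., x[N-1].
    zeros = np.roots(x)
    modulus = np.abs(zeros)
    causal = zeros[modulus < radius]        # roots inside the contour
    anticausal = zeros[modulus >= radius]   # roots on or outside the contour
    return causal, anticausal

# Toy frame (assumed): one zero at 0.5 and one at 2.0.
frame = np.convolve([1.0, -0.5], [1.0, -2.0])
causal, anticausal = zzt_split(frame)
```

Each contribution can then be rebuilt from its roots, for instance with `np.poly`, up to a gain factor.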

Characterization of Glottal Activity From Speech Signals

IEEE Signal Processing Letters, 2009

The objective of this work is to characterize certain important features of excitation of speech, namely, detecting the regions of glottal activity and estimating the strength of excitation in each glottal cycle. The proposed method is based on the assumption that the excitation to the vocal-tract system can be approximated by a sequence of impulses of varying strengths. The effect due to an impulse in the time-domain is spread uniformly across the frequency-domain including at zero-frequency. We propose the use of a zero-frequency resonator to extract the characteristics of excitation source from speech signals by filtering out most of the time-varying vocal-tract information. The regions of glottal activity and the strengths of excitation estimated from the speech signal are in close agreement with those observed from the simultaneously recorded electro-glotto-graph signals. The performance of the proposed glottal activity detection is evaluated under different noisy environments at varying levels of degradation.
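A rough sketch of filtering with a zero-frequency resonator is given below. The abstract does not spell out the procedure, so the details here are assumptions taken from the commonly described zero-frequency filtering recipe: a first difference, a cascade of two resonators with double poles at z = 1, and repeated local-mean subtraction to remove the resulting trend. The function name, the 10 ms window and the number of passes are hypothetical.

```python
import numpy as np

def zero_frequency_filter(speech, fs, mean_window_ms=10.0, passes=3):
    """Illustrative zero-frequency filtering of a speech signal."""
    s = np.asarray(speech, dtype=float)
    # First difference to suppress any DC offset in the recording.
    y = np.diff(s, prepend=s[0])
    # Each resonator y[n] = x[n] + 2y[n-1] - y[n-2] is a double integrator,
    # so a cascade of two amounts to four cumulative sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the polynomial trend by subtracting a local mean several times.
    # The signal is assumed to be longer than the averaging window.
    half = int(round(mean_window_ms * 1e-3 * fs / 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y

# Toy excitation (assumed): impulses every 10 ms at an 8 kHz sampling rate.
fs = 8000
excitation = np.zeros(fs // 2)
excitation[::80] = 1.0
zff = zero_frequency_filter(excitation, fs)
```

In descriptions of this technique, the window length is usually tied to the average pitch period, and samples near the frame edges are unreliable because of the zero padding in the moving average.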

Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical assessment. Nonetheless, evaluation of inverse filtering performance has been challenging due to the practical difficulty in measuring the true glottal signals while speech signals are recorded. Apart from this, it is suspected that the performance of many methods degrades in conditions that are of great interest, such as breathy voice, high pitch, soft/loud voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal signals generated by a physiologically relevant speech synthesizer. The synthesizer provides a realistic simulation of the voice production process, and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. Included in the synthetic data are continuous running speech utterances and sustained vowels, which are produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy) and subglottal pressure levels to simulate the natural variations in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as an error in each of several clinically relevant features extracted from the glottal flow estimate. For two vowel-specific data subsets that were isolated for two open vowels and analyzed with three closed-phase approaches, the resulting waveform errors had mean and standard deviation values below 20% and 10%, respectively, of the true glottal source amplitude. These approaches also showed remarkable stability across different voice qualities and subglottal pressure levels. Results of data subset analysis suggest that analysis of close rounded vowels is a major challenge in glottal flow estimation.