Evaluation of Glottal Epoch Detection Algorithms on Different Voice Types (original) (raw)

Detection of glottal closure instants from speech signals: A quantitative review

IEEE Transactions on Audio, Speech and Language Processing, 2012

The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the

Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm

IEEE Transactions on Audio, Speech and Language Processing, 2000

We present the DYPSA algorithm for automatic and reliable estimation of glottal closure instants (GCIs) in voiced speech. Reliable GCI estimation is essential for closed-phase speech analysis, from which can be derived features of the vocal tract and, separately, the voice source. It has been shown that such features can be used with significant advantages in applications such as speaker recognition. DYPSA is automatic and operates using the speech signal alone without the need for an EGG or Laryngograph signal. It incorporates a new technique for estimating GCI candidates and employs dynamic programming to select the most likely candidates according to a defined cost function. We review and evaluate three existing methods and compare our new algorithm to them. Results for DYPSA show GCI detection accuracy to within ±0.25ms on 87% of the test database and fewer than 1% false alarms and misses.

Classification-Based Detection of Glottal Closure Instants from Speech Signals

Interspeech 2017

In this paper a classification-based method for the automatic detection of glottal closure instants (GCIs) from the speech signal is proposed. Peaks in the speech waveforms are taken as candidates for GCI placements. A classification framework is used to train a classification model and to classify whether or not a peak corresponds to the GCI. We show that the detection accuracy in terms of F 1 score is 97.27%. In addition, despite using the speech signal only, the proposed method behaves comparably to a method utilizing the glottal signal. The method is also compared with three existing GCI detection algorithms on publicly available databases.

Estimation of Glottal Closing and Opening Instants in Voiced Speech Using the YAGA Algorithm

IEEE Transactions on Audio, Speech, and Language Processing, 2000

Abstract Accurate estimation of glottal closing instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing including pitch tracking, prosodic speech modification, speech dereverberation, synthesis and study of pathological voice. We propose the Yet Another GCI/GOI Algorithm (YAGA) to detect GCIs from speech signals by employing multiscale analysis, the group delay function, and N-best dynamic programming. A novel GOI detector based upon the ...

Estimation of glottal closure instants by considering speech signal as a spectrum

Electronics Letters, 2015

Close to glottal closure instants (GCIs), the speech signal is expected to change its amplitude rapidly and, at GCIs, it is expected to have strong negative peaks. A novel algorithm that exploits these two properties for the estimation of GCIs is presented. Here, a symmetrised speech segment is assumed to be a Fourier transform (FT) of an even function. In such a case, at the locations of the GCIs, the strong negative peaks in the symmetrised speech segment correspond to zeros that lie considerably outside the unit circle in the z-plane. The group delay spectrum of the time-domain signal derived by taking inverse FT of this assumed FT is expected to take a value close to −2π at the angular locations of these zeros. Mapping frequency scale to time scale, the frequency bins for which group delay reaches −2π correspond to the locations of GCIs. Theoretical justification for the proposed approach is also presented by defining a novel function called the conditional group delay function. Systematic evaluation is carried out on the CMU Arctic database and the performance of the proposed technique is better than that of the algorithms namely DYPSA, ZFF, YAGA and is close to that of SEDREAMS.

A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech

IEEE Transactions on Audio, Speech and Language Processing, 2000

Abstract Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find ...

A Fast Method for High-Resolution Voiced/Unvoiced Detection and Glottal Closure/Opening Instant Estimation of Speech

We propose a fast speech analysis method which simultaneously performs high-resolution voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal opening instants (GCIs and GOIs, respectively). The proposed algorithm exploits the structure of the glottal flow derivative in order to estimate GCIs and GOIs only in voiced speech using simple time-domain criteria. We compare our method with well-known GCI/GOI methods, namely, the dynamic programming projected phase-slope algorithm (DYPSA), the yet another GCI/GOI algorithm (YAGA) and the speech event detection using the residual excitation and a mean-based signal (SEDREAMS). Furthermore, we examine the performance of the aforementioned methods when combined with state-of-the-art VUD algorithms, namely, the robust algorithm for pitch tracking (RAPT) and the summation of residual harmonics (SRH). Experiments conducted on the APLAWD and SAM databases show that the proposed algorithm outperforms the state-of-the-art combinations of VUD and GCI/GOI algorithms with respect to almost all evaluation criteria for clean speech. Experiments on speech contaminated with several noise types (white Gaussian, babble, and car-interior) are also presented and discussed. The proposed algorithm out-performs the state-of-the-art combinations in most evaluation criteria for signal-to-noise ratio greater than 10 dB.

ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS

2007

The DYPSA algorithm detects glottal closure instants (GCI) in speech signals. We present a modification to the DYPSA algorithm in which a voiced/unvoiced/silence discrimination measure is applied in order to reduce spurious GCIs detected by DYPSA for unvoiced speech or silence periods. Speech classification is addressed by formulating a decision rule for the GCI candidates which classifies the candidates as voiced or unvoiced on the basis of feature measurements extracted from the speech signal alone. Dynamic ...

The SIGMA algorithm for estimation of reference-quality glottal closure instants from electroglottograph signals

2008

Accurate estimation of glottal closure instants (GCIs) in voiced speech is important for speech analysis applications which benefit from glottal-synchronous processing. Electroglottograph (EGG) recordings give a measure of the electrical conductance of the glottis, providing a signal which is proportional to its contact area. EGG signals contain little noise or distortion, providing a good reference from which GCIs can be extracted to evaluate GCI estimation from speech recordings. Many approaches impose a threshold on the differentiated EGG signal which provide accurate results during voiced speech but are prone to errors at the onset and end of voicing; modern algorithms use a similar approach across multiple dyadic scales using the stationary wavelet transform. This paper describes a new method for EGG-based GCI estimation named SIGMA, which is based upon the stationary wavelet transform, peak detection with a group delay function and Gaussian Mixture Modelling for discrimination between true and false GCI candidates.

Epoch extraction of voiced speech

1975

A general theory of epoch extraction of overlapping nonidentical waveforms is presented. The theory is applied to outputs of models of voiced speech production mechanism and to actual speech data. Some typical glottal waveshapes are considered to explain their effect on the speech output. It is shown that the points of excitation of the vocal tract can be precisely identified for continuous speech. It is possible to obtain accurate pitch information by this method even for high-pitched sounds. The epoch extraction has wide applications in speech analysis, speaker verification, speech synthesis, and pitch perception studies.