Estimation of glottal closure instants by considering speech signal as a spectrum
Related papers
Detection of glottal closure instants from speech signals: A quantitative review
IEEE Transactions on Audio, Speech, and Language Processing, 2012
The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the
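Comparisons like this score detected GCIs against reference GCIs taken from the EGG. The Python sketch below implements the larynx-cycle scoring commonly used in this literature: each reference GCI owns the interval between the midpoints to its neighbouring reference GCIs, and a cycle counts as a hit (exactly one detection), a miss (none) or a false alarm (more than one). The function name `evaluate_gcis` is illustrative, and mirroring the single neighbour gap for the first and last cycles is a simplifying assumption of this sketch.

```python
import statistics


def evaluate_gcis(ref, est):
    """Larynx-cycle GCI scoring (sketch). ref, est: GCI times in any common unit."""
    if len(ref) < 2:
        raise ValueError("need at least two reference GCIs")
    ref, est = sorted(ref), sorted(est)
    n = len(ref)
    hits = misses = false_alarms = 0
    errors = []  # timing errors (est - ref) for cycles with exactly one detection
    for i, r in enumerate(ref):
        # Outer cycles borrow the gap to their only neighbour (assumption).
        left = r - ref[i - 1] if i > 0 else ref[1] - ref[0]
        right = ref[i + 1] - r if i < n - 1 else ref[-1] - ref[-2]
        lo, hi = r - left / 2, r + right / 2
        inside = [e for e in est if lo <= e < hi]
        if len(inside) == 1:
            hits += 1
            errors.append(inside[0] - r)
        elif not inside:
            misses += 1
        else:
            false_alarms += 1
    # Identification accuracy: spread (population std) of the timing errors.
    ida = statistics.pstdev(errors) if len(errors) > 1 else 0.0
    return {"IDR": hits / n, "MR": misses / n, "FAR": false_alarms / n, "IDA": ida}
```

IDR, MR, FAR and IDA stand for identification rate, miss rate, false-alarm rate and identification accuracy.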
A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech
IEEE Transactions on Audio, Speech, and Language Processing, 2006
Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find ...
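One measure of this kind, the energy-weighted average group delay of a windowed residual segment u(k), equals the time centroid of the segment's energy, Σ k·u²(k) / Σ u²(k), so it can be computed without an FFT. The sketch below assumes a rectangular window of `2*half_width + 1` samples centred on each sample, and the convention that a GCI lies where the measure crosses zero from positive to negative as the window slides past a residual impulse. Both function names are hypothetical.

```python
def energy_weighted_group_delay(u, half_width):
    """Energy centroid of u inside a window centred at each sample, relative to the centre."""
    n = len(u)
    d = []
    for c in range(n):
        num = den = 0.0
        for k in range(-half_width, half_width + 1):
            i = c + k
            if 0 <= i < n:
                e = u[i] * u[i]
                num += k * e
                den += e
        # Windows with no energy contribute zero delay.
        d.append(num / den if den > 0.0 else 0.0)
    return d


def gci_from_group_delay(d):
    """Indices where the measure goes from positive to zero or negative."""
    return [i for i in range(1, len(d)) if d[i - 1] > 0.0 and d[i] <= 0.0]
```

Ahead of an isolated residual impulse the centroid is positive, and it reaches zero when the window centre sits on the impulse, which is why the crossing marks the closure.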
Classification-Based Detection of Glottal Closure Instants from Speech Signals
Interspeech 2017
In this paper a classification-based method for the automatic detection of glottal closure instants (GCIs) from the speech signal is proposed. Peaks in the speech waveforms are taken as candidates for GCI placements. A classification framework is used to train a classification model and to classify whether or not a peak corresponds to the GCI. We show that the detection accuracy in terms of F1 score is 97.27%. In addition, despite using the speech signal only, the proposed method behaves comparably to a method utilizing the glottal signal. The method is also compared with three existing GCI detection algorithms on publicly available databases.
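An F1 score for GCI detection needs a rule for matching detections to reference GCIs. The paper's exact rule is not stated here, so the following sketch assumes greedy one-to-one matching: each detection, in time order, is paired with the nearest still-unmatched reference GCI within a tolerance `tol`. Precision and recall are then combined into F1. The function name `f1_score` is illustrative.

```python
def f1_score(ref, est, tol):
    """F1 of detected GCIs vs reference GCIs under greedy matching within tol (assumption)."""
    unmatched = sorted(ref)
    tp = 0
    for e in sorted(est):
        best = None
        for r in unmatched:
            if abs(e - r) <= tol and (best is None or abs(e - r) < abs(e - best)):
                best = r
        if best is not None:
            unmatched.remove(best)  # each reference may be matched only once
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(est)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, with references at 100, 200, 300 and 400 samples, detections at 101, 205, 260 and 399, and a 3-sample tolerance, two detections match, so precision, recall and F1 are all 0.5.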
Estimation of Glottal Closing and Opening Instants in Voiced Speech Using the YAGA Algorithm
IEEE Transactions on Audio, Speech, and Language Processing, 2012
Accurate estimation of glottal closing instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing including pitch tracking, prosodic speech modification, speech dereverberation, synthesis and study of pathological voice. We propose the Yet Another GCI/GOI Algorithm (YAGA) to detect GCIs from speech signals by employing multiscale analysis, the group delay function, and N-best dynamic programming. A novel GOI detector based upon the ...
Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm
IEEE Transactions on Audio, Speech, and Language Processing, 2007
We present the DYPSA algorithm for automatic and reliable estimation of glottal closure instants (GCIs) in voiced speech. Reliable GCI estimation is essential for closed-phase speech analysis, from which features of the vocal tract and, separately, the voice source can be derived. It has been shown that such features can be used with significant advantages in applications such as speaker recognition. DYPSA is automatic and operates on the speech signal alone, without the need for an EGG or Laryngograph signal. It incorporates a new technique for estimating GCI candidates and employs dynamic programming to select the most likely candidates according to a defined cost function. We review and evaluate three existing methods and compare our new algorithm to them. Results for DYPSA show GCI detection accuracy to within ±0.25 ms on 87% of the test database and fewer than 1% false alarms and misses.
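The candidate-selection idea can be illustrated with a small dynamic program. This is a simplified stand-in, not DYPSA's actual cost function: it assumes a known expected pitch period `period` and a per-candidate `strengths` score, charges `w_period` times the squared relative deviation of each interval from the period, and rewards each selected candidate by its strength. All names and weights are hypothetical.

```python
def select_gcis(times, strengths, period, w_period=4.0):
    """Choose the candidate subset with minimum path cost (simplified, hypothetical cost)."""
    n = len(times)
    if n == 0:
        return []
    cost = [0.0] * n   # best cost of any path ending at candidate j
    back = [-1] * n    # predecessor on that best path (-1 = path starts here)
    for j in range(n):
        cost[j] = -strengths[j]  # start a new path at j
        for i in range(j):
            dev = (times[j] - times[i] - period) / period
            c = cost[i] + w_period * dev * dev - strengths[j]
            if c < cost[j]:
                cost[j], back[j] = c, i
    # The best path may end at any candidate; backtrack from the cheapest end.
    j = min(range(n), key=cost.__getitem__)
    path = []
    while j != -1:
        path.append(times[j])
        j = back[j]
    return path[::-1]
```

With candidates at 0, 100, 150, 200 and 300 samples, equal strengths of 1 and a period of 100, the spurious candidate at 150 halves two intervals and costs more in period deviation than its strength earns, so the selected path is 0, 100, 200, 300.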
Improved glottal closure instant detector based on linear prediction and standard pitch concept
Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96)
This paper proposes an improved method of glottal closure instant detection using linear prediction and the standard pitch concept. The main improvements are faster computation and fewer position-finding errors in cases that previous methods could not handle or handled with many errors. Our method resolves the problems of current methods to some extent. The false detection rate is reduced owing to the method's inherent interpolation capability, and the amount of computation is also reduced. Another benefit is that the method needs no additional post-processing to find peaks or to smooth the pitch tracks; it is self-contained. We also compare results across three different kinds of linear-prediction-based pitch detectors.
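Linear-prediction-based detectors typically start from the LP residual, whose large peaks tend to lie near glottal closures. The sketch below computes the residual with the autocorrelation method (Levinson-Durbin recursion) followed by inverse filtering. It does not implement the paper's standard pitch concept, and the helper names are illustrative.

```python
def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return inverse-filter coefficients [1, a1, ..., ap] of A(z) and the final error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable input: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, order):
    """Inverse-filter x with A(z): e[n] = sum_j a[j] * x[n - j]."""
    a, _ = levinson_durbin(autocorrelation(x, order), order)
    return [sum(a[j] * x[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(x))]
```

For a signal produced by driving the all-pole filter 1/(1 - 0.5 z⁻¹) with impulses, an order-1 analysis recovers a1 ≈ -0.5 and the residual is close to the original impulse train.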
2008
Accurate estimation of glottal closure instants (GCIs) in voiced speech is important for speech analysis applications which benefit from glottal-synchronous processing. Electroglottograph (EGG) recordings give a measure of the electrical conductance of the glottis, providing a signal which is proportional to its contact area. EGG signals contain little noise or distortion, providing a good reference from which GCIs can be extracted to evaluate GCI estimation from speech recordings. Many approaches impose a threshold on the differentiated EGG signal, which provides accurate results during voiced speech but is prone to errors at the onset and end of voicing; modern algorithms use a similar approach across multiple dyadic scales using the stationary wavelet transform. This paper describes a new method for EGG-based GCI estimation named SIGMA, which is based upon the stationary wavelet transform, peak detection with a group delay function and Gaussian Mixture Modelling for discrimination between true and false GCI candidates.
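The threshold-on-differentiated-EGG baseline mentioned above can be sketched in a few lines. It assumes an EGG polarity in which increasing contact area gives increasing amplitude, so closures appear as sharp positive peaks in the first difference; recordings with the opposite polarity need the sign flipped. `rel_threshold`, a fraction of the largest derivative peak, is a hypothetical parameter. This is the simple baseline, not SIGMA.

```python
def degg_gcis(egg, rel_threshold=0.5):
    """GCIs as local maxima of the differentiated EGG above a relative threshold."""
    # d[i] = egg[i + 1] - egg[i], so d[i] belongs to sample i + 1.
    d = [egg[n] - egg[n - 1] for n in range(1, len(egg))]
    if not d or max(d) <= 0.0:
        return []
    thr = rel_threshold * max(d)
    gcis = []
    for i in range(1, len(d) - 1):
        if d[i] >= thr and d[i] > d[i - 1] and d[i] >= d[i + 1]:
            gcis.append(i + 1)
    return gcis
```

A fixed threshold like this works well inside steady voicing but, as the abstract notes, tends to fail at voicing onsets and offsets, where closure peaks are weak.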
Subband analysis of linear prediction residual for the estimation of glottal closure instants
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014
Many state-of-the-art techniques for estimating glottal closure instants (GCIs) use linear prediction residual (LPR) in one way or another. In this paper, subband analysis of LPR is proposed to estimate the GCIs. A composite signal is derived as the sum of the envelopes of the subband components of the LPR signal. Appropriately chosen peaks of the composite signal are the GCI candidates. The temporal locations of the candidates are refined using the LPR to obtain the GCIs, which are validated against the GCIs obtained from the electroglottograph signal, recorded simultaneously. The robustness is studied using additive white, babble and vehicle noises for different signal-to-noise ratios. The proposed method is evaluated using six different databases and compared with three state-of-the-art LPR based methods. The results show that the performance of the proposed method is comparable to the best of the LPR based techniques for clean as well as noisy speech.
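The composite signal can be sketched by splitting the residual's spectrum into bands and, for each band, forming the analytic signal (positive-frequency bins doubled, all other bins zeroed). Its magnitude is the Hilbert envelope of that band-passed component, and the envelopes are summed. The ideal DFT-bin band edges in `bands` are an assumption rather than the paper's filterbank, and a naive O(N²) DFT keeps the sketch free of dependencies.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def composite_envelope(residual, bands):
    """Sum of Hilbert envelopes of ideal subbands; bands are (lo, hi) bins, 0 < lo < hi <= N // 2."""
    n = len(residual)
    spec = dft(residual)
    total = [0.0] * n
    for lo, hi in bands:
        band = [0j] * n
        for k in range(lo, hi):
            band[k] = 2 * spec[k]  # analytic signal of the band: keep positive bins only
        env = [abs(v) for v in idft(band)]
        total = [t + e for t, e in zip(total, env)]
    return total
```

With a single impulse in the residual, every band's envelope peaks at the impulse position, so the composite signal does too; peaks of the composite are then natural GCI candidates.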
2007
The DYPSA algorithm detects glottal closure instants (GCIs) in speech signals. We present a modification to the DYPSA algorithm in which a voiced/unvoiced/silence discrimination measure is applied in order to reduce spurious GCIs detected by DYPSA during unvoiced speech or silence periods. Speech classification is addressed by formulating a decision rule for the GCI candidates which classifies the candidates as voiced or unvoiced on the basis of feature measurements extracted from the speech signal alone. Dynamic ...
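A candidate-level decision rule of this kind can be sketched by measuring features in a short window around each GCI candidate. The features (mean energy and zero-crossing rate) and the thresholds `energy_thr` and `zcr_thr` are hypothetical choices for illustration, not the measurements used in the modified DYPSA.

```python
def classify_candidates(x, candidates, half_width=20, energy_thr=0.01, zcr_thr=0.3):
    """Keep candidates whose surrounding window looks voiced (hypothetical features)."""
    kept = []
    for c in candidates:
        seg = x[max(0, c - half_width): c + half_width + 1]
        if len(seg) < 2:
            continue
        energy = sum(v * v for v in seg) / len(seg)
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (len(seg) - 1)
        # Voiced speech: relatively high energy and few zero crossings.
        if energy >= energy_thr and zcr <= zcr_thr:
            kept.append(c)
    return kept
```

A candidate inside a loud low-frequency segment passes, while one inside a quiet, rapidly alternating noise-like segment is rejected.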
Chirp group delay analysis of speech signals
Speech Communication, 2007
This study proposes new group delay estimation techniques that can be used for analyzing resonance patterns of short-term discrete-time signals, and more specifically speech signals. Phase processing, or equivalently group delay processing, of speech signals is known to be difficult due to large spikes in the phase/group delay functions that mask the formant structure. In this study, we first analyze in detail the z-transform zero patterns of short-term speech signals in the z-plane and discuss the sources of spikes on group delay functions, namely the zeros located close to the unit circle. We show that windowing largely influences these patterns, and therefore short-term phase processing. Through a systematic study, we then show that reliable phase/group delay estimation for speech signals can be achieved by appropriate windowing, and that group delay functions can reveal formant information as well as some of the characteristics of the glottal flow component in speech signals. However, such phase estimation is highly sensitive to noise, and robust extraction of group-delay-based parameters remains difficult in real acoustic conditions even with appropriate windowing. As an alternative, we propose processing of chirp group delay functions, i.e. group delay functions computed on a circle other than the unit circle in the z-plane, which can be guaranteed to be spike-free. We finally present one application in feature extraction for automatic speech recognition (ASR). We show that chirp group delay representations are potentially useful for improving ASR performance.
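Evaluating the z-transform on a circle of radius r is equivalent to weighting the signal by r^(-n) and then applying the usual group delay formula τ(ω) = Re{Y(ω)/X(ω)}, where Y is the transform of n·x(n). The following pure-Python sketch assumes that formula and an illustrative `radius` parameter. It shows how a zero close to the unit circle produces a large spike at r = 1 that shrinks when the evaluation circle moves away from the zero.

```python
import cmath
import math


def chirp_group_delay(x, radius=1.0, n_bins=None):
    """Group delay (in samples) of X(z) sampled at z = radius * exp(j*2*pi*k/n_bins)."""
    n_bins = n_bins if n_bins is not None else len(x)
    xr = [v * radius ** (-n) for n, v in enumerate(x)]  # move evaluation onto the chirp circle
    tau = []
    for k in range(n_bins):
        w = 2 * math.pi * k / n_bins
        big_x = sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(xr))
        big_y = sum(n * v * cmath.exp(-1j * w * n) for n, v in enumerate(xr))
        # Undefined where the transform vanishes (a zero exactly on the circle).
        tau.append((big_y / big_x).real if abs(big_x) > 1e-12 else float("nan"))
    return tau
```

For example, `x = [1, -0.99]` has a zero at z = 0.99. At ω = 0 its group delay is about -99 samples on the unit circle but only about -1.94 on a circle of radius 1.5.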