Estimation of the glottal pulse from speech or singing voice

Abstract

In brief, the speech signal produced by the human voice production system results from the convolution of the excitation signal, the glottal pulse, with the impulse response of the vocal tract transfer function. This model of voice production is often referred to in the literature as the source-filter model, where the source represents the flow of air leaving the lungs and passing through the glottis (the space between the vocal folds), and the filter represents the resonances of the vocal tract and the lip/nostril radiation. Estimating the shape of the glottal pulse from the speech signal is important in many fields and applications, since the most important features of speech related to voice quality, vocal effort and speech disorders, for example, are mainly due to the voice source. Unfortunately, the glottal flow waveform, which is at the origin of the glottal pulse, is very difficult to measure directly and non-invasively. Several methods for estimating the glottal pulse have been proposed over the last few decades, but there is not yet a complete and automatic algorithm that performs reliably. Most of the methods developed so far are based on an approach called inverse filtering. Inverse filtering is a deconvolution process, i.e., it seeks to obtain the source signal by applying the inverse of the vocal tract transfer function to the output speech signal. Despite the simplicity of the concept, the procedure is complex in practice because the output signal may include noise and it is not straightforward to model the characteristics of the vocal tract filter accurately. In this dissertation we discuss a new glottal pulse prototype and a robust frequency-domain approach for glottal source estimation that uses a phase-related feature based on the Normalized Relative Delays (NRDs) of the harmonics. This model is applied to several speech signals (synthetic and real), and the resulting glottal pulse estimates are compared with those obtained using other state-of-the-art methods.
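
The source-filter idea and the inverse filtering concept described above can be illustrated with a minimal sketch. It is not the method proposed in the dissertation: it assumes the vocal tract is a known all-pole filter, ignores lip radiation and noise, and uses a simple Rosenberg-style pulse as the source. The sampling rate, fundamental frequency, formant values and all function names are hypothetical choices made only for this illustration.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000          # sampling rate in Hz (assumed for illustration)
f0 = 125            # fundamental frequency in Hz (assumed)
period = fs // f0   # samples per glottal cycle

def rosenberg_like_pulse(n_samples, open_frac=0.6, rise_frac=0.4):
    """One cycle of a simple Rosenberg-style glottal flow pulse (illustrative)."""
    n_rise = int(rise_frac * n_samples)
    n_fall = int(open_frac * n_samples) - n_rise
    pulse = np.zeros(n_samples)
    # Opening phase: smooth rise from 0 towards 1.
    t_rise = np.arange(n_rise) / n_rise
    pulse[:n_rise] = 0.5 * (1.0 - np.cos(np.pi * t_rise))
    # Closing phase: faster fall from 1 towards 0; the rest of the cycle is closed.
    t_fall = np.arange(n_fall) / n_fall
    pulse[n_rise:n_rise + n_fall] = np.cos(0.5 * np.pi * t_fall)
    return pulse

def all_pole_from_formants(formants_hz, bandwidths_hz, fs):
    """Denominator A(z) of an all-pole filter with one resonance per formant."""
    a = np.array([1.0])
    for f, bw in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * bw / fs)       # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs       # pole angle from centre frequency
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a

# Source: a train of identical glottal pulses.
source = np.tile(rosenberg_like_pulse(period), 10)

# Filter: hypothetical vowel-like vocal tract with two formants.
a = all_pole_from_formants([700.0, 1200.0], [80.0, 100.0], fs)

# Synthesis: filtering by 1/A(z) is convolution with the tract impulse response.
speech = lfilter([1.0], a, source)

# Inverse filtering: apply A(z), the inverse of the vocal tract transfer function.
estimate = lfilter(a, [1.0], speech)

error = np.max(np.abs(estimate - source))
print(f"max reconstruction error: {error:.2e}")
```

In this idealized setting the source is recovered up to floating-point error because the exact filter is known. With real speech the filter has to be estimated from the signal itself and noise is present, which is where the difficulties mentioned in the abstract arise.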
