Continuous speech recognition using hidden Markov models

The Application of Hidden Markov Models in Speech Recognition

Foundations and Trends® in Signal Processing, 2007

Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs.
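
To make the modelling idea concrete, the sketch below draws feature-vector sequences from a small left-to-right HMM with Gaussian emissions, the standard topology for sub-word units. All parameter values (three states, 2-D features, the particular means and covariance) are illustrative assumptions, not taken from any system cited here.

```python
# A minimal generative sketch: a 3-state left-to-right HMM whose states emit
# 2-D Gaussian "spectral" feature vectors. All parameter values are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Left-to-right transitions: each state either loops or moves forward.
A = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
means = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, -1.0]])  # per-state means
cov = 0.5 * np.eye(2)                                    # shared covariance

def sample(T=20):
    """Draw a state path and the corresponding feature-vector sequence."""
    s, states, obs = 0, [], []
    for _ in range(T):
        states.append(s)
        obs.append(rng.multivariate_normal(means[s], cov))
        s = rng.choice(3, p=A[s])
    return np.array(states), np.array(obs)

states, obs = sample()
print(states)      # e.g. [0 0 1 1 1 2 2 ...] -- monotone left-to-right path
print(obs.shape)   # (20, 2)
```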

A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.
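
One of the core problems the tutorial reviews is evaluating the probability of an observation sequence under a model, solved by the forward algorithm. A minimal sketch for a discrete-observation HMM follows; the toy transition, emission, and initial distributions are assumptions for illustration, and per-step scaling is used to avoid numerical underflow on long sequences.

```python
# Forward algorithm for a discrete-observation HMM: computes log P(O | model)
# by dynamic programming, with per-step scaling for numerical stability.
# The toy parameters are illustrative only.
import numpy as np

A  = np.array([[0.6, 0.4], [0.3, 0.7]])   # transition probabilities a_ij
B  = np.array([[0.8, 0.2], [0.1, 0.9]])   # emission probabilities b_j(k)
pi = np.array([0.5, 0.5])                 # initial state distribution

def forward(obs):
    """Return log P(obs | model)."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_j(t+1) = sum_i alpha_i a_ij b_j(o)
        c = alpha.sum()                # scaling factor, accumulated in log space
        log_p += np.log(c)
        alpha /= c
    return log_p

print(forward([0, 1, 1, 0]))
```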

Speech Recognition Using Hidden Markov Models

The Lincoln robust hidden Markov model speech recognizer currently provides state-of-the-art performance for both speaker-dependent and speaker-independent large-vocabulary continuous-speech recognition. An early isolated-word version similarly improved the state of the art on a speaker-stress-robustness isolated-word task. This article combines hidden Markov model and speech recognition tutorials with a description of the above recognition systems.

Combining Neural Networks And Hidden Markov Models For Continuous Speech Recognition

1992

We present a speaker-independent, continuous-speech recognition system based on a hybrid multilayer perceptron (MLP)/hidden Markov model (HMM). The system combines the advantages of both approaches by using MLPs to estimate the state-dependent observation probabilities of an HMM. New MLP architectures and training procedures are presented that allow the modeling of multiple distributions for phonetic classes and context-dependent phonetic classes. Comparisons with a pure HMM system...
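
The central trick in such hybrids is that an MLP trained with a softmax output to classify frames estimates state posteriors P(q|x); dividing by the state priors P(q) gives, via Bayes' rule, a "scaled likelihood" proportional to p(x|q) that can replace the HMM observation densities during decoding. A minimal sketch under assumed shapes and priors follows; the tiny untrained network here stands in for a real frame classifier.

```python
# The hybrid trick: convert MLP state posteriors P(q | x) into scaled
# likelihoods P(q | x) / P(q), usable in place of the HMM observation
# densities during Viterbi decoding. The network weights here are random
# stand-ins for a trained frame classifier; the priors are assumed values.
import numpy as np

rng = np.random.default_rng(1)

n_states, n_feats, n_hidden = 3, 13, 32
W1 = rng.normal(size=(n_feats, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_states)); b2 = np.zeros(n_states)

def mlp_posteriors(x):
    """One-hidden-layer MLP with softmax output: frame-level P(q | x)."""
    h = np.tanh(x @ W1 + b1)
    z = h @ W2 + b2
    e = np.exp(z - z.max())
    return e / e.sum()

# State priors, normally estimated from the state-level training alignment.
priors = np.array([0.5, 0.3, 0.2])

def scaled_log_likelihoods(x):
    """log [P(q | x) / P(q)], substituted for log b_q(x) in the decoder."""
    return np.log(mlp_posteriors(x)) - np.log(priors)

x = rng.normal(size=n_feats)   # one frame of acoustic features
print(scaled_log_likelihoods(x))
```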

Hidden Markov models and neural networks for speech recognition

1998

The hidden Markov model (HMM) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first-order dependencies in the observed data sequences. This is due to the first-order state process and the assumption of state conditional independence between observations. Artificial neural networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and to evaluate this model on a number of standard speech recognition tasks. This has resulted in a hybrid called a Hidden Neural Network (HNN), in which the HMM emission and transition probabilities are replaced by the outputs of state-specific neural networks. The HNN framework is characterized by:

Discriminative training: HMMs are commonly trained by the Maximum Likelihood (ML) criterion to model within-class data distributions. In contrast, the HNN is trained by the Conditional Maximum Likelihood (CML) criterion to discriminate between different classes. CML training is in this work implemented by a gradient descent algorithm in which the neural networks are updated by backpropagation of errors calculated by a modified version of the forward-backward algorithm for HMMs.

Global normalization: A valid probabilistic interpretation of the HNN is ensured by normalizing the model globally at the sequence level during CML training. This differs from the local normalization of probabilities enforced at the state level in standard HMMs.

Flexibility: The global normalization makes the HNN architecture very flexible. Any combination of neural-network-estimated parameters and standard HMM parameters can be used. Furthermore, the global normalization of the HNN gives a large freedom in selecting the architecture and output functions of the neural networks.
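
The CML criterion at the heart of the HNN can be written as the difference of two forward-pass scores: one over the paths consistent with the label sequence (the "clamped" phase) and one over all paths (the "free" phase). A discrete toy sketch of that computation follows; a real HNN would substitute state-specific network outputs for the fixed emission and transition parameters used here, and would normalize globally rather than per state as this simplification does.

```python
# CML objective as clamped-minus-free forward scores:
# log P(labels | obs) = log(sum over label-consistent paths)
#                     - log(sum over all paths).
# Discrete toy model; an HNN would put state-specific network outputs in
# place of the fixed A, B, pi used here.
import numpy as np

A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.8, 0.2], [0.1, 0.9]])
pi = np.array([0.5, 0.5])

def log_forward(obs, mask=None):
    """Log-sum of path scores; mask[t] restricts the states allowed at time t."""
    m = np.ones((len(obs), 2)) if mask is None else mask
    alpha = pi * B[:, obs[0]] * m[0]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t, o in enumerate(obs[1:], start=1):
        alpha = (alpha @ A) * B[:, o] * m[t]
        c = alpha.sum()
        log_p += np.log(c)
        alpha /= c
    return log_p

obs, labels = [0, 0, 1, 1], [0, 0, 1, 1]   # one HMM state per frame
clamped = np.zeros((4, 2))
clamped[range(4), labels] = 1.0

# Clamped ("labels observed") minus free ("all paths") score.
print(log_forward(obs, clamped) - log_forward(obs))
```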

Specifics of Hidden Markov Model Modifications for Large Vocabulary Continuous Speech Recognition

2003

Abstract. Specifics of hidden Markov model-based speech recognition are investigated. The influence of modeling simple and context-dependent phones, of using single-Gaussian, two-, and three-component Gaussian mixture probability density functions to model feature distributions, and of incorporating a language model is discussed. Word recognition rates and model complexity criteria are used to evaluate the suitability of these modifications for practical applications. Development of a large vocabulary continuous speech recognition system using the HTK toolkit and the WSJCAM0 English speech corpus is described. Results of experimental investigations are presented. Key words: large vocabulary continuous speech recognition, hidden Markov model, Viterbi
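
Since Viterbi decoding appears among the keywords, a compact log-domain sketch of the algorithm may be useful; the two-state discrete model below is an illustrative assumption, not the HTK configuration used in the paper.

```python
# Log-domain Viterbi decoding: the most likely state sequence under the HMM.
# The two-state discrete model is an illustrative assumption.
import numpy as np

logA  = np.log(np.array([[0.6, 0.4], [0.3, 0.7]]))
logB  = np.log(np.array([[0.8, 0.2], [0.1, 0.9]]))
logpi = np.log(np.array([0.5, 0.5]))

def viterbi(obs):
    """Return the maximum-probability state path for a discrete observation list."""
    T, N = len(obs), len(logpi)
    delta = logpi + logB[:, obs[0]]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA   # scores[i, j]: from state i to state j
        back[t] = scores.argmax(axis=0)  # best predecessor of each state j
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]         # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1]))   # e.g. [0, 0, 1, 1, 1]
```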

Hybrid Neural Network/hidden Markov Model Continuous-Speech Recognition

1992

In this paper we present a hybrid multilayer perceptron (MLP)/hidden Markov model (HMM) speaker-independent continuous-speech recognition system, in which the advantages of both approaches are combined by using MLPs to estimate the state-dependent observation probabilities of an HMM. New MLP architectures and training procedures are presented which allow the modeling of multiple distributions for phonetic classes and context-dependent phonetic classes. Comparisons with a pure HMM system...

An introduction to hidden Markov models

IEEE Signal Processing Magazine, 1986

The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
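
The late-1960s parameter-optimization method the article refers to is the Baum-Welch (forward-backward) reestimation procedure. The sketch below performs one reestimation step for a discrete toy HMM; the starting values are illustrative, and the unscaled forward/backward passes are only suitable for short sequences.

```python
# One Baum-Welch reestimation step for a discrete toy HMM. Unscaled
# forward/backward passes: fine for a short sequence, not for real data.
# Starting values are illustrative.
import numpy as np

A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.8, 0.2], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
obs = np.array([0, 0, 1, 1, 0])

def reestimate(obs, A, B, pi):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_obs = alpha[-1].sum()
    gamma = alpha * beta / p_obs                # gamma[t, i] = P(q_t = i | O)
    # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O)
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return A_new, B_new, gamma[0]

A2, B2, pi2 = reestimate(obs, A, B, pi)
print(A2); print(B2); print(pi2)
```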