Limited-Vocabulary Estonian Continuous Speech Recognition System using Hidden Markov Models

Speech Recognition Using Monophone and Triphone Based Continuous Density Hidden Markov Models

Abstract. Speech recognition is the process of transcribing speech to text. Phoneme-based modeling is used, in which each phoneme is represented by a continuous density hidden Markov model (HMM). Mel frequency cepstral coefficients (MFCC) are extracted from the speech signal, and delta and double-delta features representing the temporal rate of change of the features are added, which considerably improves recognition accuracy. Each phoneme is represented by a three-state HMM, with each state modeled by a continuous density Gaussian. Because a single-mixture Gaussian model does not represent the distribution of feature vectors well, mixture splitting is performed successively, in stages, up to eight Gaussian mixture components. The multi-Gaussian monophone models so generated do not capture all the variations of a phone with respect to its context, so context-dependent triphone models are built and their states are tied using decision-tree-based clustering. Recognition accuracy is observed to increase as the number of mixture components is increased, and tied-state triphone HMMs work well for large vocabularies. The TIMIT Acoustic-Phonetic Continuous Speech Corpus is used for implementation. Recognition accuracy is also tested on our recorded speech.
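The delta and double-delta features mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common regression formula d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2) with a window of 2 frames and edge frames repeated; the function names and the window size are illustrative choices, not details taken from the paper.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors.

    frames: list of equal-length lists of floats (one list per frame).
    window: number of frames on each side (assumed value, not from the paper).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                # Clamp indices at the edges by repeating the first/last frame.
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def add_dynamic_features(frames, window=2):
    """Append delta and double-delta coefficients to each static frame."""
    d1 = delta(frames, window)
    d2 = delta(d1, window)  # double-delta is the delta of the deltas
    return [static + first + second
            for static, first, second in zip(frames, d1, d2)]
```

Under these assumptions, a 13-dimensional static MFCC frame would become a 39-dimensional vector after `add_dynamic_features`.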

Specifics of Hidden Markov Model Modifications for Large Vocabulary Continuous Speech Recognition

2003

Abstract. Specifics of hidden Markov model-based speech recognition are investigated. The influence of modeling simple and context-dependent phones, of using simple Gaussian and two- and three-component Gaussian mixture probability density functions for modeling the feature distribution, and of incorporating a language model is discussed. Word recognition rates and model complexity criteria are used for evaluating the suitability of these modifications for practical applications. The development of a large vocabulary continuous speech recognition system using the HTK toolkit and the WSJCAM0 English speech corpus is described. Results of experimental investigations are presented. Key words: large vocabulary continuous speech recognition, hidden Markov model, Viterbi

The Application of Hidden Markov Models in Speech Recognition

Foundations and Trends® in Signal Processing, 2007

Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present-day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs.

Speech Recognition Using Hidden Markov Models

The Lincoln robust hidden Markov model speech recognizer currently provides state-of-the-art performance for both speaker-dependent and speaker-independent large-vocabulary continuous-speech recognition. An early isolated-word version similarly improved the state of the art on a speaker-stress-robustness isolated-word task. This article combines hidden Markov model and speech recognition tutorials with a description of the above recognition systems.

Hidden Markov models (HMMs) isolated word recognizer with the optimization of acoustical analysis and modeling techniques

International Journal of Physical Sciences, 2011

Most state-of-the-art automatic speech recognition (ASR) systems are based on continuous Hidden Markov Models (HMMs) as the acoustic modeling technique. It has been shown that the performance of HMM speech recognizers may be degraded by a poor choice of acoustic feature parameters in the acoustic front-end module. For these reasons, we propose in this paper a dedicated isolated word recognition system based on HMMs, carefully optimized at the acoustic analysis and HMM acoustic modeling levels. The design was tested and evaluated on the Hidden Markov Model Toolkit (HTK) platform. System performance was evaluated using the TIMIT database. A comparative study was carried out using two types of speech analysis: the cepstral method, referred to as Mel frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coding were used in different tests to evaluate and reinforce our design. The effect of the frame shift duration of the acoustic analysis, as well as the addition of dynamic coefficients to the acoustic parameters (MFCC and PLP), was carefully tested in order to obtain high accuracy for our optimized isolated word recognition (IWR) system. Finally, various experiments related to the HMM topology were carried out in order to improve recognition accuracy. In particular, the effect of HMM modeling parameters such as the number of states and the number of Gaussian mixtures on the recognition accuracy of the IWR system was analyzed in order to find the optimal HMM topology.

A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units

IEEE Transactions on Speech and Audio Processing, 1993

The author describes a large vocabulary, speaker-independent, continuous speech recognition system which is based on hidden Markov modeling (HMM) of phoneme-sized acoustic units using continuous mixture Gaussian densities. A bottom-up merging algorithm is developed for estimating the parameters of the mixture Gaussian densities, where the resultant number of mixture components is proportional to both the sample size and dispersion of training data. A compression procedure is developed to construct a word transcription dictionary from the acoustic-phonetic labels of sentence utterances. A modified word-pair grammar using context-sensitive grammatical parts is incorporated to constrain task difficulty. The Viterbi beam search is used for decoding. The segmental K-means algorithm is implemented as a baseline for evaluating the bottom-up merging technique. The system has been evaluated on the TIMIT database (1990) for a vocabulary size of 853. For test set perplexities of 24, 104, and 853, the decoding word accuracies are 90.9%, 86.0%, and 62.9%, respectively. For the perplexity of 104, the decoding accuracy achieved by using the merging algorithm is 4.1% higher than that using the segmental K-means (22.8% error reduction), and the decoding accuracy using the compressed dictionary is 3.0% higher than that using a standard dictionary (18.1% error reduction).

Connected-digits recognition for an under-resourced language using Hidden Markov Models

This paper presents the development of a speech recognition system for automatically recognizing fluently spoken digit strings in Northern Sotho. The digit strings can be isolated or connected/continuous, with known or unknown length. The digit recognition system has been trained with the aim of satisfying its potential end-users. Our main research focus was to enhance the robustness of a connected-digit recognizer so that it can handle continuous speech input restricted to numeric digit vocabularies. The Hidden Markov Model Toolkit (HTK) was used for experimentation. The standard technique based on hidden Markov models (HMMs) was augmented with Cepstral Mean Vector Normalization (CMVN), a technique designed to handle convolutional distortions with the aim of increasing the robustness of speech recognition systems. A 1255-word dataset, extracted from an existing general-purpose Northern Sotho speech database collected from mother-tongue speakers between the ages of 16 and 60, was used in our experiment. The CMVN technique obtained a phone recognition accuracy of 75.84% and a word recognition accuracy of 62.30%, whereas the standard HMM-based technique obtained a phone recognition accuracy of 72.45% and a word recognition accuracy of 4.57%.
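The idea behind cepstral mean normalization can be shown in a few lines. A fixed channel distortion becomes an additive offset in the cepstral domain, so subtracting the per-utterance mean of each coefficient removes it. The sketch below assumes plain per-utterance mean subtraction with no variance scaling; the function name is illustrative and the paper's exact procedure may differ.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean from every cepstral coefficient.

    frames: list of equal-length lists of floats, one list per frame,
            all taken from a single utterance (an assumption of this sketch).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    mean = [sum(frame[k] for frame in frames) / num_frames for k in range(dim)]
    return [[frame[k] - mean[k] for k in range(dim)] for frame in frames]
```

Adding the same constant vector to every frame of an utterance leaves the normalized output unchanged, which is the property that makes the features less sensitive to a fixed channel.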

A Hidden Markov Model-Based Speech Recognition System Using Baum-Welch, Forward-Backward and Viterbi Algorithms

Speech is the most complex component of human intelligence, which makes speech signal processing very important. The variability of speech is very high, and this makes speech recognition difficult. Other factors such as dialect, speech duration, context dependency, speaking rate, speaker differences, environment and locality all add to the difficulty of speech processing. The absence of distinct boundaries between tones or words causes additional problems. Speech has speaker-dependent characteristics, so no one can reproduce or repeat phrases in exactly the same way as another speaker. Nevertheless, a speech recognition system should be able to model and recognize the same words and phrases consistently. Digital signal processors (DSPs) are often used in speech signal processing systems to manage these complexities. This paper presents a Hidden Markov Model (HMM) based speech signal modulation through the application of the Baum-Welch, Forward-Backward and Viterbi algorithms. The system was implemented using a 16-bit floating point DSP (TMS320C6701) from Texas Instruments, and the vocabulary was trained using the Microsoft Hidden Markov Model Toolkit (HTK). The proposed system achieved about 79% correct word recognition, which represents approximately 11,804 correct words recognized out of a total of 14,960 words provided. This result indicates that the proposed speaker-independent system achieves a very good evaluation score, and thus can be used to aid dictation for speech-impaired persons and in real-time applications with a 10 ms data exchange rate.
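Of the three algorithms named above, Viterbi decoding is the easiest to show compactly. The following is a minimal log-domain sketch for a discrete-observation HMM, assuming the model is supplied as nested dictionaries of log-probabilities; the function name and data layout are illustrative and do not describe the DSP implementation in the paper.

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability.

    obs:       sequence of observation symbols
    states:    list of state names
    log_start: log_start[s]    = log P(first state is s)
    log_trans: log_trans[p][s] = log P(next state is s | current state is p)
    log_emit:  log_emit[s][o]  = log P(observing o | state s)
    """
    # best[t][s]: log-probability of the best path that ends in s at time t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev  # remember the best predecessor of s
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

Working in the log domain avoids numerical underflow on long observation sequences. A real recognizer would score continuous feature vectors with Gaussian mixtures rather than look up discrete symbols, and would usually add beam pruning.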

Continuous speech recognition using hidden Markov models

IEEE ASSP Magazine, 1990

Stochastic signal processing techniques have profoundly changed our perspective on speech processing. We have witnessed a progression from heuristic algorithms to detailed statistical approaches based on iterative analysis techniques. Markov modeling provides a mathematically rigorous approach to developing robust statistical signal models. Since the introduction of Markov models to speech processing in the middle 1970s, continuous speech recognition technology has come of age. Dramatic advances have been made in characterizing the temporal and spectral evolution of the speech signal. At the same time, our appreciation of the need to explain complex acoustic manifestations by integration of application constraints into low level signal processing has grown. In this paper, we review the use of Markov models in continuous speech recognition. Markov models are presented as a generalization of its predecessor technology, Dynamic Programming. A unified view is offered in which both linguistic decoding and acoustic matching are integrated into a single optimal network search framework.

Speech Recognition Using Hidden Markov Model Algorithm

Speech recognition applications are becoming increasingly useful. With the growing need for embedded computing and the demand for emerging embedded platforms, speech recognition systems need to be readily available, but because most speech recognition software is closed source, it cannot easily be used to build speech recognition based devices. The aim of this work is to implement an English word speech recognition system using Matlab (GUI). The work is based on the Hidden Markov Model, which provides a highly reliable way of recognizing speech. Training data, consisting of words such as go up, go right, open and close, is recorded with the open-source tool Audacity; the system is then tested against recorded data and displays the result in an edit text box.
