Specifics of Hidden Markov Model Modifications for Large Vocabulary Continuous Speech Recognition

Speech Recognition Using Monophone and Triphone Based Continuous Density Hidden Markov Models

Abstract—Speech recognition is the process of transcribing speech to text. Phoneme-based modeling is used, wherein each phoneme is represented by a Continuous Density Hidden Markov Model. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech signal, and delta and double-delta features representing the temporal rate of change of the features are added, which considerably improves recognition accuracy. Each phoneme is represented by a three-state Hidden Markov Model (HMM), with each state modeled by a continuous density Gaussian distribution. Since a single-mixture Gaussian model does not represent the distribution of feature vectors well, mixture splitting is performed successively in stages up to eight Gaussian mixture components. Because the multi-Gaussian monophone models so generated do not capture all the variations of a phone with respect to its context, context-dependent triphone models are built and their states are tied using decision-tree-based clustering. It is observed that recognition accuracy increases as the number of mixture components is increased, and that tied-state triphone HMMs work well for large vocabularies. The TIMIT Acoustic-Phonetic Continuous Speech Corpus is used for implementation, and recognition accuracy is also tested on our own recorded speech.
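As a rough illustration of the feature extraction step described in this abstract, the sketch below computes 13 MFCCs plus delta and double-delta coefficients with librosa. The file name, sample rate and coefficient count are placeholders assumed for the example, not values taken from the paper.

```python
# Sketch of MFCC + delta + double-delta extraction (39-dim vectors),
# assuming librosa is installed; "utterance.wav" is a placeholder file.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)       # TIMIT-style 16 kHz audio
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # 13 static coefficients
delta = librosa.feature.delta(mfcc)                   # first-order temporal derivative
delta2 = librosa.feature.delta(mfcc, order=2)         # second-order derivative
features = np.vstack([mfcc, delta, delta2]).T         # (num_frames, 39) observation vectors
print(features.shape)
```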

The Application of Hidden Markov Models in Speech Recognition

Foundations and Trends® in Signal Processing, 2007

Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs.

A Hidden Markov Model-Based Speech Recognition System Using Baum-Welch, Forward-Backward and Viterbi Algorithms

Speech is the most complex component of human intelligence, and for that reason speech signal processing is very important. The variability of speech is very high, and this makes speech recognition difficult. Other factors such as dialects, speech duration, context dependency, differing speaking rates, speaker differences, environment and locality all add to the difficulty of speech processing. The absence of distinct boundaries between tones or words causes additional problems. Speech has speaker-dependent characteristics, so that no one can reproduce or repeat phrases in exactly the same way as another. Nevertheless, a speech recognition system should be able to model and recognize the same words and phrases reliably. Digital signal processors (DSPs) are often used in speech signal processing systems to manage these complexities. This paper presents Hidden Markov Model (HMM) based speech signal modelling through the application of the Baum-Welch, Forward-Backward and Viterbi algorithms. The system was implemented using a 16-bit floating point DSP (TMS320C6701) from Texas Instruments, and the vocabulary was trained using the Hidden Markov Model Toolkit (HTK). The proposed system achieved about 79% correct word recognition, corresponding to approximately 11,804 correct words out of a total of 14,960 words provided. This result indicates that the proposed speaker-independent model has a very good evaluation score and can therefore be used to aid dictation for speech-impaired persons and in real-time applications with a 10 ms data exchange rate.
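As a small, self-contained illustration of the Viterbi decoding step named in this abstract, the sketch below finds the most likely state path through a discrete-observation HMM in the log domain. The toy model parameters are invented for the example and are not taken from the paper.

```python
# Minimal log-domain Viterbi decoder for a discrete-observation HMM.
# The toy model (2 states, 3 symbols) is illustrative only.
import numpy as np

log_pi = np.log([0.6, 0.4])             # initial state probabilities
log_A = np.log([[0.7, 0.3],             # state transition matrix
                [0.4, 0.6]])
log_B = np.log([[0.5, 0.4, 0.1],        # per-state emission probabilities
                [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2]                      # observed symbol sequence

T, N = len(obs), len(log_pi)
delta = np.full((T, N), -np.inf)        # best log-probability ending in each state
psi = np.zeros((T, N), dtype=int)       # backpointers

delta[0] = log_pi + log_B[:, obs[0]]
for t in range(1, T):
    for j in range(N):
        scores = delta[t - 1] + log_A[:, j]
        psi[t, j] = np.argmax(scores)
        delta[t, j] = scores[psi[t, j]] + log_B[j, obs[t]]

# Backtrack the most likely state sequence.
path = [int(np.argmax(delta[-1]))]
for t in range(T - 1, 0, -1):
    path.append(int(psi[t, path[-1]]))
path.reverse()
print(path, float(np.max(delta[-1])))
```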

Speech Recognition Using Hidden Markov Models

The Lincoln robust hidden Markov model speech recognizer currently provides state-of-the-art performance for both speaker-dependent and speaker-independent large-vocabulary continuous-speech recognition. An early isolated-word version similarly improved the state of the art on a speaker-stress-robustness isolated-word task. This article combines hidden Markov model and speech recognition tutorials with a description of the above recognition systems.

Speech Recognition Using Hidden Markov Model Algorithm

Speech recognition applications are becoming more useful nowadays. With the growth of embedded computing and the demand for emerging embedded platforms, speech recognition systems need to be available, but closed-source speech recognition software cannot easily be used to implement speech-recognition-based devices. The aim is to implement an English-word speech recognition system using a Matlab GUI. This work is based on the Hidden Markov Model, which provides a highly reliable way of recognizing speech. Training data consisting of words such as "go up", "go right", "open" and "close" is recorded with the open-source tool Audacity; the system then tests it against the recorded data and displays the result in an edit text box. A rough outline of this per-word recognition scheme is sketched below.
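The word-level recognition described above can be approximated, in outline, by training one HMM per vocabulary word and picking the model that assigns the highest likelihood to a test utterance. The sketch below uses the hmmlearn package; the load_features() helper and the train_data dictionary are hypothetical placeholders for feature extraction and data handling.

```python
# Sketch of isolated-word recognition by per-word HMM scoring,
# assuming hmmlearn is installed; load_features() and train_data are
# hypothetical helpers returning (num_frames, num_dims) feature arrays.
import numpy as np
from hmmlearn import hmm

def train_word_model(feature_list, n_states=5):
    """Fit a Gaussian HMM on all training utterances of one word."""
    X = np.vstack(feature_list)
    lengths = [len(f) for f in feature_list]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def recognize(models, test_features):
    """Return the word whose HMM assigns the highest log-likelihood."""
    scores = {word: m.score(test_features) for word, m in models.items()}
    return max(scores, key=scores.get)

# Usage (with hypothetical data):
# models = {w: train_word_model(train_data[w]) for w in ["open", "close", "go up", "go right"]}
# print(recognize(models, load_features("test.wav")))
```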

A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units

IEEE Transactions on Speech and Audio Processing, 1993

The author describes a large vocabulary, speaker-independent, continuous speech recognition system which is based on hidden Markov modeling (HMM) of phoneme-sized acoustic units using continuous mixture Gaussian densities. A bottom-up merging algorithm is developed for estimating the parameters of the mixture Gaussian densities, where the resultant number of mixture components is proportional to both the sample size and dispersion of training data. A compression procedure is developed to construct a word transcription dictionary from the acoustic-phonetic labels of sentence utterances. A modified word-pair grammar using context-sensitive grammatical parts is incorporated to constrain task difficulty. The Viterbi beam search is used for decoding. The segmental K-means algorithm is implemented as a baseline for evaluating the bottom-up merging technique. The system has been evaluated on the TIMIT database (1990) for a vocabulary size of 853. For test set perplexities of 24, 104, and 853, the decoding word accuracies are 90.9%, 86.0%, and 62.9%, respectively. For the perplexity of 104, the decoding accuracy achieved by using the merging algorithm is 4.1% higher than that using the segmental K-means (22.8% error reduction), and the decoding accuracy using the compressed dictionary is 3.0% higher than that using a standard dictionary (18.1% error reduction).
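The abstract does not spell out the exact criterion of the bottom-up merging algorithm, so the sketch below only illustrates the basic operation such an algorithm relies on: merging two weighted Gaussian components by moment matching. It is a generic illustration under that assumption, not the paper's method.

```python
# Illustrative moment-matching merge of two diagonal-covariance Gaussian
# mixture components; a generic operation, not the paper's exact
# bottom-up merging criterion.
import numpy as np

def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Merge two (weight, mean, variance) components into one that
    preserves the first and second moments of the pair."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + (mu1 - mu) ** 2) +
           w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var

# Toy example with 2-dimensional diagonal Gaussians.
w, mu, var = merge_components(0.3, np.array([0.0, 1.0]), np.array([1.0, 0.5]),
                              0.7, np.array([2.0, 0.0]), np.array([0.8, 0.4]))
print(w, mu, var)
```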

Continuous speech recognition using hidden Markov models

IEEE Assp Magazine, 1990

Stochastic signal processing techniques have profoundly changed our perspective on speech processing. We have witnessed a progression from heuristic algorithms to detailed statistical approaches based on iterative analysis techniques. Markov modeling provides a mathematically rigorous approach to developing robust statistical signal models. Since the introduction of Markov models to speech processing in the middle 1970s, continuous speech recognition technology has come of age. Dramatic advances have been made in characterizing the temporal and spectral evolution of the speech signal. At the same time, our appreciation of the need to explain complex acoustic manifestations by integrating application constraints into low-level signal processing has grown. In this paper, we review the use of Markov models in continuous speech recognition. Markov models are presented as a generalization of their predecessor technology, Dynamic Programming. A unified view is offered in which both linguistic decoding and acoustic matching are integrated into a single optimal network search framework.

A Feature Based Classification and Analysis of Hidden Markov Model in Speech Recognition

Cyber Intelligence and Information Retrieval

Speech recognition converts the acoustic signal obtained from a speaker or a telephone into a sequence of words. Speech recognition, also called computer speech cognition, means making a digital device understand what we are saying. It helps users direct their systems by voice and avoids typing, because a system can write words faster than a human being. Several HMM-based models have been developed by researchers for speech recognition, but because of the continual advancement of the technology landscape there is still a need for robust techniques in this field. Owing to its significant modeling ability, the HMM has been classified into several variants, from which a number of speech recognition techniques have been developed. A comparative analysis of the various HMM models shows their efficiency and proposes the most effective model in the field of speech recognition.

Continuous Density Hidden Markov Model for Hindi Speech Recognition

State-of-the-art automatic speech recognition systems use Mel frequency cepstral coefficients for feature extraction along with Gaussian mixture models for acoustic modeling, but there is no standard value for the number of mixture components in the speech recognition process. The current choice of mixture components is arbitrary, with little justification. The standard settings used for European languages also cannot be carried over to Hindi speech recognition because of the mismatch in database size between the languages. Parameter estimation with too many or too few components may yield a poorly estimated mixture model. The number of mixtures is therefore important for the initial estimation of the expectation maximization process. In this research work, we estimate the number of Gaussian mixture components for a Hindi database based upon the size of the vocabulary. Mel frequency cepstral features and perceptual linear predictive features, along with their extended variants including delta-delta-delta features, have been used to evaluate this number based on the optimal recognition score of the system. A comparative analysis of recognition performance for both feature extraction methods on a medium-size Hindi database is also presented in this paper. HLDA has been used as a feature reduction technique, and its impact on the recognition score is also highlighted.
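The paper selects the number of mixture components from recognition scores on the Hindi database; as a rough stand-in for that search, the sketch below fits GMMs of increasing size to a pool of feature vectors and compares them by BIC. The BIC criterion and the synthetic data are assumptions of this example, not the paper's procedure.

```python
# Sketch of sweeping the number of Gaussian mixture components and
# comparing fits by BIC; the paper instead ties the choice to vocabulary
# size and recognition score, so this is only an illustrative proxy.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 39))        # stand-in for pooled 39-dim acoustic features

results = {}
for k in (1, 2, 4, 8, 16, 32):
    gmm = GaussianMixture(n_components=k, covariance_type="diag", random_state=0)
    gmm.fit(X)
    results[k] = gmm.bic(X)            # lower BIC indicates a better size/fit trade-off

best_k = min(results, key=results.get)
print(results, "selected:", best_k)
```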

Large Vocabulary in Continuous Speech Recognition Using HMM and Normal Fit

This paper addresses the problem of large vocabulary, speaker-independent continuous speech recognition using phonemes, the Hidden Markov Model (HMM) and the normal fit method. We first detect the voiced part of the speech signal by computing a dynamic threshold in each frame. Real cepstrum coefficients are extracted as features from the voiced frames. The Baum-Welch algorithm is applied to train on these features. The normal fit technique is then applied, and the output values are labelled with the corresponding phoneme or syllable. The model is tested on 5 languages, namely English, Kannada, Hindi, Tamil and Telugu. The automatic segmentation of speech signals achieves an average accuracy of 95.42% and a miss rate of about 4.58%. On the large vocabulary, the average Word Recognition Rate (WRR) is 85.16% and the average Word Error Rate (WER) is 14.84%. All computations are done using Matlab.
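As an illustration of the first two steps described above, the sketch below marks voiced frames with a simple per-utterance energy threshold and computes real cepstrum coefficients for those frames. The threshold rule, frame sizes and coefficient count are assumptions made for this example, since the abstract does not state them.

```python
# Sketch of energy-based voiced-frame detection followed by real-cepstrum
# feature extraction; the dynamic threshold (a fraction of mean frame energy)
# and frame/hop sizes are assumed for illustration, not taken from the paper.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def voiced_frames(frames, factor=0.5):
    energy = np.sum(frames ** 2, axis=1)
    threshold = factor * energy.mean()          # assumed dynamic threshold
    return frames[energy > threshold]

def real_cepstrum(frames, n_coeff=13):
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) + 1e-10
    cepstrum = np.fft.irfft(np.log(spectrum), axis=1)
    return cepstrum[:, :n_coeff]                # keep low-order coefficients

# Toy usage on random samples standing in for 16 kHz speech.
x = np.random.default_rng(1).normal(size=16000)
features = real_cepstrum(voiced_frames(frame_signal(x)))
print(features.shape)
```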