Thai Connected Digit Speech Recognition Using Hidden Markov Models

Isolated to Connected Tamil Digit Speech Recognition System Based on Hidden Markov Model

Speech recognition technology has improved over time to enhance Human Computer Interaction (HCI). This paper proposes an isolated-to-connected Tamil digit speech recognition system built with the CMU Sphinx speech recognition tools. Connected speech recognition is important in many applications, such as voice-dialling telephones, automated banking systems, automated data entry, and PIN entry. The proposed system is triphone-based and small-vocabulary, and is evaluated in both speaker-specific and speaker-independent modes. The Mel Frequency Cepstral Coefficient (MFCC) feature extraction technique is used to obtain the acoustic features of the speech database for training. The probabilistic Hidden Markov Model (HMM) is used to model the speech utterances, and the Viterbi beam search algorithm is used in the decoding process. Tested with random digits (0 to 100) under various conditions, the system achieved its best recognition rates of 96.7% for speaker-specific and 54.5% for speaker-independent connected word recognition.
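
The Viterbi decoding step named in this abstract can be illustrated with a minimal sketch. It is plain Viterbi over a toy discrete HMM, without the beam pruning that the abstract mentions, and it is not the CMU Sphinx implementation. The two states, the symbols "x" and "y", and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete observation sequence."""
    # trellis[t][s]: best log-probability of any state path that ends in s at time t
    trellis = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        trellis.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor that maximises the score of reaching s
            prev, score = max(
                ((p, trellis[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            trellis[t][s] = score + log_emit[s][obs[t]]
            backptr[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return trellis[-1][last], path


def logs(table):
    """Convert a dict of probabilities into a dict of log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model (not taken from the paper)
STATES = ["A", "B"]
LOG_START = logs({"A": 0.6, "B": 0.4})
LOG_TRANS = {"A": logs({"A": 0.7, "B": 0.3}), "B": logs({"A": 0.4, "B": 0.6})}
LOG_EMIT = {"A": logs({"x": 0.9, "y": 0.1}), "B": logs({"x": 0.2, "y": 0.8})}

best_logp, best_path = viterbi(["x", "x", "y", "y"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(best_path, math.exp(best_logp))
```

Working in log-probabilities avoids numerical underflow on long utterances. A beam search variant would additionally discard, at each time step, states whose score falls too far below the current best.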

Voice pattern recognition using Mel-Frequency Cepstral Coefficient and Hidden Markov Model for bahasa Madura

Journal of Physics: Conference Series, 2019

Voice recognition allows a device to recognize spoken words by digitizing them and matching the digital signals against patterns stored in the device. Spoken words are converted into digital signals by converting the voice waves into a set of numbers, which is then compared with the stored voice patterns to identify the words. MFCC can be an alternative method for voice feature extraction because it is reliable at capturing the unique features of the human voice. A Hidden Markov Model is used to recognize the voice pattern, so it can be used to compare the voice signal obtained from e-learning with the trained voice signal. Bahasa Madura is a regional language used by ethnic Madurese for daily communication. The number of Madurese people who understand this language is currently declining, and so is the use of Bahasa Madura. Therefore, it is necessary to conduct speech recognition research on the Madura language as one effort to pre...

Analysis of Speech Recognition Techniques on the Hindi Speech Digits Database

The recognition performance for isolated spoken Hindi digits has been evaluated using Matlab and HTK (Hidden Markov Model Toolkit). In both cases MFCC (Mel Frequency Cepstral Coefficient) and HMM (Hidden Markov Model) have been used as the feature extraction technique and the classifier, respectively. Experiments were performed on both clean and noisy data. In these experiments car noise, F16 noise, factory noise and speech noise were added to the clean signal to produce noisy signals at different SNR levels. The recognition performance for isolated Hindi digits was better with HTK than with Matlab in both noisy and clean environments.
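
Adding noise to a clean signal at a chosen SNR can be sketched as follows. The abstract does not say how the noise was scaled, so this assumes the common definition SNR(dB) = 10 log10(P_clean / P_noise) over the whole signal. The function names and the synthetic sine and Gaussian signals are hypothetical stand-ins for real recordings.

```python
import math
import random


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the clean-to-noise power ratio equals `snr_db`, then add it."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Synthetic example: a sine wave as "speech" and Gaussian samples as "noise"
random.seed(0)
clean = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
print(round(measured_snr_db(clean, noisy), 3))
```

Repeating the mix for several `snr_db` values gives a set of test conditions of the kind the abstract describes.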

Realization of Hidden Markov Model for English Digit Recognition

International Journal of Computer Applications, 2014

The objective of this work is to compare isolated English digit speech recognition using Hidden Markov Models for speaker-independent systems. Two datasets of audio recordings of isolated English digits were collected for the comparison. Speakers read the numeric digits 0 to 9, i.e. ZERO to NINE. One corpus consists of self-recorded signals and the other is the standard CUAVE dataset (36 speakers, each uttering 10 words). The training and testing samples are separated for the speaker-dependent and speaker-independent systems. The system was implemented with the HMM toolkit HTK by training HMMs for the vocabulary words on the training data. A separate HMM for each digit was initialized and trained to obtain a well-modeled structure. The trained system was tested on the training data as well as the test data, and the results showed that most of the speech samples were correctly recognized. The system was tested in both speaker-independent and speaker-dependent modes to check the change in recognition rate. This work can be used by developers and researchers interested in English speech recognition, not only for isolated digits but also for other English words. If a clean database is available, it can be generalized to recognize words of any language. Continuous speech can also be recognized by building on this system.

Connected digit speech recognition system for Malayalam language

Connected digit speech recognition is important in many applications, such as automated banking systems, catalogue dialing, and automatic data entry. This paper presents an optimum speaker-independent connected digit recognizer for the Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and a continuous density Hidden Markov Model (HMM) in the recognition process. The Viterbi algorithm is used for decoding. The training database contains utterances from 21 speakers aged 20 to 40, recorded in a normal office environment, where each speaker was asked to read 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.

Continuous Density Hidden Markov Model for Hindi Speech Recognition

State-of-the-art automatic speech recognition systems use Mel frequency cepstral coefficients as the feature extractor along with a Gaussian mixture model for acoustic modeling, but there is no standard value for the number of mixture components in the speech recognition process. The current choice of mixture components is arbitrary, with little justification. Also, the standard set for European languages cannot be used in Hindi speech recognition because of the mismatch in database size between the languages. Parameter estimation with too many or too few components may estimate the mixture model inappropriately. Therefore, the number of mixtures is important for the initial estimation in the expectation maximization process. In this research work, we estimate the number of Gaussian mixture components for a Hindi database based on the size of the vocabulary. Mel frequency cepstral features and perceptual linear predictive features, along with their extended variations with delta-delta-delta features, have been used to evaluate this number based on the optimal recognition score of the system. A comparative analysis of recognition performance for both feature extraction methods on a medium-size Hindi database is also presented in this paper. HLDA has been used as the feature reduction technique, and its impact on the recognition score is also highlighted.

A Hidden Markov Model-Based Speech Recognition System Using Baum-Welch, Forward-Backward and Viterbi Algorithms

Speech is the most complex component of human intelligence, and for that reason speech signal processing is very important. The variability of speech is very high, and this makes speech recognition difficult. Other factors such as dialect, speech duration, context dependency, speaking rate, speaker differences, environment and locality all add to the difficulty of speech processing. The absence of distinct boundaries between tones or words causes additional problems. Speech has speaker-dependent characteristics, so no one can reproduce or repeat phrases in exactly the same way as another person. Nevertheless, a speech recognition system should be able to model and recognize the same words and phrases reliably. Digital signal processors (DSPs) are often used in speech signal processing systems to handle these complexities. This paper presents Hidden Markov Model (HMM) based speech signal modulation through the application of the Baum-Welch, Forward-Backward and Viterbi algorithms. The system was implemented using a 16-bit floating point DSP (TMS320C6701) from Texas Instruments, and the vocabulary was trained using the Microsoft Hidden Markov Model Toolkit (HTK). The proposed system achieved about 79% correct word recognition, which represents approximately 11,804 correct words recognized out of a total of 14,960 words provided. This result indicates that the proposed speaker-independent system has a very good evaluation score, and thus can be used to aid dictation for speech-impaired persons and in real-time applications with a 10 ms data exchange rate.
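
The forward pass of the Forward-Backward algorithm named in this abstract can be sketched as below. It computes the likelihood of an observation sequence under a discrete HMM, which is the quantity an isolated-word recognizer compares across word models. This is an illustrative sketch, not the paper's DSP or HTK implementation; the two-state model, the symbols "x" and "y", and every probability are hypothetical.

```python
def forward_likelihood(obs, states, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward recursion."""
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model (not taken from the paper)
HMM_STATES = ["A", "B"]
HMM_START = {"A": 0.6, "B": 0.4}
HMM_TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
HMM_EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

likelihood = forward_likelihood(["x", "y"], HMM_STATES, HMM_START, HMM_TRANS, HMM_EMIT)
print(likelihood)
```

For long sequences the raw probabilities underflow, so practical implementations rescale `alpha` at each step or work in the log domain. Baum-Welch training reuses these forward values together with the matching backward values.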

Performance analysis of isolated Bangla speech recognition system using Hidden Markov Model.

Here we present a model of an isolated speech recognition (ISR) system for the Bangla character set and analyze the performance of that recognizer model. The isolated Bangla speech recognizer combines MFCC feature extraction from the input audio file with a Hidden Markov Model (HMM) for training and recognition, because the HMM is an uncomplicated and effective framework for modeling time-varying sequences of spectral feature vectors. A series of experiments has been performed with 10 talkers (5 male and 5 female) on 56 Bangla characters (including Bangla vowel, Bangla consonant, Bangla

Normal and Whispered Speech Recognition Systems for Myanmar Digits

International Journal of Science and Engineering Applications, 2018

Automatic speech recognition (ASR) technology has become a popular innovation in human-machine interaction. This technology allows a computer to recognize spoken words and convert them to text data. In designing computer systems that recognize spoken words, one of the challenging tasks is recognizing spoken Myanmar digits. In this paper we focus on recognizing Myanmar digits spoken in a normal voice and in a whispered voice. A Myanmar digit recognition system for both types has been developed using Hidden Markov Models in the HTK tools, and the Mel Frequency Cepstral Coefficients (MFCC) technique has been used to convert the speech waveform into a set of feature vectors for recognizing the vocalization of a word. In our experiments, HMM-based acoustic and language models are used to evaluate the performance of the speech recognizer in both speaker-dependent and speaker-independent settings. According to the experimental results, the performance of the speaker-dependent speech recognition system for normal voice and whispered voice is 90% and 88.7% respectively. The performance of the speaker-independent speech recognition system for normal voice and whispered voice is 67.3% and 65.7% respectively. We found that speaker-dependent performance is higher than speaker-independent performance for both voice types.
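
One ingredient of MFCC extraction is the mel frequency scale on which the filterbank is spaced. The sketch below assumes the widely used formula mel = 2595 log10(1 + f/700); the abstract does not state which variant or which filterbank settings were used, so the function names, the 20 filters and the 0 to 8000 Hz range are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies in Hz of n_filters bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings: 20 filters between 0 Hz and 8000 Hz
centers = mel_center_frequencies(20, 0.0, 8000.0)
print([round(c) for c in centers])
```

The centers are close together at low frequencies and spread out at high frequencies, which roughly mirrors the frequency resolution of human hearing. A full MFCC pipeline would also need framing, windowing, a power spectrum, log filterbank energies and a discrete cosine transform.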

Speech Recognition Using Hidden Markov Model Algorithm

Speech recognition applications are becoming more useful nowadays. With the growing need for embedded computing and the demand for emerging embedded platforms, speech recognition systems need to be available, but because speech recognition software is closed source, it cannot easily be used to implement speech-recognition-based devices. The aim is to implement an English word speech recognition system using Matlab (GUI). This work is based on the Hidden Markov Model, which provides a highly reliable way of recognizing speech. Training data, such as the words go up, go right, open, close, etc., are recorded in the open-source tool Audacity; the system then tests against the recorded data and displays the result in an edit text box.