Speech Feature Extraction and Matching Technique

Speech Recognition with Dynamic Time Warping using MATLAB

Speech recognition has found applications in many aspects of daily life, from automatic phone answering services to dictating text and issuing voice commands to computers. In this paper, we present the historical background and technological advances in speech recognition over the past few decades. More importantly, we present the steps involved in designing a speaker-independent speech recognition system. We focus mainly on the pre-processing stage, which extracts salient features of a speech signal, and on Dynamic Time Warping, a technique commonly used to compare the feature vectors of speech signals. These techniques are applied to the recognition of both isolated and connected spoken words. We conduct experiments in MATLAB to verify these techniques. Finally, we design a simple 'Voice-to-Text' converter application using MATLAB.

Speech Recognition using Dynamic Time Warping, Hidden Markov Model and Artificial Neural Networks

In this paper, an advanced method is presented that can classify speech signals with high accuracy in minimal time. First, the recorded signal is preprocessed; this stage includes denoising with Mel-Frequency Cepstral Analysis and feature extraction using discrete wavelet transform coefficients. These features are then fed to a Multilayer Perceptron network for classification. Finally, after the neural network is trained, effective features are selected with the UTA algorithm.

Dynamic Time Warping for Speech Recognition with Training Part to Reduce the Computation

Journal of Signal Processing, 2014

In this paper, we propose a dynamic time warping (DTW) method with a training part. DTW is a popular automatic speech recognition (ASR) method based on template matching. Conventional DTW is fast and of low complexity; however, its recognition accuracy is limited. Recently, a DTW with multireferences (mDTW) algorithm has also been developed to improve the recognition accuracy to be comparable to that of the hidden Markov model (HMM) algorithm under noisy conditions. However, the mDTW algorithm increases the calculation cost. Therefore, to reduce the calculation cost, a training part is added to the DTW-based ASR system in this paper, unlike the mDTW, which tries to find appropriate reference utterances to replace the increasing utterances. The results show that the average recognition accuracy of the proposed method is similar to that of the mDTW, and that the calculation cost is reduced by 41.6%.

Five Stage Dynamic Time Warping Algorithm for Speaker Dependent Isolated Word Recognition in Speech

In speech recognition, speaker-dependent isolated word recognition systems are used with small vocabularies in various voice control applications. The Dynamic Time Warping (DTW) algorithm is used for pattern matching when the two sequences are of unequal length. When the test and reference sequences differ in length in the time domain, the existing DTW algorithm takes more time, whereas the proposed solution provides an efficient algorithm that reduces computation time without degrading accuracy or efficiency.
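To illustrate how DTW matches two sequences of unequal length, here is a minimal sketch of the classic (unoptimized) dynamic-programming formulation. It is not the five-stage algorithm that the abstract proposes. The function name `dtw_distance`, the use of scalar features, and the absolute-difference local cost are all assumptions made for illustration; a real recognizer would typically compare feature vectors per frame.

```python
def dtw_distance(a, b):
    """Return (total cost, warping path) for aligning sequences a and b.

    Assumption: a and b hold scalar features and the local cost is the
    absolute difference; real systems would compare feature vectors.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    # Backtrack from the end to recover which frames were paired.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


distance, path = dtw_distance([1, 2, 3], [1, 2, 2, 3])
print(distance, path)
```

The table has `len(a) * len(b)` cells, which is why the computation time grows quickly for long utterances and why speed-ups of the kind the abstract describes are of interest.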

Automatic Speech Recognition Using Template Model for Man-Machine Interface

Speech is a natural form of communication for human beings, and computers with the ability to understand speech and speak with a human voice are expected to contribute to the development of more natural man-machine interfaces. Computers with this kind of ability are gradually becoming a reality through the evolution of speech recognition technologies, making speech an important mode of interaction with computers. In this paper, feature extraction is implemented using the well-known Mel-Frequency Cepstral Coefficients (MFCC). Pattern matching is done using the Dynamic Time Warping (DTW) algorithm.
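As background for MFCC, the sketch below shows the mel-scale mapping on which the MFCC filterbank is spaced. It uses one commonly quoted form of the Hz-to-mel formula (other variants exist), and the function names and the choice of ten filters between 0 and 4000 Hz are assumptions for illustration, not details taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_low, f_high):
    """Center frequencies (Hz) of filters equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * k) for k in range(1, n_filters + 1)]


# Hypothetical setup: 10 filters covering 0 to 4000 Hz.
for center in mel_filter_centers(10, 0.0, 4000.0):
    print(round(center, 1))
```

Because the spacing is uniform in mels, the centers crowd together at low frequencies and spread out at high frequencies, which roughly mirrors the frequency resolution of human hearing.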

A Comparative Study of Feature Extraction Techniques for Speech Recognition System

The automatic recognition of speech enables a natural and easy mode of communication between human and machine. Speech processing has many applications, including voice dialing, telephone communication, call routing, domestic appliance control, speech-to-text conversion, text-to-speech conversion, lip synchronization and automation systems. Here we discuss some widely used feature extraction techniques: Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) analysis, Dynamic Time Warping (DTW), Relative Spectra Processing (RASTA) and Zero Crossings with Peak Amplitudes (ZCPA). Some techniques, such as RASTA and MFCC, take the nature of speech into account when extracting features, while LPC predicts each upcoming value from a linear combination of previous values.
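The prediction idea behind LPC can be sketched as follows. This is a minimal illustration of the autocorrelation method with the Levinson-Durbin recursion, assuming a short frame of samples given as a plain Python list; the function names `autocorr` and `lpc` are hypothetical, and the abstract does not state which LPC variant the authors use.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """Return coefficients a[1..order] such that
    x[n] is approximated by sum(a[k] * x[n - k] for k in 1..order).
    """
    r = autocorr(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy so far
    for i in range(1, order + 1):
        if err == 0.0:
            break  # silent frame or already perfectly predicted
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:]


# A decaying exponential obeys x[n] = 0.9 * x[n - 1], so a first-order
# predictor should recover a coefficient close to 0.9.
frame = [0.9 ** n for n in range(200)]
print(lpc(frame, 2))
```

The resulting coefficients (or quantities derived from them) serve as a compact feature vector for the frame.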

Introduction to Various Algorithms of Speech Recognition: Hidden Markov Model, Dynamic Time Warping and Artificial Neural Networks

International Journal of Engineering Development and Research, 2014

Nowadays, speech recognition is widely used in many applications. In computer science and electrical engineering, speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). A hidden Markov model (HMM) is a statistical Markov model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be presented as the simplest dynamic Bayesian network. Dynamic time warping (DTW) is a well-known technique for finding an optimal alignment between two given (time-dependent) sequences under certain restrictions. Intuitively, the sequences are warped in a nonlinear fashion to match each other. An ANN is a non-linear, data-driven, self-adaptive approach. It can identify and learn correlated patterns between the input dataset and the corresponding target values. After training ANN can be used to predict t...

TECHNIQUES FOR FEATURE EXTRACTION IN SPEECH RECOGNITION SYSTEM : A COMPARATIVE STUDY

The time-domain waveform of a speech signal carries all of the auditory information. From the phonological point of view, however, very little can be said on the basis of the waveform itself. Past research in mathematics, acoustics, and speech technology has provided many methods for converting the data into a form that can be considered information if interpreted correctly. To find statistically relevant information in incoming data, it is important to have mechanisms for reducing the information in each segment of the audio signal to a relatively small number of parameters, or features. These features should describe each segment in such a characteristic way that other similar segments can be grouped together by comparing their features. There are a great many interesting ways to describe the speech signal in terms of parameters. Although they all have their strengths and weaknesses, we present some of the most widely used methods and their importance.
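The idea of reducing each segment to a few parameters can be sketched as below. The two features chosen here, short-time energy and zero-crossing rate, are simple illustrative stand-ins and are not necessarily among the methods the paper compares; the function names, frame length, and hop size are likewise assumptions.

```python
def frame_signal(samples, frame_len, hop):
    """Split a signal into frames of frame_len samples, hop samples apart."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for p, q in zip(frame, frame[1:])
                    if (p >= 0) != (q >= 0))
    return crossings / (len(frame) - 1)


def extract_features(samples, frame_len=4, hop=2):
    """Reduce each frame to a small (energy, zero-crossing rate) pair."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]


# Toy signal: a rapidly alternating burst followed by silence.
signal = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
print(extract_features(signal, frame_len=4, hop=4))
```

Frames with similar feature pairs can then be grouped by comparing the pairs directly, which is the kind of comparison the abstract describes.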