Analysis of Feature Extraction Methods for Speaker Dependent Speech Recognition
Related papers
A review on feature extraction techniques for speaker dependent voice recognition
Speaker-dependent mode is a widely used technique for voice recognition. Accuracy, a low error rate, and ease of construction are the key features that make the speaker-dependent mode attractive and account for its wide use. The feature extraction technique plays a key role in voice recognition, and a carefully chosen technique improves the efficiency of word recognition. This paper reviews the available feature extraction techniques and concludes which technique is better.
DIFFERENT FEATURE EXTRACTION TECHNIQUES FOR AUTOMATIC SPEECH RECOGNITION: A REVIEW
Automatic speech recognition, which allows natural and user-friendly communication between a person and a device, is an active research area. Speech recognition is the ability to listen to what is being said, to interpret it, and to perform actions based on the spoken information. This article presents a short outline of speech recognition and of the techniques used for feature extraction in speech recognition systems, such as MFCC, LPC and PLP. Among these three techniques, Mel frequency cepstral coefficients (MFCC) is the most frequently used feature extraction technique in speech recognition because it is the closest to actual human auditory perception of speech.
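To make the link between MFCC and human perception more concrete, the sketch below converts frequencies in hertz to the mel scale using one commonly cited formula, m = 2595 log10(1 + f / 700). The abstract does not state which mel formula the authors use, so the formula choice and the function names `hz_to_mel` and `mel_to_hz` are assumptions made for illustration only.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # Print the mel value for a few illustrative frequencies.
    for freq in (100, 1000, 4000, 8000):
        print(freq, "Hz ->", round(hz_to_mel(freq), 1), "mel")
```

On this scale, a 1000 Hz step at high frequencies covers fewer mels than the same step at low frequencies, which is the sense in which mel-spaced filters roughly follow the ear's coarser resolution at high frequencies.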
Empirical Review Paper on Voice Recognition & Feature Extraction Techniques
Speech recognition is the process of automatically recognizing the spoken words of a person based on the information content of the speech signal. Many reviews and surveys have been conducted on voice feature extraction techniques, but most of them have not carried out an exhaustive empirical review of the techniques. This paper provides an empirical review, with relevant algorithmic calculations, of each feature extraction technique for voice recognition and discusses the techniques and systems that make it possible for computers to accept voice as input. This paper shows the major developments in the field of voice analytics. It gives detailed information on the three main feature extraction techniques: Linear Predictive Coding (LPC), Mel-frequency cepstral coefficients (MFCCs) and the RASTA filtering technique. The objective of this paper is to summarize the feature extraction techniques used in speech recognition systems and to provide an empirical value for each technique. The words "voice" and "speech" are used interchangeably in this context.
A Comparative Study of Feature Extraction Techniques for Speech Recognition System
The automatic recognition of speech enables a natural and easy mode of communication between human and machine. Speech processing has vast applications in voice dialing, telephone communication, call routing, domestic appliance control, speech-to-text conversion, text-to-speech conversion, lip synchronization, automation systems, etc. Here we discuss some of the most widely used feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) analysis, Dynamic Time Warping (DTW), Relative Spectra Processing (RASTA) and Zero Crossings with Peak Amplitudes (ZCPA). Some techniques, such as RASTA and MFCC, take the nature of speech into account when extracting features, while LPC predicts future features based on previous features.
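Of the techniques listed in this abstract, DTW is usually described as a way of aligning two sequences of different lengths rather than as a feature extractor in itself. A minimal sketch of the classic dynamic-programming formulation follows, assuming one-dimensional sequences and an absolute-difference local cost. The function name `dtw_distance` is hypothetical, and the paper may use a different cost or step pattern.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two numeric sequences.

    Assumptions (not from the paper): scalar elements, absolute-difference
    local cost, and the standard three-way step pattern.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its element
                cost[i][j - 1],      # b advances, a repeats its element
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is a time-stretched version of the first.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

In a recognizer, the scalar elements would typically be replaced by per-frame feature vectors (for example MFCC vectors) with a vector distance as the local cost.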
Speaker Independent Speech Recognition Using Maximum Likelihood Approach for Isolated Words
2017
Speech is an intuitive interface for man-machine interaction. Minimizing the word error rate is a key challenge in developing an Automatic Speech Recognition (ASR) system, and the performance of such systems is far from perfect. Acoustic and language models are fundamental to building a robust ASR engine. This paper presents a stochastic procedure for developing phoneme-level and word-level acoustic models. Acoustic features are estimated by Mel Frequency Cepstral Coefficients (MFCC), with 35% overlap between frames for every 25 milliseconds of the signal. The paper compares and highlights the performance of word-level and phoneme-level acoustic models for a Kannada language vocabulary. The performance of the system is recorded for different vocabulary sizes, and the word error rate (WER) is computed for the phoneme and word acoustic models. The system achieves an accuracy of 94.78046% and 97.6% for the word and phoneme acoustic models respectively for a vocabulary of 90 words. In addition, 98.08% of recognition rate for the vocabulary ...
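The framing described in this abstract can be sketched as follows, reading it as 25 ms frames in which consecutive frames share 35% of their samples. The 16 kHz sample rate, the function name `frame_signal`, and the choice to drop trailing samples that do not fill a whole frame are assumptions for illustration; the abstract does not specify them.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, overlap=0.35):
    """Split a sequence of samples into fixed-length, overlapping frames.

    frame_ms and overlap follow the abstract (25 ms, 35%); everything else
    is an assumption. Trailing samples that do not fill a frame are dropped.
    """
    frame_len = round(sample_rate * frame_ms / 1000.0)
    hop = round(frame_len * (1.0 - overlap))  # samples between frame starts
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame length and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 16000  # assumed 16 kHz sample rate
    frames = frame_signal(one_second, 16000)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

Under these assumptions, each frame holds 400 samples and successive frames start 260 samples apart, so one second of audio yields 61 frames. Each frame would then be windowed and passed to the MFCC computation.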
Speech Recognition System with Different Methods of Feature Extraction
International Journal of Innovative Research in Computer and Communication Engineering, 2018
The paper presents the design of a speech recognition system that uses preprocessing, feature extraction and classification stages. In the preprocessing stage, de-noising is applied to obtain speech data without noise. In the feature extraction stage, Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC), and spectrogram methods are used to extract the features of the word. A neural network (NN) is used to classify the spoken words into different patterns so that the system can recognize unknown spoken words according to these patterns. A set of spoken words is used in the simulation of the system. Comparative results for the system are provided using the above-mentioned feature extraction methods.
Some Commonly Used Speech Feature Extraction Algorithms
From Natural to Artificial Intelligence - Algorithms and Applications, 2018
Speech is a complex, naturally acquired human motor ability. In adults it is characterized by the production of about 14 different sounds per second via the harmonized actions of roughly 100 muscles. Speaker recognition is the capability of software or hardware to receive a speech signal, identify the speaker present in the speech signal and recognize the speaker afterwards. Feature extraction is accomplished by changing the speech waveform to a form of parametric representation at a relatively reduced data rate for subsequent processing and analysis. Acceptable classification therefore depends on high-quality features. Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP) are the speech feature extraction techniques discussed in this chapter. These methods have been tested in a wide variety of applications, giving them a high level of reliability and acceptability. Researchers have made several modifications to these techniques to make them less susceptible to noise, more robust and less time-consuming. In conclusion, none of the methods is superior to the others; the area of application determines which method to select.