A Comparative Study of Feature Extraction Techniques for Speech Recognition System

DIFFERENT FEATURE EXTRACTION TECHNIQUES FOR AUTOMATIC SPEECH RECOGNITION: A REVIEW

Automatic speech recognition, which enables natural and user-friendly communication between a person and a device, is an active research area. Speech recognition is the ability to listen to what is being said, interpret it, and act on the spoken information. This article presents a short outline of speech recognition and of the techniques used for feature extraction in speech recognition systems, namely MFCC, LPC and PLP. Among these three techniques, Mel-frequency cepstral coefficients (MFCC) are the most frequently used in speech recognition because they come closest to actual human auditory perception of speech.
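
The link between MFCC and human hearing comes from the mel scale, which spaces frequencies roughly the way listeners perceive pitch. The sketch below is illustrative only and is not taken from the reviewed article: it assumes the commonly used conversion formula m = 2595 log10(1 + f/700), and the helper names (`hz_to_mel`, `mel_to_hz`, `mel_filter_centers`) are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed formula: 2595*log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_low_hz, f_high_hz):
    """Center frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    for center in mel_filter_centers(10, 0.0, 8000.0):
        print(round(center, 1))
```

Bands that are evenly spaced in mels become progressively wider in Hz, so a filterbank built this way resolves low frequencies more finely than high ones.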

Empirical Review Paper on Voice Recognition & Feature Extraction Techniques

Speech recognition is the process of automatically recognizing the words spoken by a person based on the information content of the speech signal. Many reviews and surveys have been conducted on voice feature extraction techniques, but most have not provided an exhaustive empirical review of them. This paper provides an empirical review, with the relevant algorithmic calculations, of each feature extraction technique for voice recognition, and discusses the techniques and systems that make it possible for computers to accept voice as input. It shows the major developments in the field of voice analytics and gives detailed information on the three main feature extraction techniques: Linear Predictive Coding (LPC), Mel-frequency cepstral coefficients (MFCCs) and the RASTA filtering technique. The objective of this paper is to summarize the feature extraction techniques used in speech recognition systems and to provide an empirical value for each technique. The words "voice" and "speech" are used interchangeably in this context.

Feature Extraction Techniques in Speech Processing: A Survey

International Journal of Computer Applications, 2014

Speech processing includes techniques such as speech coding, speech synthesis, speech recognition and speaker recognition. It has versatile applications within digital signal processing, so it remains an intensive field of research. Speech processing mostly performs two fundamental operations: feature extraction and classification. The main criterion for a good speech processing system is the choice of feature extraction technique, which plays an important role in system accuracy. This paper surveys various feature extraction techniques in speech processing, such as Fast Fourier Transforms, Linear Predictive Coding, Mel Frequency Cepstral Coefficients, Discrete Wavelet Transforms, Wavelet Packet Transforms and the hybrid algorithm DWPD, together with their applications in speech processing.
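
As a rough illustration of the discrete wavelet transform named in this survey, the sketch below performs one decomposition level with the Haar wavelet. The choice of Haar is an assumption made for brevity (the abstract does not say which wavelet the survey uses), and the function name `haar_dwt` is hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of a Haar DWT (assumed wavelet; chosen only for simplicity).

    Returns (approximation, detail) coefficient lists, each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Low-pass branch: scaled pairwise sums capture the coarse shape.
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    # High-pass branch: scaled pairwise differences capture local change.
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


if __name__ == "__main__":
    a, d = haar_dwt([4.0, 2.0, 5.0, 5.0])
    print(a, d)
```

Applying the same step again to the approximation coefficients gives a multi-level decomposition; statistics of each level (for example its energy) can then serve as features.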

TECHNIQUES FOR FEATURE EXTRACTION IN SPEECH RECOGNITION SYSTEM : A COMPARATIVE STUDY

The time domain waveform of a speech signal carries all of the auditory information. From the phonological point of view, very little can be said on the basis of the waveform itself. However, past research in mathematics, acoustics, and speech technology has provided many methods for converting the data into a form that can be considered information if interpreted correctly. In order to find statistically relevant information in incoming data, it is important to have mechanisms for reducing the information of each segment of the audio signal to a relatively small number of parameters, or features. These features should describe each segment in such a characteristic way that other similar segments can be grouped together by comparing their features. There are a great many interesting and exceptional ways to describe the speech signal in terms of parameters. Though they all have their strengths and weaknesses, we present some of the most widely used methods and their importance.
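
The idea of reducing each segment to a small number of parameters can be sketched as follows. This is a minimal example under assumptions of our own: the two features (short-time energy and zero-crossing rate) are common textbook choices rather than the ones compared in the paper, and the frame length and hop are arbitrary illustrative values.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames of frame_len, advancing by hop."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame_len must be >= 2)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(samples, frame_len=4, hop=2):
    """Reduce each frame to a two-element feature tuple (energy, zero-crossing rate)."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
    for features in frame_features(signal):
        print(features)
```

Frames with similar feature tuples can then be grouped together, which is the comparison step the abstract describes.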

Some Commonly Used Speech Feature Extraction Algorithms

From Natural to Artificial Intelligence - Algorithms and Applications, 2018

Speech is a complex, naturally acquired human motor ability. In adults it is characterized by the production of about 14 different sounds per second via the harmonized actions of roughly 100 muscles. Speaker recognition is the capability of software or hardware to receive a speech signal, identify the speaker present in the speech signal and recognize the speaker afterwards. Feature extraction is accomplished by changing the speech waveform to a form of parametric representation at a relatively minimized data rate for subsequent processing and analysis. Therefore, acceptable classification is derived from excellent and quality features. Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP) are the speech feature extraction techniques discussed in this chapter. These methods have been tested in a wide variety of applications, giving them a high level of reliability and acceptability. Researchers have made several modifications to the techniques discussed above to make them less susceptible to noise, more robust and less time-consuming. In conclusion, none of the methods is superior to the others; the area of application determines which method to select.

Speech Recognition System with Different Methods of Feature Extraction

International Journal of Innovative Research in Computer and Communication Engineering, 2018

The paper presents the design of a speech recognition system that uses preprocessing, feature extraction and classification stages. In the preprocessing stage, de-noising is applied to obtain speech data without noise. In the feature extraction stage, Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC), and Spectrogram methods are used to extract the features of the word. Neural Networks (NN) were used to classify the spoken words into different patterns so that the system can recognize unknown spoken words according to these patterns. A set of spoken words is used in the simulation of the system. Comparative results of the system are provided for the above-mentioned feature extraction methods.

Comparative Analysis of Methods Used to Extract Speech Signal Features

IJCSMC, 2021

Extracting the features of the speech file is one of the most important stages in building a system that identifies a person by his or her voice. Accordingly, the choice of speech feature extraction method is important because of its subsequent negative or positive effects on the speech recognition system. In this paper we analyze the most popular methods of speech signal feature extraction: LPC, K-means clustering, WPT decomposition and MLBP. These methods are implemented and tested using various speech files. The amplitude and sampling frequency are varied to see the effects of these changes on the extracted features. Some recommendations are given based on the results of the analysis.

Feature Extraction Methods LPC, PLP and MFCC

The automatic recognition of speech, enabling a natural and easy-to-use method of communication between human and machine, is an active area of research. Speech processing has vast application in voice dialing, telephone communication, call routing, domestic appliance control, speech-to-text conversion, text-to-speech conversion, lip synchronization, automation systems, etc. Speech processing has also emerged as a novel approach to security: feature vectors of authorized users are stored in a database, and speech features extracted from the recorded speech of a male or female speaker are compared with the templates available in that database. Speech can be parameterized by Linear Predictive Codes (LPC), Perceptual Linear Prediction (PLP), Mel Frequency Cepstral Coefficients (MFCC), PLP-RASTA (PLP-Relative Spectra), etc. Some parameterizations, such as PLP and MFCC, take the nature of speech into account when extracting features, whereas LPC predicts each speech sample from previous samples. Models such as neural networks are trained on feature vectors to predict unknown samples. Techniques like Vector Quantization (VQ), Dynamic Time Warping (DTW), Support Vector Machine (SVM), and Hidden Markov Model (HMM) can be used for classification and recognition. We describe a neural network with LPC, PLP and MFCC parameters in this paper.
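
In its usual formulation, linear prediction models each sample as a weighted sum of the preceding samples, and the weights serve as the LPC features. The sketch below estimates those weights with the autocorrelation method and the Levinson-Durbin recursion. It is a generic textbook procedure under our own assumptions, not the implementation used in the paper, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sample list."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor weights a[1..order] so that x[n] ~ sum(a[k] * x[n-k]).

    Returns (weights, residual_error).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:
            break  # signal is perfectly predicted already
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


def lpc(samples, order):
    """LPC weights of one frame via the autocorrelation method."""
    return levinson_durbin(autocorrelation(samples, order), order)[0]


if __name__ == "__main__":
    # Synthetic decaying signal in which each sample is 0.9 times the previous one.
    frame = [0.9 ** n for n in range(200)]
    print(lpc(frame, 2))
```

For this synthetic frame the first weight should come out close to 0.9 and the second close to zero, since one previous sample already predicts the next.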

Speech Feature Extraction and Matching Technique

2016

The ultimate goal of the present investigation is to study speech coding techniques for better understanding of natural spoken language, considering the obvious constraints such as speaker dependency, isolated words, limited vocabulary and artificial grammar. Speech communication technology between human and computer is experiencing revolutionary progress in the information industry. For analysis, synthesis, coding and recognition purposes, speech signals have to be converted into digital form. Speech signals are continuous in time and amplitude, and are therefore sampled and quantized. In the present work, two techniques have been used: linear predictive coding for feature extraction and Dynamic Time Warping for matching. Dynamic Time Warping is a cost minimization matching technique in which a test signal is stretched or compressed according to a reference template.
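
The stretching and compressing described for Dynamic Time Warping can be sketched with the classic dynamic-programming recurrence. The example below is a simplification under stated assumptions: it compares sequences of single numbers using absolute difference as the local cost, whereas the paper matches LPC feature vectors, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(test, reference):
    """Minimum cumulative cost of aligning test to reference.

    Local cost is the absolute difference (an assumption for scalar sequences).
    Each step may advance the test, the reference, or both, which is what
    stretches or compresses the test signal along the template.
    """
    n, m = len(test), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(test[i - 1] - reference[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # test advances alone
                                     cost[i][j - 1],      # reference advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    template = [1.0, 2.0, 3.0]
    slower_utterance = [1.0, 2.0, 2.0, 3.0]
    print(dtw_distance(slower_utterance, template))
```

A recognizer built this way would compute the distance from the test utterance to every stored template and pick the template with the lowest cost.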