Mel Cepstrum Research Papers - Academia.edu

Speech has evolved as a primary form of communication between humans. The advent of digital technology gave us highly versatile digital processors with high speed, low cost, and high power, which enable researchers to convert analog speech signals into digital speech signals that can be studied scientifically. Achieving high recognition accuracy and a low word error rate, and addressing the sources of variability, are the major considerations in developing an efficient Automatic Speech Recognition (ASR) system. In speech recognition, feature extraction requires particular attention because recognition performance depends heavily on this phase. This paper highlights the progress made so far in the feature extraction phase of speech recognition systems and gives an overview of the technological perspective of an ASR system.

- Speech Recognition, MFCC, LINEAR PREDICTIVE CODING, Mel Cepstrum

In this paper, new auditory-based features derived from cochlear filters are proposed for the classification of unvoiced fricatives. Classification experiments cover sibilants (i.e., /s/, /sh/) vs. non-sibilants (i.e., /f/, /th/) as well as fricatives within each sub-category (i.e., intra-sibilants and intra-non-sibilants). Our experimental results indicate that the proposed feature set, viz., Cochlear Filter-based Cepstral Coefficients (CFCC), performs better for individual fricative classification in clean conditions (i.e., a gain of 3.41 % in average classification accuracy and a drop of 6.59 % in equal error rate, EER) than the state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC). Furthermore, under signal degradation (i.e., additive white noise), classification accuracy with the proposed feature set drops much more slowly (i.e., from 86.73 % in clean conditions to 77.46 % at an SNR of 5 dB) than with MFCC (i.e., from 82.18 % in clean conditions to 46.93 % at an SNR of 5 dB).
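
The degradation condition mentioned above (additive white noise at a given SNR) can be illustrated with a short sketch. It is not taken from the paper: the function names `add_white_noise` and `measured_snr_db`, the Gaussian noise model, and the fixed seed are assumptions made for illustration. It relies only on the definition SNR(dB) = 10 log10(P_signal / P_noise).

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; not the paper's procedure.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the drawn noise so its measured power matches the target.
    noise_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of `noisy` relative to the known `clean` signal."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Usage: a synthetic 440 Hz tone sampled at 16 kHz, degraded to 5 dB SNR.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noisy = add_white_noise(clean, snr_db=5.0)
```

Rescaling by the measured noise power, rather than the nominal variance, keeps the realized SNR at the requested value even for short signals.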

This paper presents a novel approach to automatic speaker recognition, with an emphasis on text-dependent speaker recognition. Speaker recognition has been studied actively for several decades. A speaker recognition system may be viewed as working in four stages: analysis, feature extraction, modeling, and testing. After some preprocessing, we apply MFCC, one of the most important feature extraction methods in this field, to each speech signal independently to extract feature vectors. The training system then uses these vectors to find codewords for the ten users in our Persian database by LBG vector quantization (VQ). Finally, we use the DTW technique to recognize a speaker among all users. Our experiments strongly indicate that the proposed algorithm can achieve an identification rate of over 96%.

- by Mehdi Bahaghighat
- Persian Language, Speaker Verification, DWT, Vector Quantization

- by Farshid Sahba
- Computer Science, Machine Learning, Speaker Recognition, Persian Language
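
The DTW matching step named in the text-dependent speaker recognition abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it applies classic dynamic time warping directly to sequences of feature vectors, omits MFCC extraction and LBG VQ entirely, and the names `dtw_distance`, `identify`, and the toy templates are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best DTW alignment between two vector sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Allowed steps: advance in seq_a only, in seq_b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def identify(test_seq, templates):
    """Return the template name with the smallest DTW distance to `test_seq`."""
    return min(templates, key=lambda name: dtw_distance(test_seq, templates[name]))


# Usage with toy one-dimensional "features" (assumed data, for illustration only).
templates = {
    "speaker_a": [[0.0], [1.0], [2.0]],
    "speaker_b": [[5.0], [6.0], [7.0]],
}
best = identify([[0.0], [1.0], [1.0], [2.0]], templates)
```

DTW tolerates differences in speaking rate because a frame in one sequence may align with several frames in the other, which is why the repeated `[1.0]` frame in the usage example adds no cost against `speaker_a`.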