Mel Frequency Cepstral Coefficient Research Papers

An investigation into the feature extraction and selection of infant cry with asphyxia is presented in this paper. Features of the cry signal were extracted using mel frequency cepstrum coefficient (MFCC) analysis, and the significant coefficients were selected using the orthogonal least squares (OLS) algorithm. The effect of varying the number of MFCC filter banks on the feature selection was examined. It was found that the best set of coefficients was achieved when 40 filter banks were used.
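As a hedged illustration of the filter-bank parameter the abstract varies, the sketch below builds a triangular mel filter bank with a configurable number of filters (40 being the setting the paper found best). The FFT size and sample rate here are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Build a bank of triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1) that maps a
    power spectrum onto n_filters mel-band energies.
    """
    low_mel = hz_to_mel(0.0)
    high_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters triangles need n_filters + 2 edge points.
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)

    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

# The study varies n_filters; 40 gave the best OLS-selected coefficient set.
fb = mel_filter_bank(n_filters=40, n_fft=512, sample_rate=16000)
print(fb.shape)  # (40, 257)
```

Changing `n_filters` here reproduces the kind of sweep the paper performs: the number of triangles changes the spectral resolution of the mel energies from which the cepstral coefficients are later derived.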

In this paper, we present an approach based on convolutional neural networks to build an automatic speech recognition system for the Amazigh language. The system is built with TensorFlow and uses mel frequency cepstral coefficients (MFCC) to extract features. To test the effect of the speaker's gender and age on the accuracy of the model, the system was trained and tested on several datasets. In the first experiment, the dataset consisted of 9240 audio files. In the second experiment, the dataset consisted of 9240 audio files distributed between female and male speakers. In the third experiment, the dataset consisted of 13860 audio files distributed among the age groups 9-15, 16-30, and 30+. The results show that the model trained on the adult (age 30+) category achieves the best accuracy, at 93.9%.

We present a new approach for classifying MPEG-2 video sequences as 'cartoon' or 'non-cartoon' by analyzing specific video and audio features of consecutive frames in real time. This is part of the well-known video-genre-classification problem, in which popular TV-broadcast genres such as cartoon, commercial, music, news, and sports are studied. Such applications have also been discussed in the context of MPEG-7 (12).

In the Islamic religion, mistakes in the recitation of the holy Quran (the sacred book of Muslims) are forbidden. Mistakes include missing words or verses and misreading Harakat (pronunciations, punctuation, and accents). Thus, a hafiz (reciter) who memorizes the holy Quran needs another hafiz as a tutor who listens to the recitation and points out oral mistakes. Given the serious commitment this requires, the availability and expertise of such a hafiz are also questionable, and a listener can make mistakes while hearing owing to environmental interruptions such as noise or lapses in attention. To tackle this issue, we designed, developed, and tested the E-hafiz system. E-hafiz is based on the Mel-Frequency Cepstral Coefficient (MFCC) technique: it extracts voice features from Quranic verse recitation, maps them against the data collected during the training phase, and points out any mismatch. Testing results on short verses of the Quran using the E-hafiz system are very encouraging.

In this work, Classical Turkish Music songs are classified into six makams. A makam is a modal framework for melodic development in Classical Turkish Music. The effect of the sound-clip length on system performance was also evaluated. Mel Frequency Cepstral Coefficients (MFCC) were used as features, and the resulting data were classified using a Probabilistic Neural Network. The best correct recognition ratio, 89.4%, was obtained with a clip length of 6 s.
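A Probabilistic Neural Network of the kind used in the makam study is essentially a Parzen-window classifier: each class keeps its training patterns, and a test sample goes to the class whose Gaussian-kernel density estimate is highest. The minimal sketch below implements that idea from scratch; the kernel width and the toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np

class ProbabilisticNeuralNetwork:
    """Minimal PNN: a Gaussian Parzen-window density estimate per class;
    a sample is assigned to the class with the highest estimated density."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma  # kernel width (smoothing parameter)

    def fit(self, X, y):
        # Store each class's training patterns (the "pattern layer").
        self.classes_ = sorted(set(y))
        y = np.asarray(y)
        self.patterns_ = {c: X[y == c] for c in self.classes_}
        return self

    def predict(self, X):
        preds = []
        for x in np.atleast_2d(X):
            scores = []
            for c in self.classes_:
                d2 = np.sum((self.patterns_[c] - x) ** 2, axis=1)
                # Average kernel response of the class's patterns
                # (the "summation layer").
                scores.append(np.mean(np.exp(-d2 / (2.0 * self.sigma ** 2))))
            preds.append(self.classes_[int(np.argmax(scores))])
        return preds

# Toy usage with two well-separated clusters (stand-ins for MFCC vectors).
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]])
y = [0, 0, 1, 1]
pnn = ProbabilisticNeuralNetwork(sigma=0.5).fit(X, y)
print(pnn.predict([[0.1, 0.0]]))  # [0]
```

In a setup like the paper's, each training pattern would be an MFCC-derived feature vector and the classes would be the six makams; the single smoothing parameter sigma is the main quantity to tune.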

Age estimation based on human speech features is an interesting subject in Automatic Speech Recognition (ASR) systems. There is some work in the literature on speaker age estimation, but more is needed, especially for Persian speakers. In age estimation, as in other speech processing systems, we encounter two main challenges: finding an appropriate procedure for feature extraction, and selecting a reliable method for pattern classification. In this paper we propose an automatic age estimation system that classifies Persian speakers into 6 age groups. Perceptual Linear Predictive (PLP) coefficients and Mel-Frequency Cepstral Coefficients (MFCC) are extracted as speech features, and an SVM is used for classification. Furthermore, the effects of variations in the kernel-function parameter, the frame length used in sampling, the number of MFCC coefficients, and the order of the PLP analysis on system efficiency have been evaluated, and the results compared.

Digital processing of the speech signal and the voice recognition algorithm are very important for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information, and direct analysis and synthesis of the complex voice signal is difficult because of the amount of information it contains. Therefore, digital signal processes such as feature extraction and feature matching are used.
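Of the two processes the abstract names, feature matching is the step that compares an unknown utterance's feature sequence against stored templates. One common way to match two MFCC sequences of different lengths is dynamic time warping (DTW); the sketch below is a generic textbook DTW, offered as an illustration rather than as this paper's exact matching method.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences.

    Each sequence is a list of feature vectors (lists of floats); DTW
    aligns them non-linearly in time and returns the accumulated
    Euclidean cost of the best alignment.
    """
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Identical sequences align perfectly, so the DTW distance is zero.
print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [2.0]]))  # 0.0
```

In a recognizer of this kind, each stored word template and each incoming utterance would be a sequence of MFCC vectors, and the template with the smallest DTW distance wins.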

Results from preliminary research on the recognition of Polish bird species are presented in this paper. Bird voices were recorded in a noisy municipal environment, using a high sampling frequency of 96 kHz. As the feature set, standard mel-frequency cepstral coefficients (MFCC) and the recently proposed human-factor cepstral coefficients (HFCC) were selected. Superior performance of the HFCC features over the MFCC ones was observed, and properly limiting the maximal frequency during HFCC feature extraction increased the accuracy of bird-species recognition. The good initial results are very promising for the practical application of the described methods to the monitoring of protected bird areas.

Almost all current automatic speech recognition (ASR) systems conventionally append delta and double-delta cepstral features to static cepstral features. In this work we describe a modified feature-extraction procedure in which the time-difference operation is performed in the spectral domain, rather than in the cepstral domain as is generally done at present. We argue that this "delta-spectral" approach is needed because, even though delta-cepstral features capture dynamic speech information and generally improve ASR accuracy substantially, they are not robust to noise and reverberation. We support the validity of the delta-spectral approach both with observations about the modulation spectrum of speech and noise, and with objective experiments documenting the benefit it brings to a variety of currently popular feature extraction algorithms. We found that using delta-spectral rather than the more traditional delta-cepstral features improves the effective SNR by between 5 and 8 dB for background music and white noise, and recognition accuracy in reverberant environments improves as well.
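To make the ordering difference concrete, the sketch below contrasts the two routes on a matrix of mel-band power values: the conventional delta-cepstral route differences after the log and DCT, while a simplified delta-spectral route differences the linear spectral values first and only then compresses and applies the DCT. The symmetric difference and the sign-preserving log compression are stand-ins; the paper's actual processing includes further normalization.

```python
import numpy as np

def dct_matrix(n_out, n_in):
    """Type-II DCT basis used to map log filter-bank energies to cepstra."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def deltas(feats, k=2):
    """Simple symmetric difference over +/- k frames (edge-padded)."""
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return padded[2 * k:] - padded[:-2 * k]

def delta_cepstral(mel_power, n_ceps=13):
    """Conventional route: log -> DCT -> temporal difference."""
    ceps = np.log(mel_power) @ dct_matrix(n_ceps, mel_power.shape[1]).T
    return deltas(ceps)

def delta_spectral(mel_power, n_ceps=13):
    """Simplified delta-spectral route: difference taken in the linear
    spectral domain first, then a sign-preserving log compression
    (deltas can be negative), then DCT."""
    d = deltas(mel_power)
    compressed = np.sign(d) * np.log1p(np.abs(d))
    return compressed @ dct_matrix(n_ceps, mel_power.shape[1]).T
```

The intuition the abstract gives is that additive noise is approximately stationary, so differencing in the linear spectral domain cancels much of it before the log nonlinearity can smear it across cepstral coefficients.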