Automatic, Text-Independent, Speaker Identification and Verification System Using Mel Cepstrum and GMM

Text Independent Automatic Speaker Recognition System Using Mel-Frequency Cepstrum Coefficient and Gaussian Mixture Models

Journal of Information Security, 2012

The aim of this paper is to report the accuracy and timing results of a text-independent automatic speaker recognition (ASR) system based on Mel-Frequency Cepstrum Coefficients (MFCC) and Gaussian Mixture Models (GMM), with the goal of developing a security access-control gate. A total of 450 speakers were randomly selected from the Voxforge.org audio database. Their utterances were first enhanced using spectral subtraction; MFCC were then extracted, and these coefficients were statistically modeled with a GMM to build each speaker's profile. Two different speech files were used for each speaker: the first to build the profile database and the second to test the system's performance. The proposed approach achieves an accuracy greater than 96%, and a single test run, implemented in Matlab, takes about 2 seconds on a common PC.
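The abstract does not give the authors' Matlab code. As a rough illustration of how GMM-based identification usually scores a test utterance, here is a minimal Python sketch that assumes the MFCC frames are already extracted (and, if desired, already denoised by spectral subtraction) and that each speaker's diagonal-covariance GMM has already been trained; EM training is not shown. The class `DiagGMM` and the function `identify` are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """A trained diagonal-covariance GMM for one speaker (EM training not shown)."""
    weights: list[float]          # mixture weights, assumed to sum to 1
    means: list[list[float]]      # one mean vector per component
    variances: list[list[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: list[float]) -> float:
        """log p(x | speaker) for one feature vector, combined with log-sum-exp."""
        component_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_gauss = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            component_logs.append(math.log(w) + log_gauss)
        top = max(component_logs)
        return top + math.log(sum(math.exp(c - top) for c in component_logs))

    def average_log_likelihood(self, frames: list[list[float]]) -> float:
        """Mean per-frame log-likelihood, so utterances of different lengths compare fairly."""
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def identify(frames: list[list[float]], models: dict[str, DiagGMM]) -> str:
    """Closed-set identification: pick the speaker whose GMM scores the test frames highest."""
    return max(models, key=lambda name: models[name].average_log_likelihood(frames))
```

A call such as `identify(test_frames, {"spk1": gmm1, "spk2": gmm2})` returns the profile name with the highest average log-likelihood. Working in the log domain with log-sum-exp avoids the numerical underflow that multiplying many small Gaussian densities would cause.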

Text Dependent Speaker Identification using Hidden Markov Model and Mel Frequency Cepstrum Coefficient

International Journal of Computer Applications, 2014

Speaker identification is a biometric process. Its objective is to extract, characterize, and recognize information about a speaker's identity. Speaker recognition technology has recently been applied successfully in many commercial areas, such as voice-based biometrics, voice-controlled appliances, security control for confidential information, and remote access to computers. A speaker identification system has two phases: a training phase and a testing phase. Feature extraction is the first step of each phase, and many algorithms are available for it. In this work, Mel Frequency Cepstrum Coefficient (MFCC) features are used to design a text-dependent speaker identification system. In the identification phase, the stored reference templates are compared with the unknown voice input. In this thesis, a Hidden Markov Model (HMM) is used as the training/recognition algorithm: it makes the final decision about the speaker's identity by comparing the unknown features against every model in the database and selecting the best-matching, i.e., highest-scoring, model. The speaker whose model obtains the highest score is selected as the target speaker.
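The "highest scored model" decision can be sketched with the HMM forward algorithm. The following Python example is only a minimal sketch under a simplifying assumption: observations are discrete symbols (for example, vector-quantized MFCC frames) rather than the continuous densities a real system would likely use, and each speaker's HMM is given as a hypothetical `(start, trans, emit)` tuple of already-trained probabilities.

```python
import math


def logsumexp(values):
    top = max(values)
    if top == -math.inf:  # every path has probability zero
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) via the forward algorithm in the log domain.

    start[i]    : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits discrete symbol k
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return logsumexp(alpha)


def identify(obs, models):
    """Return the speaker whose HMM assigns the observation sequence the highest score."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

For a text-dependent system, each speaker's HMM would be trained on that speaker's recordings of the fixed phrase; at test time the same phrase is scored against every model and the best one wins.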

Text Independent Speaker Identification System for Access Control

Cornell University - arXiv, 2022

Even human intelligence cannot identify the speech of a specific individual with 100% accuracy. Machine intelligence attempts to mimic humans in speaker identification through various speech feature extraction and speech modeling techniques. This paper presents a text-independent speaker identification system that uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and k-Nearest Neighbor (kNN) for classification. The maximum cross-validation accuracy obtained was 60%, which will be improved upon in subsequent research.
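kNN classification and a cross-validated accuracy figure can be illustrated with a short Python sketch. It assumes, hypothetically, that each utterance has already been reduced to one fixed-length vector (for example, its mean MFCC vector); the paper's actual feature summarization, value of k, and cross-validation scheme are not stated in the abstract, so leave-one-out is used here only as an example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, speaker_label) pairs; majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common keeps first-seen order, i.e. the closer neighbor's label.
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each utterance in turn, classify it with the rest, and report accuracy."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += knn_predict(rest, features, k) == label
    return correct / len(data)
```

Because kNN stores every training vector and compares against all of them, accuracy depends heavily on how well the per-utterance summary separates speakers, which is one plausible reason a simple MFCC-plus-kNN pipeline can plateau at modest accuracy.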

Text Dependent Speaker Identification Using a Bayesian network and Mel Frequency Cepstrum Coefficient

Speaker identification is a biometric technique. The objective of automatic speaker recognition is to extract, characterize, and recognize information about a speaker's identity. Speaker recognition technology has recently been applied successfully in a large number of commercial areas, such as voice-based biometrics, voice-controlled appliances, security control for confidential information, and remote access to computers. A speaker identification system has two phases: a training phase and a testing phase. Feature extraction is the first step of each phase, and researchers have proposed many algorithms for it. In this work, Mel Frequency Cepstrum Coefficient (MFCC) features are used to design a text-dependent speaker identification system. In the identification phase, the stored reference templates are compared with the unknown voice input. In this thesis, a Bayesian network is used as the training/recognition algorithm: it makes the final decision about the speaker's identity by comparing the unknown features against every model in the database and selecting the best-matching, i.e., highest-scoring, model. The speaker whose model obtains the highest score is selected as the target speaker.

GMM Based Speaker Verification System

International Journal of Engineering Research and, 2015

Speaker verification is a popular biometric technique, widely used in applications such as speaker authentication. This paper presents a framework for speaker verification that preserves the speaker's privacy. Even a small speech sample is a private form of communication: besides the spoken message, it carries information such as the speaker's gender, the language being spoken, and their emotional state. In the proposed framework, the verification system has no direct access to the voice input provided by the speaker. The system also protects the privacy of the stored speaker model, preventing it from being used to verify the speaker elsewhere. Using a Gaussian mixture model and features extracted from the speech signal, we build a unique identity for each enrolled speaker, which is later verified under these privacy criteria.

An Automatic Speaker Recognition System

Lecture Notes in Computer Science, 2008

Speaker recognition is the process of identifying a speaker by analyzing the spectral shape of the voice signal. This is done by extracting and matching features of the voice signal. Mel-Frequency Cepstrum Coefficients (MFCC) are the feature extraction technique in which we will get ...
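Since the abstract is cut off before describing the MFCC computation, here is a minimal Python sketch of the usual steps after the FFT: a triangular filterbank spaced on the mel scale, a log of each filter's energy, and a DCT-II. It starts from an already computed power spectrum of one frame (framing, windowing, and the FFT are omitted), uses the common mel formula 2595·log10(1 + f/700), and the defaults of 26 filters and 13 coefficients are typical choices, not values taken from the paper.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters equally spaced on the mel scale; n_bins = FFT size // 2 + 1."""
    f_max = sample_rate / 2.0
    mel_points = [hz_to_mel(f_max) * i / (n_filters + 1) for i in range(n_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_hz = [k * f_max / (n_bins - 1) for k in range(n_bins)]  # frequency of each FFT bin
    filters = []
    for m in range(1, n_filters + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        weights = []
        for f in bin_hz:
            if left <= f <= center:
                weights.append((f - left) / (center - left))    # rising edge
            elif center < f <= right:
                weights.append((right - f) / (right - center))  # falling edge
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def mfcc_from_power_spectrum(power, sample_rate, n_filters=26, n_coeffs=13):
    """Log mel filterbank energies followed by an unnormalized DCT-II."""
    filters = mel_filterbank(n_filters, len(power), sample_rate)
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(weights, power)), 1e-10))  # floor avoids log(0)
        for weights in filters
    ]
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]
```

The mel spacing packs more filters into low frequencies, mirroring human pitch perception, and the DCT decorrelates the log energies so that a few low-order coefficients summarize the spectral envelope.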

Text Independent Speaker Verification Using GMM

This paper presents the performance of a text-independent speaker verification system using Gaussian Mixture Models (GMM) for Brazilian Portuguese. The Gaussian components of the GMM statistically represent the spectral characteristics of the speaker, leading to an effective speaker recognition system. The main goal here is a detailed evaluation of the parameters used by the GMM, such as the number of Gaussian mixtures and the amount of time used for training and testing. By seeking the best set of features for a reasonable response, this work helps in understanding the model and gives insights for further investigation. We used 36 speakers in the experiments, all modeled with 15 mel-cepstral coefficients. With 32 Gaussians, 60 seconds of training, and 30 seconds of testing, the system has no failures for reasonably clean speech signals. The results have shown that the more time used for training and testing, the better the results for a give...

A NOVEL METHODOLOGY FOR SPEAKER VERIFICATION USING GMM

International Journal of Science and Innovation Engineering and Technology, 2018

Speaker verification is the process of verifying a claimant's identity. It performs a one-to-one comparison between a newly input voiceprint and the voiceprint stored in the database for the claimed identity. In this paper, linear predictive coding (LPC) coefficients are used for formant detection. Formants are the peak frequencies in the frequency response of the vocal tract; they are detected and compared for verification. A database of twenty persons, both male and female, with five samples per person was created to analyze the results. A speaker verification system is usually employed as a "gatekeeper" that provides access to a secure system. Such systems operate with the user's knowledge and typically require the user's cooperation. The system was developed in MATLAB.
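The LPC-based formant detection described above can be sketched in Python using the autocorrelation method with the Levinson-Durbin recursion, then picking peaks of the LPC spectral envelope 1/|A(e^jω)|. This is a minimal sketch under stated assumptions: the input is one pre-emphasized, windowed frame, peak picking on the envelope stands in for the root-finding approach many systems use, and the paper's exact procedure and its comparison step for verification are not shown.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return LPC coefficients a[0..order] (a[0] = 1) for A(z) = 1 + a1 z^-1 + ... + ap z^-p,
    plus the final prediction error, from autocorrelation values r[0..order]."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # signal is perfectly predictable; stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error


def lpc_envelope_peaks(a, sample_rate, n_points=512):
    """Frequencies (Hz) at which the LPC envelope 1/|A(e^jw)| has a local maximum."""
    freqs = [sample_rate / 2 * i / (n_points - 1) for i in range(n_points)]
    gains = []
    for f in freqs:
        w = 2 * math.pi * f / sample_rate
        response = sum(coef * cmath.exp(-1j * w * j) for j, coef in enumerate(a))
        gains.append(1.0 / abs(response))
    return [freqs[i] for i in range(1, n_points - 1) if gains[i - 1] < gains[i] > gains[i + 1]]
```

A typical call chain is `a, _ = levinson_durbin(autocorrelation(frame, p), p)` followed by `lpc_envelope_peaks(a, sample_rate)`, where the order `p` is often chosen around two coefficients per expected formant plus a couple extra (an assumption, not a value from the paper).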

A Tutorial on Text-Independent Speaker Verification

Eurasip Journal on Advances in Signal Processing, 2004

This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Score normalization is then explained, as it is a very important step for dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Next, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. The paper concludes by giving a few research trends in speaker verification for the next couple of years.
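Score normalization and DET-style evaluation can be illustrated with a small Python sketch. It shows Z-norm, one common normalization scheme (the abstract does not list which schemes the tutorial covers), and computes the false-rejection and false-acceptance rates that a DET curve plots, along with an approximate equal error rate (EER). The function names and the accept-if-score-at-least-threshold convention are assumptions made for this example; plotting on normal-deviate axes is omitted.

```python
import statistics


def z_norm(score, impostor_scores):
    """Z-norm: standardize a raw score with the mean and standard deviation of
    impostor scores obtained against the same speaker model."""
    mu = statistics.mean(impostor_scores)
    sigma = statistics.stdev(impostor_scores)
    return (score - mu) / sigma


def det_points(target_scores, impostor_scores):
    """(threshold, false-rejection rate, false-acceptance rate) for every observed
    score used as a threshold; a claim is accepted when score >= threshold."""
    points = []
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, frr, far))
    return points


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: the swept operating point where FRR and FAR are closest."""
    t, frr, far = min(det_points(target_scores, impostor_scores), key=lambda p: abs(p[1] - p[2]))
    return (frr + far) / 2, t
```

Sweeping the threshold trades false rejections against false acceptances; the DET curve shows that whole trade-off, while the EER summarizes it as the single point where both error rates match.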

Mel frequency Cepstrum Coefficients and Enhanced LBG algorithm for Speaker Recognition

In this paper, an improved strategy is proposed for an automated text-dependent speaker recognition system operating in a noisy environment. Preprocessing of the speech signal starts by eliminating the background noise. The next steps are signal filtering and feature extraction using the cepstrum coefficients method. The extracted features are then used by an enhanced LBG vector quantization algorithm for speaker recognition: the speaker under test is matched against the codebooks stored in the database, and the speaker with the smallest Euclidean distance is selected. Speech features were extracted from a dataset of 175 samples collected from 25 different speakers. The proposed system achieved a good speaker identification rate, with a maximum accuracy of about 96.2% on a database with a closed set of selected words containing the most frequently used phonemes. The experiments also show that recognition accuracy increases with frame overlapping. [Hussein Lafta Attiya, Ali Yakoob Yousif. Mel frequency Cepstrum Coefficients and Enhanced LBG algorithm for Speaker Recognition. Researcher 2015;7(1):19-25]. (ISSN: 1553-9865). http://www.sciencepub.net/researcher.
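The codebook matching described above can be sketched with the classic LBG algorithm in Python. Note that this is the standard splitting-plus-refinement LBG, not the paper's "enhanced" variant, whose modifications the abstract does not describe. The sketch assumes feature vectors are already extracted, that the target codebook size is a power of two (each split doubles the codebook), and uses hypothetical function names.

```python
import math


def _centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def lbg_codebook(vectors, size, epsilon=0.01, iterations=20):
    """Classic LBG: start from the global centroid, split every codeword into
    (1 + epsilon) and (1 - epsilon) copies, then refine with k-means-style passes."""
    codebook = [_centroid(vectors)]
    while len(codebook) < size:
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (epsilon, -epsilon)]
        for _ in range(iterations):
            clusters = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))
                clusters[nearest].append(v)
            # Move each codeword to its cluster centroid; keep it if the cluster is empty.
            codebook = [_centroid(cl) if cl else cw for cl, cw in zip(clusters, codebook)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean Euclidean distance from each test vector to its nearest codeword."""
    return sum(min(math.dist(v, cw) for cw in codebook) for v in vectors) / len(vectors)


def identify(vectors, codebooks):
    """Return the speaker whose stored codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(vectors, codebooks[name]))
```

Training builds one codebook per enrolled speaker; at test time, the utterance's feature vectors are quantized against every stored codebook, and the smallest average Euclidean distortion selects the speaker, matching the decision rule in the abstract.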