Automatic, Text-Independent, Speaker Identification and Verification System Using Mel Cepstrum and GMM

Lecture Notes in Computer Science, 2008

Abstract. Speaker recognition is the process of identifying a speaker by analyzing the spectral shape of the voice signal. This is done by extracting and matching features of the voice signal. Mel-Frequency Cepstrum Coefficient (MFCC) extraction is the feature extraction technique in which we will get ...
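The abstract breaks off before describing the MFCC computation itself. A minimal NumPy sketch of the textbook pipeline (pre-emphasis, framing and windowing, power spectrum, mel filterbank, log, DCT) follows. The 16 kHz sampling rate, 25 ms frames with a 10 ms hop, 0.97 pre-emphasis, 26 HTK-style mel filters and 13 coefficients are common defaults assumed here, not values taken from the paper.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centers spaced uniformly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    """MFCC matrix of shape (n_frames, n_ceps); signal must be at least one frame long."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis boosts high frequencies
    signal = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Log mel filterbank energies (floored to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. The DCT decorrelates the log energies; keep the low-order cepstra
    return dct_ii(log_energies, n_ceps)
```

With these defaults, one second of audio (16,000 samples) yields a 98 by 13 feature matrix, one row per 10 ms frame step.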

Text Independent Speaker Verification Using GMM

This paper presents the performance of a text-independent speaker verification system using a Gaussian Mixture Model (GMM) for Brazilian Portuguese. The Gaussian components of the GMM statistically represent the spectral characteristics of the speaker, leading to an effective speaker recognition system. The main goal here is a detailed evaluation of the parameters used by the GMM, such as the number of Gaussian mixtures and the amount of time used for training and testing. By aiming to define the best set of features for a reasonable response, this work helps in understanding the model and gives insights for further investigation. We used 36 speakers in the experiments, all modeled with 15 mel-cepstral coefficients. With 32 Gaussians, 60 seconds of training and 30 seconds of testing, the system made no errors on a reasonably clean speech signal. The results show that the longer the training and testing durations, the better the results for a give...
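Fitting one GMM per speaker and scoring test frames against it can be sketched in NumPy as below. This is an illustration under assumptions: diagonal covariances, initialization from randomly chosen frames, a fixed number of EM iterations, a variance floor, and a hypothetical verification threshold. The paper's best configuration used 32 Gaussians over 15 mel-cepstral coefficients; the sketch defaults to 4 components only to stay small.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """log N(x_t; mu_k, diag(var_k)) for every frame t and component k: shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))


def fit_gmm(X, n_components=4, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to feature frames X (T x D) with EM."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp_rows(log_p)[:, None])
        # M-step: re-estimate weights, means and (floored) variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of X under the GMM."""
    weights, means, variances = gmm
    return float(np.mean(logsumexp_rows(np.log(weights)[None, :]
                                        + log_gauss_diag(X, means, variances))))


def verify(X, claimed_gmm, threshold):
    """Accept the claimed identity if the score reaches a (hypothetical) threshold."""
    return avg_log_likelihood(X, claimed_gmm) >= threshold


def identify(X, models):
    """Closed-set identification: the speaker whose GMM best explains the test frames."""
    return max(models, key=lambda name: avg_log_likelihood(X, models[name]))
```

In practice `X` would hold the mel-cepstral frames of one utterance. Averaging over frames keeps scores on a comparable scale when, as in the paper, the test duration varies.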

Presenting a New Text-Independent Speaker Verification System Based on Multi Model GMM

Journal of Advances in Computer Research, 2014

Speaker verification is the process of accepting or rejecting a claimed identity based on voice features. A speaker verification system can be used in numerous security applications, including bank account access, access to security checkpoints, and criminology. When a speaker verification system checks the identity of individuals remotely, it faces problems such as the effect of noise on the speech signal and identity falsification through speech synthesis. In this paper, we propose a new speaker verification system based on Multi Model GMM, called SV-MMGMM, in which all speakers are divided into seven age groups and a separate GMM is created for each group, instead of a single model for all speakers. For evaluation, the proposed method is compared with several speaker verification systems based on Naïve Bayes, SVM, Random Forest, Ensemble and basic GMM. Experimental results show that the proposed method performs better than the others.

A NOVEL METHODOLOGY FOR SPEAKER VERIFICATION USING GMM

International Journal of Science and Innovation Engineering and Technology, 2018

Speaker verification is the process of verifying a claimant's identity. It performs a one-to-one comparison between a newly input voice print and the voice print stored in the database for the claimed identity. In this paper, linear predictive coding (LPC) coefficients are used for formant detection. Formants, the peak frequencies in the frequency response of the vocal tract, are detected and compared for verification. A database of twenty male and female speakers, with five samples per speaker, was created to analyze the results. A speaker verification system is usually employed as a "gatekeeper" that controls access to a secure system. Such systems operate with the user's knowledge and typically require the user's cooperation. The system was developed in MATLAB.
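Formant extraction from LPC coefficients can be sketched as follows: estimate an all-pole vocal-tract model with the autocorrelation method and Levinson-Durbin recursion, then read formant frequencies from the angles of the complex polynomial roots. The default model order (2 + sampling rate in kHz), the Hamming window, and the 90 Hz minimum frequency and 400 Hz maximum bandwidth filters are common heuristics assumed here; the paper's MATLAB settings and its comparison rule are not given in the abstract.

```python
import numpy as np


def lpc(frame, order):
    """LPC polynomial [1, a1, ..., ap] by the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at each stage
    return a


def formants(frame, sr, order=None, min_hz=90.0, max_bw=400.0):
    """Formant estimates (Hz) from the angles of the complex LPC roots."""
    if order is None:
        order = 2 + int(sr / 1000)  # common rule of thumb (assumption)
    a = lpc(np.asarray(frame, dtype=float) * np.hamming(len(frame)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bandwidths = -sr / np.pi * np.log(np.abs(roots))
    keep = (freqs > min_hz) & (bandwidths < max_bw)
    return np.sort(freqs[keep])
```

Verification would then compare the claimant's formant vector with the stored one for the claimed identity; the distance and threshold used for that comparison are not specified in the abstract.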

A Tutorial on Text-Independent Speaker Verification

EURASIP Journal on Advances in Signal Processing, 2004

This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications relating to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
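Two steps the tutorial details, score normalization and DET-based evaluation, fit in a few lines of NumPy. The sketch assumes Z-norm (standardizing a speaker model's scores with the mean and standard deviation of its impostor scores, one of several normalization schemes), that a higher score means more support for the claimed identity, and that the equal error rate (EER) is read off where the miss and false-alarm rates are closest. Plotting the DET curve itself, which uses normal-deviate axes, is left out.

```python
import numpy as np


def znorm(scores, impostor_scores):
    """Z-norm: standardize a model's scores with its impostor-score statistics."""
    mu, sigma = np.mean(impostor_scores), np.std(impostor_scores)
    return (np.asarray(scores, dtype=float) - mu) / sigma


def det_points(target, nontarget):
    """Miss and false-alarm rates at every candidate threshold (accept if score >= t)."""
    target, nontarget = np.asarray(target), np.asarray(nontarget)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    p_miss = np.array([np.mean(target < t) for t in thresholds])
    p_fa = np.array([np.mean(nontarget >= t) for t in thresholds])
    return thresholds, p_miss, p_fa


def equal_error_rate(target, nontarget):
    """Approximate EER: the operating point where miss and false-alarm rates cross."""
    _, p_miss, p_fa = det_points(target, nontarget)
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2
```

The (p_fa, p_miss) pairs returned by `det_points` are exactly the points a DET curve plots; the EER is the single point on that curve where both error types are equal.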

Mel frequency Cepstrum Coefficients and Enhanced LBG algorithm for Speaker Recognition

This paper proposes an improved strategy for an automated text-dependent speaker recognition system in noisy environments. Preprocessing of the speech signal starts by eliminating background noise. The next step is signal filtering and feature extraction using the cepstral coefficient method. The extracted features are then used by an enhanced LBG vector quantization algorithm for speaker recognition: the test speaker is matched against the codebooks stored in the database, and the speaker with the smallest Euclidean distance is selected. Feature extraction was based on a dataset of 175 samples collected from 25 different speakers. The proposed system achieved a good speaker identification rate, with a maximum accuracy of about 96.2% on a closed set of selected words containing the most frequently used phonemes. The experiments also show that recognition accuracy increases with frame overlap. [Hussein Lafta Attiya, Ali Yakoob Yousif. Mel frequency Cepstrum Coefficients and Enhanced LBG algorithm for Speaker Recognition. Researcher 2015;7(1):19-25]. (ISSN: 1553-9865). http://www.sciencepub.net/researcher.
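The abstract does not say what makes its LBG variant "enhanced", so the sketch below shows only the standard LBG codebook design (split every codeword, then refine with k-means) and closed-set matching by the lowest average Euclidean distortion. The perturbation size, codebook size and iteration count are assumptions.

```python
import numpy as np


def lbg(X, codebook_size=8, eps=0.01, n_iter=20):
    """Standard LBG: split every codeword in two, then refine with k-means, until the
    codebook has at least codebook_size entries (exactly, when it is a power of two)."""
    codebook = X.mean(axis=0, keepdims=True)
    delta = eps * X.std(axis=0)  # split perturbation scaled to the data
    while len(codebook) < codebook_size:
        codebook = np.vstack([codebook + delta, codebook - delta])  # split
        for _ in range(n_iter):  # k-means: nearest-codeword assignment, then centroids
            d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for j in range(len(codebook)):
                members = X[nearest == j]
                if len(members) > 0:
                    codebook[j] = members.mean(axis=0)
    return codebook


def distortion(X, codebook):
    """Average Euclidean distance from each frame to its nearest codeword."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())


def recognize(X, codebooks):
    """Pick the enrolled speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(X, codebooks[name]))
```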

Use of Mel Frequency Cepstral Coefficients for the Implementation of a Speaker Recognition System

2019

This paper proposes a speaker recognition system that validates a user's claimed identity using characteristics extracted from their voice. Speaker recognition is one of the most useful and popular biometric recognition techniques in the world, especially in areas where security is a major concern. Direct analysis and synthesis of the complex voice signal is difficult because the signal contains too much information. Therefore, two digital signal processing stages, feature extraction and feature matching, are used to represent the voice signal. Mel-Frequency Cepstral Coefficients (MFCC) were extracted from the speech signal to represent each speaker, and recognition was carried out using a weighted Euclidean distance. The feature extraction process was implemented on the MATLAB R2017b platform. Index Terms: feature matching, feature extraction, MFCC, Euclidean distance.
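The abstract does not define the weights in its weighted Euclidean distance. A minimal sketch follows, assuming inverse-variance weights (so that high-variance coefficients do not dominate the distance) and one template per speaker formed by averaging that speaker's MFCC frames; both choices, and the helper names, are hypothetical.

```python
import numpy as np


def inverse_variance_weights(frames, floor=1e-8):
    """Hypothetical weighting: 1 / variance of each coefficient over training frames."""
    return 1.0 / (np.var(frames, axis=0) + floor)


def weighted_euclidean(x, y, w):
    """d(x, y) = sqrt(sum_i w_i * (x_i - y_i)^2)."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))


def match(test_frames, templates, weights):
    """Compare the mean MFCC vector of the test utterance with each speaker template."""
    probe = np.mean(test_frames, axis=0)
    return min(templates, key=lambda s: weighted_euclidean(probe, templates[s], weights))
```

With all weights equal to 1 this reduces to the ordinary Euclidean distance; the weights only change how much each cepstral coefficient contributes.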

Individual Identification Through Voice Using Mel-Frequency Cepstrum Coefficient (MFCC) and Hidden Markov Models (HMM) Method

Journal of Measurements, Electronics, Communications, and Systems, 2020

Voice is one of the characteristics used to identify a person. A voice carries information such as the speaker's gender, age, and even identity. Speaker recognition is a way to narrow down crimes and fraud committed by voice, and so to reduce identity impersonation. The Mel-Frequency Cepstrum Coefficient (MFCC) method can be used in a speech recognition system; feature extraction with MFCC produces acoustic features of the speech signal. For classification, Hidden Markov Models (HMM) are used to match an unidentified speaker's voice against the voices in the database. In this research, the system is used for text-dependent speaker verification in Indonesian with 15 texts. When the test speakers are the same as those in the database, the highest accuracy is 99.16%.
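Matching an unknown voice against the HMMs in the database amounts to computing the likelihood of the MFCC sequence under each stored model and choosing the best one. A minimal log-domain forward-algorithm sketch follows, assuming one diagonal Gaussian per state as the emission model; the paper's topology, number of states and emission densities are not stated in the abstract.

```python
import numpy as np


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) over the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def gaussian_emissions(O, means, variances):
    """log b_j(o_t) for one diagonal Gaussian per state: shape (T, S)."""
    diff = O[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def log_forward(log_pi, log_A, log_B):
    """Forward algorithm in the log domain; returns log P(O | HMM)."""
    alpha = log_pi + log_B[0]
    for t in range(1, len(log_B)):
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log a_ij) + log b_j(o_t)
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return logsumexp(alpha)
```

Identification would pick the model with the largest `log_forward(log_pi, log_A, gaussian_emissions(O, means, variances))`. Working with log probabilities avoids the underflow caused by multiplying many small per-frame likelihoods.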

Text-Independent Speaker Identification Using GMM With Universal Background Model

2015

The state of the art in speaker recognition is highly advanced today, and various well-known techniques are used to process voice, including Gaussian mixture models. This paper presents our work on identifying a speaker from their voice. In our experiments, we first extract key features from the speech signal using the VOICEBOX [1] toolbox in MATLAB. These features are represented by a matrix of mel-frequency cepstral coefficients (MFCC). Then, using the MSR Identity Toolbox, we build an identity model for each person enrolled in our system with a statistical Gaussian Mixture Model Universal Background Model (GMM-UBM) and the features extracted from the speech signals. The Universal Background Model improves the GMM's statistical computation for the decision logic in the speaker verification task. We used the TIMIT database as the corpus for our experiments. Finally, we compare recognition accuracy across several experimental scenarios.
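The MSR Identity Toolbox performs these steps in MATLAB. The NumPy sketch below is not that toolbox's API but an independent illustration of the usual GMM-UBM recipe: mean-only MAP adaptation of a UBM to each enrolled speaker (a relevance factor of 16 is an assumed, commonly used value) and scoring by the average log-likelihood ratio against the UBM. Training the UBM itself with EM on pooled background speech is omitted.

```python
import numpy as np


def component_log_probs(X, weights, means, variances):
    """log w_k + log N(x_t; mu_k, diag(var_k)) for every frame and component: (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return (np.log(weights)[None, :]
            - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
            - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))


def frame_log_likelihood(X, gmm):
    """log p(x_t | GMM) for each frame, computed stably."""
    lp = component_log_probs(X, *gmm)
    m = lp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))


def map_adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's enrollment frames."""
    weights, means, variances = ubm
    lp = component_log_probs(X, weights, means, variances)
    post = np.exp(lp - frame_log_likelihood(X, ubm)[:, None])  # responsibilities
    n_k = post.sum(axis=0)
    e_k = (post.T @ X) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]  # data-dependent adaptation weight
    return weights, alpha * e_k + (1.0 - alpha) * means, variances


def llr_score(X, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio log p(X | speaker) - log p(X | UBM)."""
    return float(np.mean(frame_log_likelihood(X, speaker_gmm)
                         - frame_log_likelihood(X, ubm)))
```

For identification, as in the paper, a test utterance would be assigned to the enrolled speaker with the highest `llr_score`; for verification, the score would instead be compared with a threshold. Components that see little enrollment data keep means close to the UBM's, which is what makes adaptation robust with short enrollment.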

Speaker Identification using Mel Frequency Cepstral Coefficient and BPNN

Speech processing has emerged as one of the important application areas of digital signal processing. Research fields in speech processing include speech recognition, speaker recognition, speech synthesis, and speech coding. The objective of automatic speaker recognition is to extract, characterize and recognize information about speaker identity. Feature extraction is the first step in speaker recognition, and researchers have proposed many algorithms for it. In this work, Mel Frequency Cepstrum Coefficient (MFCC) features are used to design a text-dependent speaker identification system. A backpropagation neural network (BPNN) is trained on the MFCC feature set and then used to identify the speaker. Some modifications to the existing MFCC feature extraction technique are also suggested to improve speaker recognition efficiency. Information from speech recognition can be used in various ways in state-of-the-art speaker recognition systems. This includes the obvious use of recognized words to enable text-dependent speaker modeling techniques when the spoken words are not given in advance. Furthermore, it has been shown that the choice of words and phones itself can be a useful indicator of speaker identity. Recognizer output also enables higher-level features, in particular those related to the prosodic properties of speech.
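The abstract gives neither the network architecture nor the training settings. A minimal backpropagation network sketch follows, assuming MFCC vectors as inputs, one tanh hidden layer, a softmax output with one unit per enrolled speaker, and full-batch gradient descent on the cross-entropy loss; all of these choices are assumptions.

```python
import numpy as np


def train_bpnn(X, y, n_classes, hidden=16, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer network trained by backpropagation (full-batch gradient descent)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot speaker labels
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, softmax output
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass: gradients of the mean cross-entropy loss
        dlogits = (P - Y) / len(X)
        dW2 = H.T @ dlogits
        db2 = dlogits.sum(axis=0)
        dH = (dlogits @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        # Gradient-descent updates
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(X, params):
    """Index of the most likely speaker for each input vector."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```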