ScienceDirect Comparison of Text Independent Speaker Identification Systems using GMM and i-Vector Methods
Related papers
Speaker Recognition Using Gaussian Mixtures Models
Bio-Inspired Applications of Connectionism
Speaker recognition is a widely used biometric technique that identifies and verifies a speaker from his or her speech data. A speaker recognition system recognizes the speaker from the speaker's speech signal, and it is mainly useful in applications where security is the primary concern. Generally, speech is recorded through an air microphone, and the speech collected from various speakers is used as input to the speaker recognition system. Because such recordings are prone to environmental background noise, performance is enhanced by integrating an additional speech signal collected through a throat microphone with the signal collected from the standard air microphone. The resulting signal is very similar to normal speech and is not affected by environmental background noise. This paper focuses on extracting Mel Frequency Cepstral Coefficient (MFCC) features from the air and throat speech signals to build Gaussian Mixture Model (GMM) based closed-set, text-independent speaker recognition systems, and on reporting the identification results.
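To make the MFCC step mentioned in this abstract more concrete, the following minimal sketch places filter center frequencies evenly on the mel scale, which is the warping that gives MFCCs their name. It uses one common form of the mel formula; the abstract does not say which variant or how many filters the authors used, so the function names, the 20 filters, and the 0 to 4000 Hz range are all illustrative assumptions.

```python
import math

def hz_to_mel(f_hz):
    # One widely used mel-scale formula (assumed; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Hypothetical configuration: 20 filters between 0 Hz and 4000 Hz.
centers = mel_center_frequencies(n_filters=20, f_min=0.0, f_max=4000.0)
```

The centers are densely packed at low frequencies and spread out at high frequencies, mirroring the coarser frequency resolution of hearing in the upper band.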
Enhancing the Performance of Gaussian Mixture Model-Based Text Independent Speaker Identification
Genetic Resources and Crop Evolution, 2005
In this paper, we seek to enhance the identification performance of Gaussian Mixture Model (GMM) based speaker identification systems in the presence of a limited amount of training data and a relatively large number of speakers. The performance is characterized by the identification accuracy, the identification time, and the model complexity. A new model order selection technique based on the Goodness of Fit (GOF) statistical test is proposed in order to increase the identification accuracy. This technique has been shown to outperform other well-known model order selection techniques, such as the Minimum Description Length (MDL) and the Akaike Information Criterion (AIC), in terms of identification accuracy and robustness against telephone channel degradation effects. In addition, the identification time is decreased by adapting the Linear Discriminant Analysis (LDA) feature extraction technique to fit our basic assumption of an asymmetric multimodal distribution of the training data of each speaker. This modification results in a large decrease in the identification time with little effect on the identification accuracy.
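As a point of reference for the two baseline criteria named in this abstract, the sketch below scores candidate GMM orders with AIC and with MDL (in its two-part form, numerically half of BIC) and picks the minimizer. It does not implement the paper's GOF-based technique. The function names, the diagonal-covariance parameter count, and the log-likelihood values are illustrative assumptions, not taken from the paper.

```python
import math

def gmm_param_count(n_components, dim):
    """Free parameters of a diagonal-covariance GMM (assumed model form):
    dim means + dim variances per component, plus n_components - 1 weights."""
    return n_components * 2 * dim + (n_components - 1)

def aic(log_likelihood, k):
    # Akaike Information Criterion: 2k - 2 ln L (lower is better).
    return 2 * k - 2 * log_likelihood

def mdl(log_likelihood, k, n_samples):
    # Two-part MDL: -ln L + (k / 2) ln N (lower is better).
    return -log_likelihood + 0.5 * k * math.log(n_samples)

def select_order(candidates, dim, n_samples, criterion):
    """candidates maps model order -> total training log-likelihood."""
    scores = {}
    for order, ll in candidates.items():
        k = gmm_param_count(order, dim)
        scores[order] = aic(ll, k) if criterion == "aic" else mdl(ll, k, n_samples)
    return min(scores, key=scores.get), scores

# Made-up log-likelihoods for 12-dimensional features and 2000 frames.
example = {4: -52000.0, 8: -51000.0, 16: -50700.0, 32: -50600.0}
best_aic, _ = select_order(example, dim=12, n_samples=2000, criterion="aic")
best_mdl, _ = select_order(example, dim=12, n_samples=2000, criterion="mdl")
```

With these made-up numbers AIC selects 16 components while MDL selects 8, which illustrates that MDL penalizes model complexity more heavily when the number of frames is large.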
Text independent Speaker Identification using Gaussian mixture model
2007
This paper describes text-independent (TI) Speaker Identification (ID) using Gaussian mixture models (GMM). The use of the GMM approach is motivated by the observation that the individual Gaussian components of a GMM represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. For speaker model training, a fast re-estimation algorithm based on highest likelihood mixture clustering is introduced. In this work, the GMM is evaluated on the TI Speaker ID task via a series of experiments (model convergence, and the effect of feature set, number of Gaussian components, and training utterance length on identification rate). The database consisted of clean Malay sentence speech uttered by 10 speakers (3 female and 7 male). Each speaker provided the same set of 40 sentence utterances (average length 3.5 s), each with different text. The sentences for testing were different from those for training. The GMM achieved a 98.4% identification rate using 5 training sentences. Model training based on highest likelihood clustering is shown to perform comparably to conventional expectation-maximization training while requiring much less computation time.
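The closed-set decision rule that GMM-based identification relies on can be sketched as follows: each enrolled speaker has a GMM, and a test utterance is assigned to the speaker whose model yields the highest average per-frame log-likelihood. This sketch covers scoring only, not the paper's highest likelihood clustering or EM training. The diagonal covariances, the toy 2-D models, and the names `identify`, `spk_a`, and `spk_b` are assumptions made for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """log p(x | gmm) via log-sum-exp over (weight, mean, var) components."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify(frames, speaker_models):
    """Return the enrolled speaker whose GMM gives the highest
    average per-frame log-likelihood (closed-set decision)."""
    def avg_ll(gmm):
        return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
    return max(speaker_models, key=lambda name: avg_ll(speaker_models[name]))

# Toy 2-D models; a real system would use GMMs trained on features such as MFCCs.
models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "spk_b": [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}
test_frames = [[0.2, -0.1], [1.9, 2.2], [0.5, 0.4]]
predicted = identify(test_frames, models)
```

Averaging over frames keeps scores comparable across utterances of different lengths; because the set is closed, the rule always returns one of the enrolled speakers.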
TEXT-INDEPENDENT SPEAKER VERIFICATION: A COMPARATIVE ANALYSIS STUDY
Acta Technica Napocensis, 2010
Gaussian mixture models (GMMs) remain the state-of-the-art technique for modeling spectral envelope features for speech recognition systems. This paper presents a comparative analysis of the performance of GMMs trained with three estimation algorithms, Expectation Maximization (EM), the Greedy EM algorithm (GEM), and the Figueiredo-Jain algorithm (FJ), for text-independent speech biometrics verification. The simulation results show significant performance gains. The test results, EER = 0.26% for "EM", EER = 0.21% for "GEM", and EER = 0.16% for "FJ", show that the behavioral information carried by speech biometrics is robust and has discriminating power that can be exploited for identity authentication.
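The equal error rate (EER) quoted in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be approximated from verification scores follows; the threshold sweep, the function name, and the score lists are illustrative assumptions and are unrelated to the paper's data.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping the threshold over all observed scores.
    Higher scores are assumed to mean 'more likely the claimed speaker'."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Made-up scores for true-speaker trials and impostor trials.
genuine_scores = [2.1, 1.8, 0.4, 2.5, 1.2]
impostor_scores = [-1.0, 0.6, -0.3, 0.1, -2.2]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

For these toy scores the two error rates meet at a threshold of 0.6, where one of five genuine trials is rejected and one of five impostor trials is accepted, giving an EER of 0.2 (20%).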
Text Independent Speaker Verification Using GMM
This paper presents the performance of a text independent speaker verification system using Gaussian Mixture Model (GMM) for the Brazilian Portuguese. The Gaussian components of the GMM statistically represent the spectral characteristics of the speaker, leading to an effective speaker recognition system. The main goal here is a detailed evaluation of the parameters used by the GMM such as the number of Gaussian mixtures, the amount of time for training and testing. Aiming at the definition of the best set of features for a reasonable response, this work helps the comprehension of the model and gives insights for further investigation. We have used 36 speakers in the experiments, all modeled with 15 mel-cepstral coefficients. For 32 Gaussians, 60 seconds of training, and 30 seconds of testing, the system has no failure for a reasonably clean speech signal. The results have shown that the higher the amount of time for training and testing, the better are the results for a give...
Improving Text-Independent Speaker Identification Performance Using Gaussian Mixture Speaker Models
2009
Systems that automatically recognize a speaker are increasingly important in human-computer interaction because speech communication has always been and will continue to be the dominant mode of human social bonding and information exchange. This paper investigates the use of Gaussian mixture models (GMMs) for robust text-independent speaker identification. The experiments performed in this research examine several aspects and parameters of GMM usage: algorithmic issues, amount of training data, modeling different languages, and small and large population performance. We found that increasing the amount of training data and decreasing the number of speakers improved the accuracy of text-independent speaker identification using statistical models based on Gaussian mixture models. There also appears to be a maximum number of Gaussian mixture components needed to adequately model speakers and achieve good identification performance for different amounts of training data.
Improved Performance Based Method for Text Independent Speaker Identification
2014
Speaker identification is the computing task of recognizing a speaker's identity from his or her voice. The classification of speech depends on extracting several key features, such as Mel Frequency Cepstral Coefficients (MFCC), from the speaker's speech signals. A unique identity for each person enrolled for speaker identification can be built using a statistical model such as the Gaussian Mixture Model (GMM) and features extracted from the speech signals. Using the Vector Quantization (VQ) technique, a decision function is proposed to shrink the training model for the GMM in order to reduce the processing time. In the proposed modeling, VQ is first used to differentiate male from female speakers. Then, the GMM is applied to the resulting subgroup of speakers to obtain the accuracy rates. Keywords: GMM, MFCC, VQ.
Text-independent Speaker Verification
Guide to Biometric Reference Systems and Performance Evaluation, 2009
GMM Versus AR-Vector Models for Text Independent Speaker Verification
2002
This paper presents a performance evaluation of two classification systems for text independent speaker verification: the Gaussian Mixture Model (GMM) and the AR-Vector Model. For the GMM, three model sizes ([?], [?], and [?] Gaussians) are evaluated. On the other hand, an order [?] model with the Itakura symmetric distance was used for the AR-Vector. Both classification systems presented no errors when training and testing times were not smaller than [?] s and [?] s, respectively. Using [?] s as the test time, the most accurate classification systems errors were between [?] and [?] %. With
Text-Independent Speaker Identification Using GMM With Universal Background Model
2015
The state of the art in speaker recognition is highly advanced nowadays. Various well-known technologies are used to process voice, including Gaussian mixture models. The paper presents our work on identifying a speaker from his or her voice. In our experiment we first extract key features from a speech signal using the VOICEBOX [1] toolbox in MATLAB. These features are represented by a matrix of mel frequency cepstral coefficients (MFCC). Then, applying the MSR Identity Toolbox, we build an identity for each person enrolled in our system using a statistical Gaussian Mixture Model with Universal Background Model (GMM-UBM) and features extracted from the speech signals. The Universal Background Model improves the Gaussian Mixture Model statistical computation for the decision logic in the speaker verification task. As a corpus, we used the TIMIT database for our experiments. Finally, we compared the recognition accuracy for several different scenarios of our experiments.
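In GMM-UBM systems, a speaker model is commonly derived by MAP-adapting the UBM means toward the speaker's enrollment data. The sketch below shows mean-only adaptation for scalar features. It is not the paper's code or the MSR Identity Toolbox API; the function names, the relevance factor of 16, and the toy UBM are assumptions chosen for illustration.

```python
import math

def posteriors(x, ubm):
    """Responsibility of each UBM component (weight, mean, var) for scalar x."""
    dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in ubm]
    total = sum(dens)
    return [d / total for d in dens]

def map_adapt_means(frames, ubm, relevance=16.0):
    """Mean-only MAP adaptation: mu_new = a * E[x] + (1 - a) * mu_ubm,
    with a = n / (n + relevance) computed per component."""
    n = [0.0] * len(ubm)   # soft frame counts per component
    sx = [0.0] * len(ubm)  # posterior-weighted sums of the frames
    for x in frames:
        for i, p in enumerate(posteriors(x, ubm)):
            n[i] += p
            sx[i] += p * x
    adapted = []
    for i, (w, m, v) in enumerate(ubm):
        alpha = n[i] / (n[i] + relevance)
        ex = sx[i] / n[i] if n[i] > 0 else m
        adapted.append((w, alpha * ex + (1 - alpha) * m, v))
    return adapted

# Toy two-component UBM and 16 enrollment frames that sit near component 0.
ubm = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
speaker_model = map_adapt_means([1.0] * 16, ubm)
```

Components that see many enrollment frames move toward the speaker's data, while components that see almost none stay at their UBM values; here the first mean moves about halfway from 0.0 to 1.0 and the second stays near 10.0.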