The Robustness of GMM-SVM in Real World Applied to Speaker Verification
Related papers
Support vector GMMs for speaker verification
Speaker and Language Recognition …, 2006
This article presents a new approach that combines the discriminative power of Support Vector Machines (SVM) with Gaussian Mixture Models (GMM) for Automatic Speaker Verification (ASV). In this combination, SVMs are applied in the GMM model space, where each point represents a GMM speaker model. The SVM kernel computes a similarity between GMM models and is derived from the Kullback-Leibler (KL) divergence. The results of this approach show a clear improvement over a simple GMM system on the NIST 2005 Speaker Recognition Evaluation primary task.
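The abstract does not give the kernel's exact form. A common construction, assumed here and not necessarily the one used in the paper, takes speaker GMMs MAP-adapted from a shared background model with only the means adapted, so the weights and diagonal covariances are shared. The KL divergence between two GMMs has no closed form, but under this assumption it is bounded above by half the weighted Mahalanobis distance between corresponding means, which is symmetric. The sketch below (function names are hypothetical) computes that bound and turns it into an RBF-style kernel; because the bound is a scaled squared Euclidean distance between mean supervectors, the kernel is positive semidefinite.

```python
import numpy as np


def kl_upper_bound(weights, variances, means_a, means_b):
    """Upper bound on KL(a || b) for two GMMs that share weights and diagonal
    covariances and differ only in their means (e.g. mean-only MAP adaptation).

    weights: (M,), variances: (M, D) diagonal variances, means_*: (M, D).
    """
    diff = means_a - means_b
    return 0.5 * float(np.sum(weights[:, None] * diff ** 2 / variances))


def supervector(weights, variances, means):
    """Stack means scaled by sqrt(w_i) / sigma_i so that the squared Euclidean
    distance between two supervectors equals twice the KL bound above."""
    return (np.sqrt(weights)[:, None] * means / np.sqrt(variances)).ravel()


def kl_kernel(weights, variances, means_a, means_b, gamma=1.0):
    """Hypothetical RBF-style kernel exp(-gamma * bound); PSD because the bound
    is a scaled squared Euclidean distance between supervectors."""
    return float(np.exp(-gamma * kl_upper_bound(weights, variances, means_a, means_b)))


if __name__ == "__main__":
    w = np.array([0.5, 0.5])
    var = np.ones((2, 1))
    mu_a = np.array([[0.0], [0.0]])
    mu_b = np.array([[2.0], [0.0]])
    print(kl_upper_bound(w, var, mu_a, mu_b))  # 0.5 * 0.5 * 4 = 1.0
    print(kl_kernel(w, var, mu_a, mu_b))       # exp(-1), about 0.3679
```

Equivalently, the supervectors can be fed to a standard RBF-kernel SVM, which is one reason this bound is popular in GMM-SVM systems.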
A Gaussian Mixture Model Training Optimization Method for Text Independent Speaker Verification
Training high-order Gaussian mixture models (GMMs) on a large dataset in a single stage requires considerable processing power and storage, which may be neither feasible nor available in many cases. Training such GMMs in several stages reduces the computational and memory costs, but it normally yields a sub-optimal GMM compared with one trained entirely in a single stage. This paper proposes a new method for optimizing multi-stage trained GMMs in the context of a speaker verification framework. Experimental results show that GMMs optimized with the proposed algorithm improve the performance of the GMM-based speaker verification system.
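The paper's optimization method is not described in the abstract and is not reproduced here. For context, both single-stage and multi-stage GMM training are normally built from expectation-maximization (EM) iterations. The sketch below shows one EM iteration for a diagonal-covariance GMM, assuming NumPy; `em_step`, `log_gauss_diag` and the variance floor `var_floor` are hypothetical names chosen for illustration.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Per-frame, per-component log N(x_t; mu_i, diag(var_i)).
    Shapes: x (T, D), means/variances (M, D); returns (T, M)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))


def em_step(x, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a diagonal-covariance GMM. Returns updated
    parameters and the average log-likelihood under the *input* parameters."""
    log_p = np.log(weights) + log_gauss_diag(x, means, variances)   # (T, M)
    log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)   # log p(x_t), (T, 1)
    resp = np.exp(log_p - log_norm)                                 # E-step: posteriors
    n = resp.sum(axis=0) + 1e-10                                    # soft counts per component
    new_weights = n / n.sum()                                       # M-step updates
    new_means = (resp.T @ x) / n[:, None]
    new_vars = (resp.T @ x ** 2) / n[:, None] - new_means ** 2
    new_vars = np.maximum(new_vars, var_floor)                      # variance flooring
    return new_weights, new_means, new_vars, float(log_norm.mean())
```

Repeating `em_step` until the returned log-likelihood stops improving gives a standard single-stage training loop; a multi-stage scheme would run such loops on subsets of data or on progressively larger models.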
Speaker verification using adapted Gaussian mixture models
Digital Signal Processing, 2000
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system, used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance are also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
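Two pieces named above, Bayesian adaptation of the UBM and the likelihood ratio score, can be sketched as follows. This is a minimal sketch assuming NumPy, diagonal covariances and mean-only adaptation, with a relevance factor `relevance=16.0` as a default (a value often used with this scheme); the function names are hypothetical, and the handset detector and score normalization are not shown.

```python
import numpy as np


def log_weighted_components(x, weights, means, variances):
    """log(w_i) + log N(x_t; mu_i, diag(var_i)) for frames x (T, D); returns (T, M)."""
    diff = x[:, None, :] - means[None, :, :]
    return np.log(weights) - 0.5 * (np.sum(diff ** 2 / variances, axis=2)
                                    + np.sum(np.log(2 * np.pi * variances), axis=1))


def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of UBM means to enrollment frames x (T, D)."""
    log_p = log_weighted_components(x, weights, means, variances)
    post = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))  # P(i | x_t)
    n = post.sum(axis=0)                                  # soft count n_i per component
    ex = (post.T @ x) / np.maximum(n, 1e-10)[:, None]     # first-order statistic E_i[x]
    alpha = (n / (n + relevance))[:, None]                # data-dependent adaptation weight
    return alpha * ex + (1.0 - alpha) * means


def llr_score(x, ubm, speaker_means):
    """Average per-frame log p(x_t | speaker) - log p(x_t | UBM); the speaker
    model reuses the UBM weights and variances."""
    weights, means, variances = ubm
    spk = np.logaddexp.reduce(log_weighted_components(x, weights, speaker_means, variances), axis=1)
    bkg = np.logaddexp.reduce(log_weighted_components(x, weights, means, variances), axis=1)
    return float(np.mean(spk - bkg))
```

Components that see little enrollment data get a small `alpha` and stay close to the UBM, which is what makes the adapted model robust with limited data. A verification decision then compares the (normalized) score against a threshold.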
Text Independent Speaker Verification Using GMM
This paper presents the performance of a text-independent speaker verification system using a Gaussian Mixture Model (GMM) for Brazilian Portuguese. The Gaussian components of the GMM statistically represent the spectral characteristics of the speaker, leading to an effective speaker recognition system. The main goal is a detailed evaluation of the GMM parameters, such as the number of Gaussian mixtures and the amount of training and testing speech. By aiming to define the best set of features for a reasonable response, this work helps in understanding the model and gives insights for further investigation. We used 36 speakers in the experiments, all modeled with 15 mel-cepstral coefficients. With 32 Gaussians, 60 seconds of training, and 30 seconds of testing, the system makes no errors on reasonably clean speech. The results show that the longer the training and testing time, the better the results for a give...
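A rough count helps explain why the number of Gaussians and the amount of training speech interact. Assuming diagonal covariances and a 10 ms frame shift (neither is stated in the abstract), the 32-Gaussian, 15-coefficient configuration gives:

```latex
\underbrace{(32-1)}_{\text{weights}} + \underbrace{32 \times 15}_{\text{means}} + \underbrace{32 \times 15}_{\text{variances}} = 31 + 480 + 480 = 991 \text{ free parameters},
\qquad
\frac{60\ \text{s}}{0.01\ \text{s}} = 6000 \text{ frames} \;\Rightarrow\; \frac{6000}{32} \approx 188 \text{ frames per Gaussian}.
```

Under these assumptions, doubling the number of Gaussians roughly halves the data available per component, one reason larger mixtures call for more training speech.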
Speaker Recognition Using Gaussian Mixtures Models
Bio-Inspired Applications of Connectionism
Speaker recognition is a popular biometric technique that identifies or verifies a speaker from his or her speech. A speaker recognition system recognizes the speaker from the speaker's speech signal, and it is mainly useful in applications where security is the primary concern. Speech is generally recorded through an air microphone, and the recordings collected from various speakers are used as input to the speaker recognition system. Because these recordings are prone to environmental background noise, performance is improved by adding a speech signal collected through a throat microphone alongside the signal from the standard air microphone. The resulting signal is very similar to normal speech and is not affected by environmental background noise. This paper focuses on extracting Mel Frequency Cepstral Coefficient (MFCC) features from the air and throat speech signals to build Gaussian Mixture Model (GMM)-based closed-set, text-independent speaker recognition systems, and reports the identification results.
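The closed-set identification decision can be sketched as picking the enrolled speaker whose GMM best explains the test frames. This is a minimal sketch assuming NumPy, diagonal covariances and already-extracted MFCC frames; how the paper combines the air- and throat-microphone streams (for example feature concatenation or score fusion) is not stated in the abstract and is not shown. `identify` and `avg_loglik` are hypothetical names.

```python
import numpy as np


def avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of frames x (T, D) under a diagonal GMM."""
    diff = x[:, None, :] - means[None, :, :]
    log_p = np.log(weights) - 0.5 * (np.sum(diff ** 2 / variances, axis=2)
                                     + np.sum(np.log(2 * np.pi * variances), axis=1))
    return float(np.mean(np.logaddexp.reduce(log_p, axis=1)))


def identify(x, models):
    """Closed-set identification: return the enrolled speaker whose GMM gives
    the highest average log-likelihood, plus all scores.
    models maps speaker name -> (weights, means, variances)."""
    scores = {name: avg_loglik(x, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores
```

Because the set is closed, there is no rejection threshold: the test utterance is always assigned to one of the enrolled speakers.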
Speaker identification and verification using Gaussian mixture speaker models
Speech Communication, 1995
Gaussian Mixture Models (GMMs) have been successfully applied to speaker identification and verification when a large amount of enrolment data is available to characterize client speakers ([1], [10], ...). However, in many applications it is unreasonable to expect clients to spend this much time training the system. We have therefore been exploring the performance of various methods when only a sparse amount of enrolment data is available. Under such conditions, GMM performance deteriorates drastically. A possible solution is the "eigenvoice" approach, in which client and test speaker models are confined to a low-dimensional linear subspace obtained previously from a different set of training data. One advantage of the approach is that it does away with the need for impostor models for speaker verification.
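The eigenvoice idea can be illustrated with a simplified sketch: learn a low-dimensional basis from the supervectors (stacked GMM means) of a separate set of training speakers, then represent a new speaker by a few coordinates in that basis. This assumes NumPy and uses PCA plus an orthogonal projection of an already-estimated supervector; eigenvoice systems working from sparse enrolment data usually estimate the coordinates by maximum likelihood from the enrolment frames instead, which is not shown. The function names are hypothetical.

```python
import numpy as np


def train_eigenvoices(supervectors, k):
    """PCA on training-speaker supervectors (S, N): returns the mean
    supervector (N,) and the top-k eigenvoices as rows of a (k, N) matrix."""
    mean = supervectors.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:k]


def project_speaker(supervector, mean, eigenvoices):
    """Confine a speaker to the eigenvoice subspace: returns coordinates y and
    the reconstructed supervector mean + y @ eigenvoices."""
    # Rows of `eigenvoices` are orthonormal, so this projection is the least-squares fit.
    y = eigenvoices @ (supervector - mean)
    return y, mean + y @ eigenvoices
```

Only `k` coordinates are estimated per speaker instead of the full supervector, which is why the approach remains usable when enrolment data is scarce.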
and speaker verification applications. This paper presents a study of the effects of model parameters in a state-of-the-art adapted-GMM-based text-independent speaker verification system. The system uses a likelihood ratio test for verification, adapted GMMs as likelihood functions, a universal background model (UBM) as the alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. Fast scoring and score normalization, which are very important when dealing with real-world data, were used. System performance was evaluated using detection error trade-off (DET) curves and the decision cost function (DCF). The effects of model order and of training and test speech lengths were studied experimentally.
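The two evaluation measures named above can be sketched from lists of target and non-target trial scores. This is a minimal sketch assuming NumPy and an "accept if score >= threshold" rule; the default costs `c_miss=10`, `c_fa=1`, `p_target=0.01` are the values used in several NIST SREs of that period and are an assumption here, since the abstract does not state them. Function names are hypothetical.

```python
import numpy as np


def miss_fa_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates at every candidate threshold (the points of a DET curve)."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    thresholds = np.append(thresholds, np.inf)  # also allow rejecting every trial
    p_miss = np.array([np.mean(target_scores < t) for t in thresholds])
    p_fa = np.array([np.mean(nontarget_scores >= t) for t in thresholds])
    return thresholds, p_miss, p_fa


def min_dcf(target_scores, nontarget_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum over thresholds of
    C = c_miss * P_miss * p_target + c_fa * P_fa * (1 - p_target)."""
    _, p_miss, p_fa = miss_fa_rates(target_scores, nontarget_scores)
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return float(cost.min())
```

A DET curve plots the same (P_miss, P_fa) pairs on normal-deviate axes, so both measures come from one sweep over thresholds.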
Gaussian mixture model: a modeling technique for speaker recognition and its component
This paper provides an overview of the Gaussian Mixture Model (GMM) and its components as applied to speech signals. Earlier work has shown that the GMM is well suited to voice modeling in speaker recognition systems, where it serves as an essential tool for statistical clustering. Tasks that humans perform effortlessly, such as voice or face recognition, are not effortless for machines or computers; speaker recognition technology offers a solution, and with it, computers/machines can outperform humans.
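For reference, the standard GMM likelihood used throughout these works is a weighted sum of $M$ Gaussian components. With the usual notation (assumed here, not taken from the abstract), for a $D$-dimensional feature vector $\mathbf{x}$ and model $\lambda = \{w_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}_{i=1}^{M}$:

```latex
p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0,

\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)
  = \frac{1}{(2\pi)^{D/2} \lvert \boldsymbol{\Sigma}_i \rvert^{1/2}}
    \exp\!\Big( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top} \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \Big),

\log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(\mathbf{x}_t \mid \lambda),
\qquad X = \{\mathbf{x}_1, \dots, \mathbf{x}_T\}.
```

The last line assumes the frames of an utterance are scored independently, which is the usual simplification in GMM-based speaker recognition.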
Mehran University Research Journal of Engineering and Technology, 2013
This paper investigates state-of-the-art techniques for SR (Speaker Recognition). It first presents a technical description of automatic SR, followed by a comparative analysis of several methods for feature extraction and modeling. Based on this analysis, the NIST 2001, NIST 2002, NIST 2004 and NIST 2006 speaker recognition corpora are used to investigate state-of-the-art feature extraction and modeling techniques. The state-of-the-art technique for feature extraction is delta MFCC (Mel Frequency Cepstral Coefficients), and for modeling it is the GMM (Gaussian Mixture Model) trained with EM (Expectation Maximization). The paper also details the enrollment/training and recognition/testing stages and summarizes the conventional methods used at each stage of SR systems.
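Delta MFCCs append a local time derivative to each static coefficient. A minimal sketch of the common regression formula follows, assuming NumPy, a window of `n=2` frames and edge padding at the boundaries; both choices are assumptions, since toolkits differ, and `delta` is a hypothetical name.

```python
import numpy as np


def delta(features, n=2):
    """Regression-based delta coefficients for a (T, D) feature matrix:
    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    with the first and last frames repeated at the boundaries."""
    T = len(features)
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, n + 1):
        # padded[n + j] holds original frame j, so these slices are c_{t+k} and c_{t-k}.
        out += k * (padded[n + k:n + k + T] - padded[n - k:n - k + T])
    return out / denom
```

The delta vectors are typically concatenated with the static coefficients, e.g. `np.hstack([mfcc, delta(mfcc)])`, before GMM training.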