A Shortcut into Speaker Verification
Iraqi Journal of Science
The theories and applications of speaker identification, recognition, and verification are well-established fields, yet many publications and advances in related products continue to emerge. In this paper, research publications from the past 25 years (1996 to 2020) were studied and analysed, with a focus on speaker identification, speaker recognition, and speaker verification. The study was carried out using the ScienceDirect database. References of several types, such as review articles, research articles, encyclopaedia entries, book chapters, conference abstracts, and others, were categorized and investigated. A summary of this literature is presented, together with statistical analyses of the publications and their categories over this period. Important information from the analysed research, including the dataset used, the size of the data adopted, the implemented methods, and the accuracy of the obtained results, is extracted...
Speaker Verification and Identification
Intelligent Applications
A speaker recognition system verifies or identifies a speaker's identity based on his/her voice. It is considered one of the most convenient biometric characteristics for human-machine communication. This chapter introduces several speaker recognition systems and examines their performance under various conditions. Speaker recognition can be classified as either speaker verification or speaker identification. Speaker verification aims to verify whether an input utterance corresponds to a claimed identity, whereas speaker identification aims to identify the speaker of an input utterance by selecting one model from a set of enrolled speaker models. Both speaker verification and speaker identification systems consist of three essential elements: feature extraction, speaker modeling, and matching. Feature extraction pertains to extracting the features from input speech that are essential for speaker recognition. Speaker modeling pertains to probabilistically modeling the features of the enrolled speakers. The matc...
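The three elements and the two decision rules can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the chapter: feature vectors are already extracted, each enrolled speaker is modeled by a single diagonal Gaussian (practical systems use richer models such as GMMs), and the helper names `enroll`, `score`, `identify` and `verify`, as well as any threshold value, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical speaker model: per-dimension (means, variances) of one diagonal Gaussian.
Model = Tuple[List[float], List[float]]


def enroll(frames: Sequence[Sequence[float]], var_floor: float = 1e-3) -> Model:
    """Speaker modeling: fit a mean and a floored variance per feature dimension."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
           for d in range(dim)]
    return mean, var


def score(frames: Sequence[Sequence[float]], model: Model) -> float:
    """Matching: average per-frame log-likelihood of the test frames under a model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def identify(frames: Sequence[Sequence[float]], models: Dict[str, Model]) -> str:
    """Identification: choose the enrolled speaker whose model scores highest."""
    return max(models, key=lambda name: score(frames, models[name]))


def verify(frames: Sequence[Sequence[float]], models: Dict[str, Model],
           claimed: str, threshold: float) -> bool:
    """Verification: accept the claim only if the claimed model's score clears a threshold."""
    return score(frames, models[claimed]) >= threshold
```

Identification compares every enrolled model and returns a name, while verification looks only at the claimed model and returns a yes/no decision, which is why verification needs a threshold and identification does not.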
A Tutorial on Text-Independent Speaker Verification
Eurasip Journal on Advances in Signal Processing, 2004
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, the speaker modeling technique used in most systems, is then explained. A few alternative speaker modeling techniques, namely neural networks and support vector machines, are mentioned. Score normalization is then explained, as it is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then revisited, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by outlining a few research trends in speaker verification for the next couple of years.
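GMM-based scoring and score normalization, as summarized above, can be sketched as follows. This is an illustrative assumption, not the tutorial's code: GMMs have diagonal covariances and are assumed to be already trained (training is not shown), a test segment is scored by the average log-likelihood ratio between the claimed speaker's GMM and a background (world) GMM, and Z-norm standardizes that score using impostor-score statistics precomputed for the speaker model. The tuple layout and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical diagonal-covariance GMM layout: (weights, means, variances).
GMM = Tuple[List[float], List[List[float]], List[List[float]]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_avg_loglik(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Average per-frame log-likelihood under a GMM."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        total += logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
    return total / len(frames)


def llr_score(frames: Sequence[Sequence[float]], speaker: GMM, background: GMM) -> float:
    """Average log-likelihood ratio: claimed-speaker GMM versus background GMM."""
    return gmm_avg_loglik(frames, speaker) - gmm_avg_loglik(frames, background)


def z_norm(raw: float, impostor_scores: Sequence[float]) -> float:
    """Z-norm: standardize a raw score with this model's impostor-score mean and std."""
    n = len(impostor_scores)
    mu = sum(impostor_scores) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in impostor_scores) / n)
    if sigma == 0.0:
        raise ValueError("impostor scores must not all be equal")
    return (raw - mu) / sigma
```

A positive log-likelihood ratio means the claimed speaker's model explains the frames better than the background model does; Z-norm then puts scores from different speaker models on a common scale so that a single decision threshold can be shared.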
Discriminative training of minimum cost speaker verification systems
1998
This paper presents a new training method for speaker verification systems. The method improves on previous work in speaker verification by (1) developing a new a posteriori discriminative training algorithm, and (2) extending the algorithm to directly optimize speaker verification performance. The key element of this new training algorithm, which advances the state of the art, is that it initializes the system with a Bayesian-adapted Gaussian mixture model. The discriminative training algorithm then adjusts the parameters of these models to directly minimize a verification cost function (VCF) representing the expected cost of false acceptances of impostors and false rejections of legitimate speakers. Results on the 1997 NIST speaker recognition evaluation corpus indicate that VCF performance can be improved, but at the expense of reduced performance elsewhere in the system (under different costs of false alarms and false rejections).
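A verification cost function of this kind weighs impostor false acceptances against false rejections of legitimate speakers. One common formulation is the NIST detection cost, C = C_miss * P_miss * P_target + C_FA * P_FA * (1 - P_target). The sketch below only evaluates that cost over a sweep of decision thresholds; it does not reproduce the paper's discriminative training. The default parameters (C_miss = 10, C_FA = 1, P_target = 0.01) are those commonly associated with NIST evaluations of that period and should be treated as an assumption, as should the function names.

```python
from typing import Sequence, Tuple


def error_rates(target: Sequence[float], nontarget: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Miss (false rejection) and false-alarm (false acceptance) rates at a threshold."""
    p_miss = sum(s < threshold for s in target) / len(target)
    p_fa = sum(s >= threshold for s in nontarget) / len(nontarget)
    return p_miss, p_fa


def detection_cost(p_miss: float, p_fa: float, c_miss: float = 10.0,
                   c_fa: float = 1.0, p_target: float = 0.01) -> float:
    """Expected cost of misses and false alarms for the given priors and costs."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def min_detection_cost(target: Sequence[float], nontarget: Sequence[float],
                       **cost_params: float) -> Tuple[float, float]:
    """Try every observed score as a threshold; return (lowest cost, its threshold)."""
    candidates = sorted(set(target) | set(nontarget))
    candidates.append(float("inf"))  # a threshold that rejects every trial
    best = None
    for t in candidates:
        cost = detection_cost(*error_rates(target, nontarget, t), **cost_params)
        if best is None or cost < best[0]:
            best = (cost, t)
    return best
```

Because P_target is small, false alarms dominate the cost at these settings, so the minimum tends to sit at a strict threshold; changing the costs moves the optimal operating point, which is the trade-off the abstract reports.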
Study of Speaker Verification Methods
Speaker verification is the process of accepting or rejecting a speaker's identity claim by comparing a set of measurements of the speaker's utterances with a reference set of measurements of the utterances of the person whose identity is claimed. In speaker verification, a person makes an identity claim. There are two main stages in this technique: feature extraction and feature matching. Feature extraction is the process of extracting useful data that can later be used to represent the speaker. Feature matching involves identifying the unknown speaker by comparing the features extracted from the voice with those of the enrolled voices of known speakers.
A Study of Interspeaker Variability in Speaker Verification
IEEE Transactions on Audio, Speech, and Language Processing, 2000
We propose a new approach to the problem of estimating the hyperparameters which define the inter-speaker variability model in joint factor analysis. We tested the proposed estimation technique on the NIST 2006 speaker recognition evaluation data and obtained 10-15% reductions in error rates on the core condition and the extended data condition (as measured both by equal error rates and the NIST detection cost function). We show that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types. (The comparisons are based on the best results on these tasks that have been reported in the literature.) In the case of the cross-channel condition, a factor analysis model with 300 speaker factors and 200 channel factors can achieve equal error rates of less than 3.0%. This is a substantial improvement over the best results that have previously been reported on this task.
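The equal error rate used above is the operating point at which the miss (false rejection) rate equals the false-alarm (false acceptance) rate. A minimal sketch of estimating it from lists of target and non-target trial scores is shown below; the function name is hypothetical, and sweeping only the observed scores is a coarse approximation compared with interpolating along the ROC or DET curve.

```python
from typing import Sequence


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    Returns the mean of the miss and false-alarm rates at the threshold
    where the two rates are closest to each other.
    """
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)) + [float("inf")]:
        p_miss = sum(s < t for s in target) / len(target)       # true speakers rejected
        p_fa = sum(s >= t for s in nontarget) / len(nontarget)  # impostors accepted
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, best_eer = gap, (p_miss + p_fa) / 2
    return best_eer
```

For example, with target scores [1, 2, 3, 4] and impostor scores [0, 0.5, 1.5, 5], a threshold of 2 rejects one of four true speakers and accepts one of four impostors, giving an EER of 25%.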
An Evaluation of "Commercial Off-The-Shelf" Speaker Verification Systems
2006 IEEE Odyssey - The Speaker and Language Recognition Workshop, 2006
An evaluation of commercial off-the-shelf speaker verification systems is reported. The performance of several systems offered for testing is analyzed against criteria designed to identify the strengths and weaknesses that would determine their suitability for use by government service agencies. Results for three text-dependent systems, by Nuance, Persay and Scansoft, are presented in this paper.
Speaker verification is the task of automatically recognizing who is speaking on the basis of individual information contained in speech waves. An important application of speaker verification is forensics. Speaker verification has been an active research field for decades, and a number of problems remain unsolved. Many algorithms have been developed for this task, including the Gaussian Mixture Model (GMM), the Hidden Markov Model (HMM), and Artificial Neural Networks (ANN). All of these algorithms perform feature matching, while Mel Frequency Cepstral Coefficients (MFCC) are the features extracted from the voice signal. The Mel scale is based on the pitch, or frequency, as perceived by human listeners. The simplest of these algorithms computes the distortion distance between the speakers' codebooks, but it is less efficient than the other algorithms. Here, we have tried to improve the efficiency of this method. The system has two phases: a training phase and a testing phase. The training phase extracts features using MFCC and stores the speakers' codebooks in the database. The testing phase repeats these steps and additionally computes the distortion distance between the unknown speaker's codebook and the codebooks of all speakers already stored in the database; the speaker is then verified if the best match corresponds to the claimed identity.
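The codebook approach described above can be sketched as follows. Assumptions not taken from the abstract: MFCC frames are already extracted as lists of floats, each speaker's codebook is trained with plain k-means as a stand-in for the LBG algorithm often used to build VQ codebooks, distortion is the mean squared Euclidean distance from each frame to its nearest codeword, and a claim is accepted when the claimed speaker's codebook yields the lowest distortion among all enrolled speakers. The helper names are hypothetical; a deployed verifier would typically also compare the distortion against a threshold.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames: Sequence[Vector], size: int,
                   iters: int = 20, seed: int = 0) -> List[Vector]:
    """Build a VQ codebook with plain k-means (a stand-in for LBG splitting)."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(list(frames), size)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            clusters[nearest].append(f)
        for k, members in enumerate(clusters):
            if members:  # keep the previous codeword if its cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(frames: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def verify_claim(frames: Sequence[Vector], codebooks: Dict[str, List[Vector]],
                 claimed: str) -> bool:
    """Accept the claim if the claimed speaker's codebook gives the lowest distortion."""
    best = min(codebooks, key=lambda name: avg_distortion(frames, codebooks[name]))
    return best == claimed
```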
2000
guidance, help and support given throughout the course of the project. I would also like to express my greatest gratitude to Dr. T. Stathaki for her help, guidance and support throughout this year at Imperial College. Most of all, I would like to thank my family for their great support during this year at Imperial College and throughout my five years of undergraduate study at the Aristotle University of Thessaloniki, Greece. Biometric identification methods aim to replace traditional identification methods such as PINs, identity cards, etc. Speech is one of the common biometrics that can be used to establish a person's identity. Speaker verification systems accept or reject the identity claim of a speaker by comparing a set of measurements of his speech with a reference set of measurements of the speech of the person whose identity is claimed. Many speaker verification systems with good performance have been proposed and developed in the last decade. The basic aim of t...
Limited Data Speaker Verification: Fusion of Features
International Journal of Electrical and Computer Engineering (IJECE), 2017
This work presents an experimental evaluation of speaker verification with different speech feature extraction techniques under the constraint of limited data (less than 15 seconds). State-of-the-art speaker verification techniques perform well when sufficient data (more than 1 minute) is available, but developing techniques that perform well for speaker verification under limited data conditions is a challenging task. In this work, features such as Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Delta (Δ), Delta-Delta (ΔΔ), Linear Prediction Residual (LPR) and Linear Prediction Residual Phase (LPRP) are considered. The performance of the individual features is studied, and combinations of these features are evaluated to improve verification performance. A comparative study of the Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) is made through experimental evaluation. The experiments are conducted on the NIST-2003 database. The experimental results show that combining features gives better performance than the individual features. Furthermore, GMM-UBM modeling yields a lower equal error rate (EER) than GMM.
1. INTRODUCTION
Speech signals play a central role in communication between people [1]. Speaker recognition is a technique for recognizing a speaker from his/her own voice and can be used for either speaker verification or speaker identification [2]. Over the last decade, speaker verification has been used in many commercial applications, and these applications favour limited data conditions, where limited data means only a few seconds of speech (less than 15 s). Depending on the nature of the training and test speech data, speaker verification is classified as text-dependent or text-independent [3].
In text-dependent mode, the training and testing speech use the same text, whereas in text-independent mode the training and testing speech differ. Text-independent speaker verification under limited data conditions has always been a challenging task. A speaker verification system consists of four stages: analysis of the speech data, feature extraction, modeling, and testing [4]. The analysis stage analyzes speaker information using vocal tract [5], excitation source [6], and suprasegmental features such as duration, accent and modulation [7].
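The Delta and Delta-Delta features listed in the abstract are commonly computed with the regression formula d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), applied once to the static coefficients to obtain Delta and again to the deltas to obtain Delta-Delta. The sketch below implements that formula with edge frames replicated and then concatenates static, Delta and Delta-Delta vectors per frame. Concatenation is only one plausible way to combine features; the paper may fuse them differently, and the window width N = 2 and the function names are assumptions.

```python
from typing import List, Sequence


def deltas(frames: Sequence[Sequence[float]], width: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients; frames past either edge are replicated."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]  # clamp at the last frame
                prv = frames[max(t - n, 0)][d]               # clamp at the first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(static: Sequence[Sequence[float]]) -> List[List[float]]:
    """Feature-level fusion: concatenate static, Delta and Delta-Delta per frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [list(s) + a + b for s, a, b in zip(static, d1, d2)]
```

On a linearly increasing coefficient track the interior deltas equal the slope, while constant tracks give zero deltas, which is why deltas capture how the spectrum changes over time rather than its static shape.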