Analysis of Methods and Techniques Used for Speaker Identification, Recognition, and Verification: A Study on Quarter-Century Research Outcomes
Related papers
Speaker Verification and Identification
Intelligent Applications
A speaker recognition system verifies or identifies a speaker's identity based on his/her voice. Voice is considered one of the most convenient biometric characteristics for human-machine communication. This chapter introduces several speaker recognition systems and examines their performance under various conditions. Speaker recognition can be classified into either speaker verification or speaker identification. Speaker verification aims to verify whether an input speech corresponds to a claimed identity, and speaker identification aims to identify an input speech by selecting one model from a set of enrolled speaker models. Both speaker verification and identification systems consist of three essential elements: feature extraction, speaker modeling, and matching. Feature extraction pertains to extracting essential features from an input speech for speaker recognition. Speaker modeling pertains to probabilistically modeling the features of the enrolled speakers. The matc...
Speaker identification-A survey
Speaker identification is the process of determining which speaker, from a set of registered speakers, produced a given utterance. Speaker identification is a sub-process of speaker recognition in which an individual speaker is recognized based on the information included in the speech signal. Speaker identification requires two phases [3]. The first phase is called the enrolment phase and the second phase is called the testing phase. In the enrolment phase, speech signals are acquired from all registered speakers who are to be identified, and a speaker model is constructed for each speaker. In the testing phase, the speech signal of an unknown utterance is compared to each of the enrolled speaker models. This technique makes it possible to use a speaker's voice to check and verify their identity and to control access to services in areas such as forensic applications, authentication, law enforcement, and voice-based attendance entry systems. In this paper, some speaker identification techniques are discussed and the performance of those techniques is analysed. Then, some suggestions are provided to improve their performance.
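The two phases described in this abstract can be sketched as follows. This is a minimal illustration, not the survey's method: it assumes each speaker model is simply the mean of that speaker's feature vectors and that matching uses Euclidean distance. The names `enrol` and `identify` and the 2-D feature vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def enrol(training_features):
    """Enrolment phase: build one model per registered speaker.

    Assumption: the model is just the mean feature vector.
    """
    return {spk: mean_vector(vecs) for spk, vecs in training_features.items()}


def identify(models, test_vectors):
    """Testing phase: compare the unknown utterance with every enrolled model."""
    test_mean = mean_vector(test_vectors)
    return min(models, key=lambda spk: math.dist(models[spk], test_mean))


# Hypothetical 2-D feature vectors for two registered speakers.
training = {
    "alice": [[1.0, 2.0], [1.2, 1.8]],
    "bob": [[4.0, 0.5], [3.8, 0.7]],
}
models = enrol(training)
identified = identify(models, [[3.9, 0.6], [4.1, 0.4]])
print(identified)
```

Real systems would replace the mean-vector model with a statistical model and the toy vectors with features extracted from speech.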
A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities
IEEE Access
Humans can identify a speaker by listening to their voice, over the telephone, or on any digital device. Drawing on this innate human ability, authentication technologies based on voice biometrics, such as automatic speaker recognition (ASR), have been introduced. An ASR system recognizes speakers by analyzing speech signals and characteristics extracted from speakers' voices. ASR has recently become an effective research area as an essential aspect of voice biometrics. Specifically, this literature survey gives a concise introduction to ASR, provides an overview of the general architectures dealing with speaker recognition technologies, and highlights the past, present, and future research trends in this area. This paper briefly describes all the main aspects of ASR, such as speaker identification, verification, and diarization. Further, the performance of current speaker recognition systems is investigated in this survey, along with their limitations and possible ways of improvement. Finally, a few unsolved challenges of speaker recognition are presented at the close of this survey.
Index terms: Automatic speaker recognition, feature extraction, recognition techniques, performance measures, challenges.
2) Text-dependent and text-independent recognition
Text-dependency is another level of classification of speaker recognition (SR). This classification is based upon the
A Shortcut into Speaker Verification
Digest of the proceedings of the WSEAS conferences, 2003
This paper is intended to introduce a graduate student to the speaker verification problem. Knowledge of pioneering work, together with an overview of methods in actual use, clears the path into deeper investigation. A list of useful references and books provides further orientation. The panorama is not complete if practical considerations are not taken into account: corpora, assessment of detection performance, speech and channel variability.
A Tutorial on Text-Independent Speaker Verification
Eurasip Journal on Advances in Signal Processing, 2004
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely, neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step to deal with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications relative to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
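To make the score normalization step concrete, here is a minimal sketch. The abstract does not say which normalization is covered; zero normalization (Z-norm) is assumed here as one widely used option, in which a raw score is shifted and scaled by the statistics of impostor scores obtained against the same speaker model. The function name `znorm` and all score values are hypothetical.

```python
import statistics


def znorm(raw_score, impostor_scores):
    """Zero normalization: (score - impostor mean) / impostor standard deviation.

    Assumption: impostor_scores were produced by scoring impostor
    utterances against the same target speaker model.
    """
    mu = statistics.mean(impostor_scores)
    sigma = statistics.stdev(impostor_scores)  # sample standard deviation
    return (raw_score - mu) / sigma


# Hypothetical raw scores (for example, log-likelihood ratios).
impostor_scores = [-1.0, 0.0, 1.0]
normalized = znorm(3.0, impostor_scores)
print(normalized)
```

After normalization, a single decision threshold can be shared more safely across speaker models, since each model's impostor score distribution is brought to a comparable scale.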
Study of Speaker Verification Methods
Speaker verification is the process of accepting or rejecting the identity claim of a speaker by comparing a set of measurements of the speaker's utterances with a reference set of measurements of the utterances of the person whose identity is claimed. In speaker verification, a person makes an identity claim. There are two main stages in this technique: feature extraction and feature matching. Feature extraction is the process of extracting useful data that can later be used to represent the speaker. Feature matching involves identification of the unknown speaker by comparing the features extracted from the voice with the enrolled voices of known speakers.
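The accept/reject decision described here can be sketched as a threshold test on a similarity score. This is an illustration under stated assumptions, not the paper's method: cosine similarity stands in for the matching step, and the function names, the threshold of 0.9, and the feature vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(claimed_reference, test_features, threshold=0.9):
    """Accept the identity claim only if the test features are close enough
    to the reference measurements of the claimed speaker."""
    return cosine_similarity(claimed_reference, test_features) >= threshold


# Hypothetical reference for the claimed speaker and two test utterances.
reference = [1.0, 0.0]
accepted = verify(reference, [0.9, 0.1])   # close to the reference
rejected = verify(reference, [0.1, 0.9])   # far from the reference
print(accepted, rejected)
```

Unlike identification, only the claimed speaker's reference is consulted, so the cost of a decision does not grow with the number of enrolled speakers.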
Speaker Recognition: Advancements and Challenges
2012
Speaker Recognition is a multi-disciplinary branch of biometrics that may be used for identification, verification, and classification of individual speakers, with the capability of tracking, detection, and segmentation by extension. Recently, a comprehensive book on all aspects of speaker recognition was published [1]. Therefore, here we are not concerned with details of the standard modeling which is and has been used for the recognition task. In contrast, we present a review of the most recent literature and briefly visit the latest techniques which are being deployed in the various branches of this technology.
Analysing the Performance of Speaker Verification Task using Different Features
International Journal of Computer Applications, 2013
Speaker recognition, also called "voice recognition", is the identification of the person who is speaking by the characteristics of their voice. The components of speaker recognition include Speaker Identification (SI) and Speaker Verification (SV). Speaker identification is the task of determining an unknown speaker's identity. If the speaker claims a certain identity and the voice is used to verify this claim, the task is called speaker verification. It determines whether an unknown voice matches the known voice of the speaker whose identity is being claimed. This paper addresses the speaker verification task. There are two phases in the speaker verification task, namely training and testing. In the training phase, different features such as Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Prediction (PLP) features are extracted from the speech signal and used to train a Support Vector Machine to obtain the target speaker model. The model is trained with both actual speaker and impostor utterances. In the testing phase, features are extracted from the test speech signal for different durations. The extracted feature vectors are given to the claimed speaker model, and the decision is made as to whether the speaker is authorised or an impostor. The performance of the speaker verification task is analysed using different features with different utterance sizes. The results show that the performance of the speaker verification task decreases as the duration of the speech utterances decreases.
The NIST speaker recognition evaluation – Overview, methodology, systems, results, perspective
Speech Communication, 2000
This paper, based on three presentations made in 1998 at the RLA2C Workshop in Avignon, discusses the evaluation of speaker recognition systems from several perspectives. A general discussion of the speaker recognition task and the challenges and issues involved in its evaluation is offered. The NIST evaluations in this area and specifically the 1998 evaluation, its objectives, protocols and test data, are described. The algorithms used by the systems that were developed for this evaluation are summarized, compared and contrasted. Overall performance results of this evaluation are presented by means of detection error trade-off (DET) curves. These show the performance trade-off of missed detections and false alarms for each system and the effects on performance of training condition, test segment duration, the speakers' sex and the match or mismatch of training and test handsets. Several factors that were found to have an impact on performance, including pitch frequency, handset type and noise, are discussed and DET curves showing their effects are presented. The paper concludes with some perspective on the history of this technology and where it may be going.
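The trade-off between missed detections and false alarms that a DET curve displays can be sketched by sweeping a decision threshold over a set of scores. This is a minimal illustration with hypothetical scores and a hypothetical function name `det_points`; it only computes the operating points and leaves out the normal-deviate axis scaling normally used when plotting DET curves.

```python
def det_points(target_scores, impostor_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold.

    A trial is accepted when its score is >= threshold, so:
      miss        = target trial scored below the threshold
      false alarm = impostor trial scored at or above the threshold
    """
    points = []
    for t in sorted(set(target_scores) | set(impostor_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, miss, fa))
    return points


# Hypothetical scores from target (true speaker) and impostor trials.
target_scores = [0.9, 0.8, 0.4]
impostor_scores = [0.5, 0.2, 0.1]
points = det_points(target_scores, impostor_scores)
for threshold, miss, fa in points:
    print(f"threshold={threshold:.1f} miss={miss:.2f} false_alarm={fa:.2f}")
```

Raising the threshold lowers the false alarm rate and raises the miss rate; each system's curve traces this trade-off across all thresholds.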