Results of the 2003 NFI-TNO Forensic Speaker Recognition Evaluation
Related papers
NIST and NFI-TNO evaluations of automatic speaker recognition
Computer Speech & Language, 2006
In recent years, several text-independent speaker recognition evaluation campaigns have taken place. This paper reports on results of the NIST evaluation of 2004 and the NFI-TNO forensic speaker recognition evaluation held in 2003, and reflects on the history of the evaluation campaigns. The effects of speech duration, training handsets, transmission type, and gender mix show expected behaviour on the DET curves. New results on the influence of language show an interesting dependence of the DET curves on the accent of speakers. We also report on a number of statistical analysis techniques that have recently been introduced in the speaker recognition community, as well as a new application of analysis of deviance. These techniques are used to determine that the two evaluations held in 2003, by NIST and NFI-TNO, are of statistically different difficulty for the speaker recognition systems.
2003
In the present work we discuss the results that our speaker verification system, WCL-1, obtained in the 2003 NFI/TNO Forensic Speaker Recognition Evaluation. These results, together with the ones obtained in the 2003 NIST Speaker Recognition Evaluation, provide an opportunity for in-depth analysis of various aspects of the real-world application of speaker recognition technology. Based on a detailed analysis of the speaker verification performance obtained in the different subtasks, we identify the strengths and weaknesses of the WCL-1 system and its potential areas of use.
Performance Evaluation of Automatic Speaker Recognition Techniques for Forensic Applications
New Trends and Developments in Biometrics, 2012
Performance is one of the fundamental aspects of a Forensic Automatic Speaker Recognition (FASR) system. It depends strongly on the variability in the speech signal and on noise and distortions in the communications channel. The recognition task faces multiple problems: unconstrained input speech, uncooperative speakers, and uncontrolled environmental parameters. The speech samples will most likely contain noise, may be very short, and may not contain enough relevant speech material for comparative purposes. In automatic or semi-automatic speaker recognition, background noise is one of the main causes of alteration of the acoustic indexes used in the biometric recognition phase [3, 4]. Each of these variables makes reliable discrimination of speakers a complicated and daunting task. Typically, the performance of a biometric system is determined by the errors generated by the recognition. Two types of error can occur during a verification task: (a) false acceptance, when the system accepts an impostor speaker; and (b) false rejection, when the system rejects a valid speaker. Both types of error are a function of the decision threshold. Choosing a high acceptance threshold results in a secure system that accepts only a few trusted speakers, but at the expense of a high false rejection rate (FRR), also called the False Non Match Rate (FNMR). Conversely, choosing a low threshold makes the system more user-friendly by reducing the false rejection rate, but at the expense of a high false acceptance rate (FAR), also called the False Match Rate (FMR). This trade-off is typically depicted using a detection error trade-off (DET) curve. Each threshold yields one (FAR, FRR) pair, which defines one operating point on the DET curve.
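The threshold trade-off described in this abstract can be illustrated with a minimal Python sketch. The scores below are invented for illustration and do not come from any evaluation; the function name `error_rates` and the convention that a trial is accepted when its score is greater than or equal to the threshold are assumptions, not part of the cited work.

```python
import numpy as np


def error_rates(target_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for one threshold.

    Assumed convention: a trial is accepted when score >= threshold.
    FAR is the fraction of impostor trials accepted,
    FRR is the fraction of target (valid speaker) trials rejected.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    far = float(np.mean(impostor_scores >= threshold))
    frr = float(np.mean(target_scores < threshold))
    return far, frr


# Hypothetical verification scores (higher means "more likely the same speaker").
target_scores = [2.1, 1.7, 0.9, 1.2, 0.3]
impostor_scores = [-1.5, -0.2, 0.4, -0.8, 1.0]

# Sweeping the threshold traces out operating points of the DET curve.
for threshold in (-1.0, 0.5, 1.5):
    far, frr = error_rates(target_scores, impostor_scores, threshold)
    print(f"threshold={threshold:+.1f}  FAR={far:.2f}  FRR={frr:.2f}")
```

With these made-up scores, raising the threshold lowers the FAR and raises the FRR, which is the behaviour the abstract describes. A DET curve plots the same pairs on normal-deviate axes, a step this sketch leaves out.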
Null-Hypothesis LLR: A Proposal for Forensic Automatic Speaker Recognition
Interspeech 2017
A new method named Null-Hypothesis LLR (H0LLR) is proposed for forensic automatic speaker recognition. The method takes into account the fact that forensically realistic data are difficult to collect and that inter-individual variation is generally better represented than intra-individual variation. According to the proposal, intra-individual variation is modeled as a projection from case-customized inter-individual variation. Calibrated log Likelihood Ratios (LLR) that are calculated on the basis of the H0LLR method were tested on two corpora of forensically-founded telephone interception test sets, German-based GFS 2.0 and Dutch-based NFI-FRITS. Five automatic speaker recognition systems were tested based on the scores or the LLRs provided by these systems which form the input to H0LLR. Speaker discrimination and calibration performance of H0LLR is comparable to the performance indices of the system-internal LLR calculation methods. This shows that external data and strategies that work with data outside the forensic domain and without case customization are not necessary. It is also shown that H0LLR leads to a reduction in the diversity of LLR output patterns of different automatic systems. This is important for the credibility of the Likelihood Ratio framework in forensics, and its application in forensic automatic speaker recognition in particular.
Forensic and Automatic Speaker Recognition System
Automatic Speaker Recognition (ASR) systems have emerged as an important means of confirming identity in business, e-commerce, forensics, and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. Algorithm-based systems for forensic speaker recognition have been developed by physical scientists and forensic linguists to reduce the likelihood of contextual bias or preconceived judgment when a reference model is compared with an unknown audio sample and a suspect. Many researchers continue to develop automatic algorithms in signal processing and machine learning so that improved performance can reliably establish the speaker's identity, to the point where the automatic system performs on par with human listeners. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical trends that have emerged in automatic technology over the last decade. I focus on many aspects of automatic speaker recognition (ASR) systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.
Technical forensic speaker recognition: Evaluation, types and testing of evidence
Computer Speech & Language, 2006
Important aspects of Technical Forensic Speaker Recognition, particularly those associated with evidence, are exemplified and critically discussed, and comparisons drawn with generic Speaker Recognition. The centrality of the Likelihood Ratio of Bayes' theorem in correctly evaluating strength of forensic speech evidence is emphasised, as well as the many problems involved in its accurate estimation. It is pointed out that many different types of evidence are of use, both experimentally and forensically, in discriminating same-speaker from different-speaker speech samples, and some examples are given from real forensic case-work to illustrate the Likelihood Ratio-based approach. The extent to which Technical Forensic Speaker Recognition meets the Daubert requirement of testability is also discussed.
In this paper, the forensic semi-automatic speaker recognition (FSASR) approaches most widely used in Italy (as emulated in the IMPAVIDO software, along with some variants of them) are investigated for accuracy by means of score calibration and fusion with Logistic Regression (LogReg). The metric used to assess accuracy was the likelihood-ratio cost (Cllr), while the factors analyzed include the size of the database, different software implementations of LogReg, and the strategy for handling the features available for the different vowels. Our results show that LogReg calibration systematically improved the Cllr, suggesting that the accuracy of the tested FSASR approaches may also be improved by introducing calibration. Furthermore, our findings suggest that the tested FSASR approaches give more accurate results when the IMPAVIDO database is used.
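For readers unfamiliar with the Cllr metric named in this abstract, the following is a minimal Python sketch of its standard definition: the average of two logarithmic penalties, one over same-speaker trials and one over different-speaker trials. The function name `cllr` and the example likelihood ratios are illustrative assumptions; this is not the IMPAVIDO implementation.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost for two lists of likelihood ratios (not logs).

    Same-speaker trials are penalised for small LRs,
    different-speaker trials are penalised for large LRs.
    """
    ss_penalty = sum(math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs)
    ds_penalty = sum(math.log2(1.0 + lr) for lr in different_speaker_lrs)
    return 0.5 * (ss_penalty / len(same_speaker_lrs)
                  + ds_penalty / len(different_speaker_lrs))


# A system that always outputs LR = 1 carries no information: Cllr is 1.
print(cllr([1.0, 1.0], [1.0, 1.0]))

# Hypothetical well-behaved LRs: large for same-speaker trials,
# small for different-speaker trials, giving a Cllr close to 0.
print(cllr([100.0, 1000.0], [0.01, 0.001]))
```

Lower values are better. A value above 1 indicates likelihood ratios that are, on average, more misleading than reporting LR = 1 for every trial, which is why calibration can reduce Cllr.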
Tools for forensic speaker recognition
In press in: F. Orletti, L. Mariottini (Eds.), Forensic Communication. Theories, practice and instruments, Cambridge Scholar Publishing, Cambridge, 2017
In this work, we present and discuss a new software application, IMPAVIDO (Integrated Methods for PArametric Voice IDentificatiOn), which aims to provide a development environment to test different techniques for FSR. Accordingly, both the IDEM and SMART methodologies have been re-implemented (emulated, to the best of our knowledge) in order to work together in the same environment. Here, we assess their performance under some operating conditions, focusing on the estimation of accuracy metrics such as the Cllr and on Tippett plots.
Speaker Recognition System and its Forensic Implications
2013
Speaker recognition comprises all those activities which attempt to link a speech sample to its speaker through its acoustic or perceptual properties [1]. The speech signal is a multidimensional acoustic wave (Figure 1), which provides information regarding speaker characteristics, the spoken phrase, speaker emotions, additional noise, channel transformations, etc. [2,3]. The human voice is a unique personal trait. For two voices to be indistinguishable, the two individuals would need identical vocal mechanisms and identical coordination of their articulators, which is highly improbable. However, some variation also occurs among speech exemplars obtained from the same speaker, because a speaker cannot reproduce the same utterance in exactly the same way again and again. Even an individual's signature varies from trial to trial.