The NIST speaker recognition evaluation – Overview, methodology, systems, results, perspective

This paper, based on three presentations made in 1998 at the RLA2C Workshop in Avignon, discusses the evaluation of speaker recognition systems from several perspectives. A general discussion of the speaker recognition task and the challenges and issues involved in its evaluation is offered. The NIST evaluations in this area, and specifically the 1998 evaluation, its objectives, protocols and test data, are described. The algorithms used by the systems that were developed for this evaluation are summarized, compared and contrasted. Overall performance results of this evaluation are presented by means of detection error trade-off (DET) curves. These show the performance trade-off of missed detections and false alarms for each system and the effects on performance of training condition, test segment duration, the speakers' sex and the match or mismatch of training and test handsets. Several factors that were found to have an impact on performance, including pitch frequency, handset type and noise, are discussed and DET curves showing their effects are presented. The paper concludes with some perspective on the history of this technology and where it may be going. © 2000 Elsevier Science B.V. All rights reserved.

Résumé: This article, based on the three talks given in 1998 at the RLA2C conference in Avignon, presents methods for measuring speaker recognition performance. After an overview of what speaker recognition is and of the problems it raises, we present the objectives, methods, data and results of the evaluation run by NIST in 1998. For this we use detection error trade-off (DET) curves. These curves have the advantage of showing the trade-off between missed detections and false alarms, and the effect on performance of training conditions, test segment duration, speaker sex and handset type. We then present, compare and summarize the different algorithms used by the systems submitted by the participants. We found that several factors strongly influence system performance, for example the pitch of the voice, the type of handset used and ambient noise. We present these factors and illustrate their effects with DET curves. We conclude with a history of speaker recognition techniques and some projections for the future of the field.

PII: S0167-6393(99)00080-1
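As a rough illustration of the trade-off that a DET curve displays, the following Python sketch computes miss and false-alarm rates over a sweep of decision thresholds and maps them to the normal-deviate scale conventionally used for DET axes. It is a minimal sketch, not the evaluation's scoring software: the function names `det_points` and `probit`, the clipping constant `eps`, and the convention that a trial is accepted when its score is at or above the threshold are all assumptions made for illustration.

```python
from statistics import NormalDist


def det_points(target_scores, impostor_scores):
    """Return (threshold, p_miss, p_false_alarm) for each candidate threshold.

    Hypothetical helper: a trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        # Missed detection: a true-speaker trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, p_miss, p_fa))
    return points


def probit(p, eps=1e-6):
    """Map a probability to the normal-deviate scale, clipping away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(p)


if __name__ == "__main__":
    # Made-up toy scores, for illustration only (not evaluation data).
    targets = [2.0, 1.5, 0.3]
    impostors = [0.1, 0.4, -1.0]
    for t, p_miss, p_fa in det_points(targets, impostors):
        print(f"threshold={t:5.2f}  miss={p_miss:.3f}  false alarm={p_fa:.3f}  "
              f"axes=({probit(p_fa):+.2f}, {probit(p_miss):+.2f})")
```

Raising the threshold moves along the curve: the miss rate can only increase while the false-alarm rate can only decrease, which is the trade-off the DET plot makes visible for each system and condition.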