Speaker Recognition in the forensic field: the centrality of a multilevel linguistic analysis
Related papers
Forensic Speaker Identity Verification (F-SIV) in Italy: First Evaluation Campaign Evalita-2009
evalita.it
We report the results of a first, tentative attempt to run an evaluation campaign on a Forensic Speaker Identity Verification task within Evalita 2009. Participants were asked to test methods and models commonly used in forensics on a common corpus collected to simulate real forensic characteristics and situations. The task provided a Training data set containing the voices of known suspects, to be compared with voices in two other data sets: a Closed-test set of 16 unknown voices and an Open-test set containing different voices that had to be segmented before testing or comparison. The results achieved by participants are briefly reported here.
Technical forensic speaker identification from a Bayesian linguist's perspective
2004
Important methodological aspects of Technical Forensic Speaker Identification are discussed and exemplified. The centrality of the Likelihood Ratio of Bayes' Theorem as the proper way of forensically evaluating speech evidence is emphasised, as well as the many different types of evidence that are of use in discriminating same-speaker from different-speaker speech samples.
Forensic Speech and Audio Analysis Forensic Linguistics
2001
Although the development of state-of-the-art speaker recognition systems has shown considerable progress in the last decade, performance levels of these systems do not as yet seem to warrant large-scale introduction in anything other than relatively low-risk applications. Conditions typical of the forensic context such as differences in recording equipment and transmission channels, the presence of background noise and of variation due to differences in communicative context continue to pose a major challenge. Consequently, the impact of automatic speaker recognition technology on the forensic scene has been relatively modest and forensic speaker identification practice remains heavily dominated by the use of a wide variety of largely subjective procedures. While recent developments in the interpretation of the evidential value of forensic evidence clearly favour methods that make it possible for results to be expressed in terms of a likelihood ratio, unlike automatic procedures, tr...
The case for automatic higher-level features in forensic speaker recognition
2008
Approaches from standard automatic speaker recognition, which rely on cepstral features, suffer from a lack of interpretability in forensic applications. The growing practice of using "higher-level" features in automatic systems offers promise in this regard. We provide an overview of automatic higher-level systems and discuss the potential advantages of, and open issues with, their use in the forensic context.
Forensic Speech and Audio Analysis Forensic Linguistics 1998-2001
2001
Although the development of state-of-the-art speaker recognition systems has shown considerable progress in the last decade, performance levels of these systems do not as yet seem to warrant large-scale introduction in anything other than relatively low-risk applications. Conditions typical of the forensic context such as differences in recording equipment and transmission channels, the presence of background noise and of variation due to differences in communicative context continue to pose a major challenge. Consequently, the impact of automatic speaker recognition technology on the forensic scene has been relatively modest and forensic speaker identification practice remains heavily dominated by the use of a wide variety of largely subjective procedures. While recent developments in the interpretation of the evidential value of forensic evidence clearly favour methods that make it possible for results to be expressed in terms of a likelihood ratio, unlike automatic procedures, tr...
Interspeech 2008, 2008
This paper presents and describes Ahumada III, a speech database in Spanish collected from real forensic cases. In its current release, the database contains male speakers recorded using the systems and procedures followed by the Spanish Guardia Civil police force. The paper also explores the usefulness of such a corpus for addressing the important problem of database mismatch in speaker recognition, understood as the difference between the database used for tuning a speaker recognition system and the data the system will handle in operational conditions. This problem is typical in forensics, where variability in speech conditions may be extreme and difficult to model. This work therefore also presents a study evaluating the impact of this problem, for which a corpus referred to as NIST4M (NIST MultiMic MisMatch) has been constructed from NIST SRE 2006 data. NIST4M contains microphone data both in the enrolled models and in the test segments, allowing the generation of trials in a variety of strongly mismatched conditions. Database mismatch is simulated by eliminating some microphone channels of interest from the background data and computing scores with speech from those microphones in unknown testing conditions, as usually happens in forensic speaker recognition. Finally, we show how incorporating Ahumada III as background data helps to address database mismatch in real-world forensic conditions.
Forensic Speaker Recognition
IEEE Signal Processing Magazine, 2009
There has long been a desire to be able to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives, and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or innocence [3] [35]. Challenges, realities, and cautions regarding the use of speaker recognition applied to forensic-quality samples are presented. Identifying a voice using forensic-quality samples is generally a challenging task for automatic, semiautomatic, and human-based methods. The speech samples being compared may be recorded in different situations; e.g., one sample could be yelling over the telephone, whereas the other might be a whisper in an interview room. A speaker could be disguising his or her voice, ill, or under the influence of drugs, alcohol, or stress in one or more of the samples. The speech samples will most likely contain noise, may be very short, and may not contain enough relevant speech material for comparative purposes. Each of these variables, in addition to the known variability of speech in general, makes reliable discrimination of speakers a complicated and daunting task. Although the scientific basis of authentication of a person by using his or her voice has been questioned by researchers (e.g., by scientists in 1970 [4], British academic phoneticians in 1983 [5], and the French speech communication community from 1990 to today [6]), there is a perception among the
Observations on Forensic Speaker Identification
2015
To clarify some major points in current Forensic Speaker Identification practice using a list of "weaknesses in the science" as perceived by the legal profession (Hayne & Crockett, 1995: 2, 3).
MODELS OF INFORMATION CONTENT IN SPEECH
Simple model: transmission of information in the "Speech Chain": idea -> production (how acoustic disturbances are produced by the speaker) -> acoustics (physical structure of the transmitted speech wave) -> perception (how the listener decodes the acoustic structure to understand its information content).
Fact: the acoustic output of a speaker is uniquely determined by the speaker's anatomy.
Fact: it is relatively easy, even for medium-quality recordings, to extract and quantify acoustic parameters from recorded speech, and to use these to characterise the speaker as they are speaking on that particular occasion.