Towards Vulnerability Analysis of Voice-Driven Interfaces and Countermeasures for Replay Attacks

Audio Replay Attack Detection Using High-Frequency Features

Interspeech 2017

This paper presents our contribution to the ASVspoof 2017 Challenge. It addresses replay spoofing attacks against a speaker recognition system by detecting that the analysed signal has passed through multiple analogue-to-digital (AD) conversions. Specifically, we show that most of the cues that enable detection of replay attacks can be found in the high-frequency band of the replayed recordings. The described anti-spoofing countermeasures are based on (1) modelling the subband spectrum and (2) using the proposed features derived from linear prediction (LP) analysis. The investigated methods show a significant improvement over the baseline system of the ASVspoof 2017 Challenge: a relative equal error rate (EER) reduction of 70% was achieved on the development set and a reduction of 30% on the evaluation set.
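The two ingredients named above, LP analysis and a high-frequency subband measurement, can be illustrated with a minimal sketch. This is not the paper's feature extractor: the autocorrelation-method LP solved with the Levinson-Durbin recursion is the generic textbook procedure, and the helper names, the cutoff frequency, and the naive O(N²) DFT are assumptions chosen for illustration only.

```python
import math

def autocorr(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Autocorrelation-method LP: find a[1..p] with x[n] ~ sum_j a[j] * x[n - j].

    Returns (a, err): a[0] is an unused placeholder, err is the final
    prediction-error energy (the residual energy left after LP modelling).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. silence): stop the recursion
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def highband_log_energy(frame, fs, cutoff_hz):
    """Log energy of naive-DFT bins at or above cutoff_hz (illustration only)."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if k * fs / n < cutoff_hz:
            continue
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        energy += re * re + im * im
    return math.log(energy + 1e-12)
```

For an AR(1)-like autocorrelation `r = [1.0, 0.9, 0.81]`, `levinson_durbin(r, 2)` returns `a[1]` close to 0.9, `a[2]` close to 0, and a residual energy close to 0.19. In a real front end these functions would run frame by frame on windowed speech, possibly after isolating the high band; which band and which LP-derived quantities the paper actually uses is not stated in the abstract.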

Securing Voice-Driven Interfaces Against Fake (Cloned) Audio Attacks

2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2019

Voice cloning technologies have found applications in a variety of areas, ranging from personalized speech interfaces to advertising, robotics, and so on. Existing voice cloning systems can learn speaker characteristics and use trained models to synthesize a person's voice from only a few audio samples. Advances in cloned speech generation now make it possible to generate speech that is perceptually indistinguishable from bona-fide speech. These advances pose new security and privacy threats to voice-driven interfaces and speech-based access control systems. State-of-the-art speech synthesis technologies use trained or tuned generative models for cloned speech generation. Trained generative models rely on linear operations, learned weights, and an excitation source for cloned speech synthesis. These systems leave characteristic artifacts in the synthesized speech. Higher-order spectral analysis is used to capture attributes that differentiate bona-fide from cloned audio. Specifically, quadrature phase coupling (QPC) in the estimated bicoherence, Gaussianity test statistics, and linearity test statistics are used to capture generative model artifacts. The performance of the proposed method is evaluated on cloned audio generated using speaker adaptation- and speaker encoding-based approaches. Experimental results on a dataset of 126 cloned and 8 bona-fide speech samples indicate that the proposed method detects bona-fide and cloned audio with a close-to-perfect detection rate.
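Bicoherence measures whether the phase at frequency f1 + f2 is consistently locked to the phases at f1 and f2 across segments, which is the kind of phase coupling the abstract refers to. The sketch below is a minimal segment-averaged estimator at a single bin pair, assuming rectangular windows and DFT-bin indices; the helper `make_test_segments` and its synthetic three-tone signal are hypothetical test fixtures, not the paper's data or full pipeline.

```python
import cmath
import math
import random

def dft_bin(x, k):
    """Single DFT coefficient X[k] of a real segment."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))

def bicoherence(segments, k1, k2):
    """Squared bicoherence at bins (k1, k2), averaged over segments.

    b^2 = |sum X(k1) X(k2) X*(k1+k2)|^2 / (sum |X(k1) X(k2)|^2 * sum |X(k1+k2)|^2),
    which lies in [0, 1] by the Cauchy-Schwarz inequality.
    """
    num = 0j
    den1 = 0.0
    den2 = 0.0
    for seg in segments:
        x1, x2, x3 = dft_bin(seg, k1), dft_bin(seg, k2), dft_bin(seg, k1 + k2)
        num += x1 * x2 * x3.conjugate()
        den1 += abs(x1 * x2) ** 2
        den2 += abs(x3) ** 2
    return abs(num) ** 2 / (den1 * den2) if den1 * den2 > 0 else 0.0

def make_test_segments(n_seg, n, k1, k2, coupled, seed=0):
    """Hypothetical fixture: three tones at k1, k2, k1+k2 with random phases.

    If coupled, the third phase equals p1 + p2 (phase-coupled); otherwise it is
    drawn independently, so the bicoherence should be small.
    """
    rng = random.Random(seed)
    segs = []
    for _ in range(n_seg):
        p1 = rng.uniform(0, 2 * math.pi)
        p2 = rng.uniform(0, 2 * math.pi)
        p3 = p1 + p2 if coupled else rng.uniform(0, 2 * math.pi)
        segs.append([math.cos(2 * math.pi * k1 * t / n + p1)
                     + math.cos(2 * math.pi * k2 * t / n + p2)
                     + math.cos(2 * math.pi * (k1 + k2) * t / n + p3)
                     for t in range(n)])
    return segs
```

With 32 segments of 32 samples, the coupled fixture gives a bicoherence at bins (3, 5) very close to 1, while the uncoupled one stays near 1/32 on average. A detector would evaluate this over the full (f1, f2) plane and summarise it into features; how the paper does that is not specified here.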

Voice biometric system security: Design and analysis of countermeasures for replay attacks

PhD thesis, 2020

Voice biometric systems use automatic speaker verification (ASV) technology for user authentication. Even though it is among the most convenient means of biometric authentication, the robustness and security of ASV in the face of spoofing attacks (or presentation attacks) is of growing concern and is now well acknowledged by the research community. A spoofing attack involves illegitimate access to the personal data of a targeted user. Replay is among the simplest attacks to mount, yet difficult to detect reliably, and is the focus of this thesis. This research focuses on the analysis and design of existing and novel countermeasures for replay attack detection in ASV, organised in two major parts. The first part of the thesis investigates existing methods for spoofing detection from several perspectives. I first study whether hand-crafted features that show promising results for synthetic speech detection generalise to replay detection. I find, however, that it is difficult to achieve simil...

Voice Spoofing Countermeasure for Voice Replay Attacks using Deep Learning

In our everyday lives, we communicate with each other through several means and channels, as communication is crucial in human life. Listening and speaking are the primary forms of communication, and for both, the human voice is indispensable. Voice communication is the simplest type of communication. Automatic Speaker Verification (ASV) systems verify users by their voices. These systems are susceptible to voice spoofing attacks: logical access and physical access attacks. Recently, there has been notable progress in the detection of these attacks. Attackers use advanced gadgets to record users' voices, replay them to the ASV system, and gain access for harmful purposes. In this work, we propose a secure voice spoofing countermeasure for detecting voice replay attacks. We enhanced ASV system security by building a spoofing countermeasure based on decomposed signals that consist of prominent information...

Audio Replay Attack Detection in Automated Speaker Verification

International Journal of Computer Applications, 2018

Automated Speaker Verification (ASV) systems are extensively used for authentication and verification. Countermeasures are developed for ASV systems to protect them from audio replay attacks. This paper describes the ASVspoof 2017 database, a conceptual analysis of various algorithms and their classification, followed by prediction of results. Feature extraction is based on the recently introduced Constant Q Transform (CQT), a perceptually mapped frequency-time analysis tool mainly used with audio samples. The training dataset comprises 1508 genuine samples and 1508 spoof samples. A training accuracy of 84.4% is achieved for variations of boosted decision trees. Parameters such as the learning rate, number of learners, and number of splits were empirically optimized. LogitBoost was found to outperform AdaBoost in all metrics. Furthermore, an implementation of a single-hidden-layer neural network achieved a training accuracy of 92.1%. A comparison of the algorithms revealed that while the neural network achieved a higher overall training accuracy, it had a lower True Negative Rate than LogitBoost. Overall, the paper describes a generalized system capable of detecting replay...
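The constant-Q idea is that every frequency bin spans the same number of signal cycles, so bins are geometrically spaced and low-frequency bins use longer windows. The sketch below evaluates that definition directly for one frame. The parameter values (110 Hz minimum frequency, 12 bins per octave, a Hamming window) are assumptions for illustration, not the paper's configuration, and practical systems use much faster kernel-based implementations.

```python
import cmath
import math

def naive_cqt(x, fs, f_min, bins_per_octave, n_bins):
    """Direct constant-Q transform magnitudes for one frame starting at x[0].

    Bin k is centred at f_k = f_min * 2**(k / B) and uses a Hamming-windowed
    kernel of length N_k = ceil(Q * fs / f_k), with Q = 1 / (2**(1/B) - 1),
    so every bin has the same ratio of centre frequency to bandwidth.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    out = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = math.ceil(q * fs / f_k)
        if n_k > len(x):
            raise ValueError("signal too short for the lowest CQT bin")
        acc = 0j
        for n in range(n_k):
            w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))  # Hamming
            acc += x[n] * w * cmath.exp(-2j * math.pi * f_k * n / fs)
        out.append(abs(acc) / n_k)  # normalise by kernel length
    return out
```

For a 440 Hz sine sampled at 8 kHz, `naive_cqt(x, 8000, 110.0, 12, 36)` peaks at bin 24, since 110 · 2^(24/12) = 440. Features for a classifier such as a boosted tree ensemble would typically be taken from the log of these magnitudes over successive frames.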

Speech Demodulation-based Techniques for Replay and Presentation Attack Detection

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019

Spoofing is one of the threats that can bypass voice biometrics and gain access to a system. In particular, Automatic Speaker Verification (ASV) systems are vulnerable to various kinds of spoofing attacks. This paper extends our earlier work: the combination of different speech demodulation techniques, namely the Hilbert Transform (HT), the Energy Separation Algorithm (ESA), and its Variable-length version (VESA), is investigated for the replay Spoof Speech Detection (SSD) task. In particular, the feature sets are built from the Instantaneous Amplitude and Instantaneous Frequency (IA-IF) components of narrowband-filtered speech signals obtained from a linearly spaced Gabor filterbank. We observed the relative effectiveness of these demodulation techniques on two spoof speech databases, i.e., BTAS 2016 and the ASVspoof 2017 version 2.0 challenge database, which focus on presentation and replay attacks, respectively. The results obtained from different demodulation techniques gave compar...
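An energy separation algorithm recovers instantaneous amplitude and frequency from the Teager-Kaiser energy of a narrowband signal and of its difference signal. The sketch below implements DESA-2, one standard discrete ESA variant; whether the paper uses this particular variant is not stated, the Gabor filterbank is omitted, and a pure tone stands in for the output of one narrowband channel.

```python
import math

def teager(x, n):
    """Discrete Teager-Kaiser energy operator Psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    return x[n] * x[n] - x[n - 1] * x[n + 1]

def desa2(x):
    """DESA-2 energy separation: per-sample (amplitude, frequency in rad/sample).

    Uses y(n) = x(n+1) - x(n-1) and
      Omega(n) = 0.5 * arccos(1 - Psi[y](n) / (2 * Psi[x](n)))
      |a(n)|   = 2 * Psi[x](n) / sqrt(Psi[y](n)).
    Estimates are produced for n = 2 .. len(x) - 3, since Psi[y] needs x(n-2)..x(n+2).
    """
    # y[0] and y[-1] are placeholders; they are never read by the loop below.
    y = [0.0] + [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)] + [0.0]
    out = []
    for n in range(2, len(x) - 2):
        psi_x = teager(x, n)
        psi_y = teager(y, n)
        if psi_x <= 0 or psi_y <= 0:
            out.append((0.0, 0.0))  # estimate undefined here; emit a neutral value
            continue
        ratio = 1.0 - psi_y / (2.0 * psi_x)
        omega = 0.5 * math.acos(max(-1.0, min(1.0, ratio)))
        amp = 2.0 * psi_x / math.sqrt(psi_y)
        out.append((amp, omega))
    return out
```

For `x[n] = 0.8 * cos(0.3 * n + 0.1)`, every estimate is (0.8, 0.3) up to rounding error. To express the frequency in hertz, multiply by `fs / (2 * math.pi)`. IA-IF features would then be computed per filterbank channel from these tracks.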

An assessment of automatic speaker verification vulnerabilities to replay spoofing attacks

Security and Communication Networks, 2016

This paper analyses the threat of replay spoofing or presentation attacks in the context of automatic speaker verification. As relatively high-technology attacks, speech synthesis and voice conversion, which have thus far received far greater attention in the literature, are probably beyond the means of the average fraudster. The implementation of replay attacks, in contrast, requires neither specific expertise nor sophisticated equipment. Replay attacks are thus likely to be the most prolific in practice, while their impact is relatively under-researched. The work presented here aims to compare, at a high level, the threat of replay attacks with those of speech synthesis and voice conversion. The comparison is performed using strictly controlled protocols and six different automatic speaker verification systems, including a state-of-the-art iVector/probabilistic linear discriminant analysis system. Experiments show that low-effort replay attacks present at least a comparable threat to speech synthesis and voice conversion. The paper also describes and assesses two replay attack countermeasures. A relatively new approach based on local binary pattern analysis of speech spectrograms is shown to outperform a competing approach based on the detection of far-field recordings.
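Local binary pattern (LBP) analysis treats a spectrogram as a texture image: each interior cell is encoded by comparing it with its eight neighbours, and the histogram of codes becomes a feature vector. The sketch below shows the basic 3×3 LBP operator and a normalised 256-bin histogram. It assumes a precomputed log-magnitude spectrogram given as a list of rows; the exact LBP variant and spectrogram settings used in the paper are not given here.

```python
def lbp_codes(img):
    """Basic 8-neighbour LBP code for every interior cell of a 2-D grid.

    Bit b is set when the b-th neighbour (clockwise from top-left) is
    greater than or equal to the centre value.
    """
    rows, cols = len(img), len(img[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            codes.append(code)
    return codes

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes, usable as a feature vector."""
    codes = lbp_codes(img)
    hist = [0.0] * 256
    for code in codes:
        hist[code] += 1
    total = len(codes)
    return [h / total for h in hist] if total else hist
```

A cell that is a local maximum gets code 0, and a cell in a flat region gets code 255. The histograms from genuine and replayed recordings could then be compared or fed to a classifier.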

Re-assessing the threat of replay spoofing attacks against automatic speaker verification

This paper reexamines the threat of spoofing or presentation attacks in the context of automatic speaker verification (ASV). While voice conversion and speech synthesis attacks present a serious threat, and have accordingly received a great deal of attention in the recent literature, they can only be implemented with a high level of technical know-how. In contrast, the implementation of replay attacks requires neither specific expertise nor any sophisticated equipment, and thus they arguably present a greater risk. The comparative threat of each attack is reexamined in this paper against six different ASV systems, including a state-of-the-art iVector-PLDA system. Despite the lack of attention in the literature, experiments show that low-effort replay attacks provoke higher levels of false acceptance than comparatively higher-effort spoofing attacks such as voice conversion and speech synthesis. Results therefore show the need to refocus research effort and to develop countermeasures against replay attacks in future work.

* The work of A. Janicki was supported by the European Union in the framework of the European Social Fund through the Warsaw University of Technology Development Programme.
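False acceptance is measured by counting how many non-target trials (here, including spoofed ones) score at or above the decision threshold. The sketch below computes the false acceptance rate (FAR) and false rejection rate (FRR) at a threshold and a simple equal error rate (EER) by sweeping the observed scores. It assumes higher scores mean "accept" and picks the threshold with the smallest FAR/FRR gap without interpolation; the toy scores in the usage note are made up.

```python
def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor/spoof scores accepted; FRR: fraction of genuine rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def eer(genuine, impostor):
    """Approximate EER: average of FAR and FRR at the observed score where they are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

With toy genuine scores `[0.9, 0.8, 0.7, 0.6]` and impostor scores `[0.1, 0.2, 0.3, 0.65]`, FAR and FRR meet at 0.25 for a threshold of 0.65, so `eer` returns 0.25. Replacing the zero-effort impostor scores with replayed-speech scores is what exposes the increase in false acceptance discussed above.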

Preventing replay attacks on speaker verification systems

2011 Carnahan Conference on Security Technology, 2011

In this paper, we describe a system for detecting spoofing attacks on speaker verification systems. By spoofing, we mean impersonating a legitimate user. We focus on detecting two types of low-technology spoofs. On the one hand, we try to detect whether the test segment is a far-field microphone recording of the victim that has been replayed on a telephone handset using a loudspeaker. On the other hand, we want to determine whether the recording has been created by cutting and pasting short recordings to forge the sentence requested by a text-dependent system. Attacks of this kind are of critical importance for security applications such as access to bank accounts. To detect the first type of spoof, we extract several acoustic features from the speech signal, and spoof and non-spoof segments are classified using a support vector machine (SVM). Cut-and-paste forgeries are detected by comparing the pitch and MFCC contours of the enrollment and test segments using dynamic time warping (DTW). We performed experiments using two databases created for this purpose, which include signals from landline and GSM telephone channels of 20 different speakers. We present results separately for each spoofing detection system and for the fusion of both, achieving error rates under 10% for all the conditions evaluated. We also show the degradation in speaker verification performance in the presence of this kind of attack and how spoofing detection can be used to mitigate that degradation.
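Dynamic time warping aligns two contours that may differ in timing, so a pitch or MFCC track from a test segment can be compared with one from enrollment despite tempo differences. The sketch below is the classic DTW recurrence with a pluggable frame distance: absolute difference for 1-D pitch values, or Euclidean distance (`math.dist`) for MFCC vectors. How the paper turns the resulting distance into a cut-and-paste decision is not described in the abstract, so no threshold is shown.

```python
import math

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW: minimal cumulative frame distance over monotonic alignments.

    D[i][j] = cost(a[i-1], b[j-1]) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]).
    """
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0.0 because the repeated value is absorbed by the warping path, and `dtw_distance(mfcc_a, mfcc_b, dist=math.dist)` compares two sequences of MFCC vectors (the names `mfcc_a` and `mfcc_b` are placeholders).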

Overview of BTAS 2016 speaker anti-spoofing competition

2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS)

This paper provides an overview of the Speaker Anti-spoofing Competition organized by the Biometric group at Idiap Research Institute for the IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS 2016). The competition used the AVspoof database, which contains a comprehensive set of presentation attacks, including (i) direct replay attacks, in which genuine data is played back using a laptop and two phones (Samsung Galaxy S4 and iPhone 3G), (ii) synthesized speech replayed with a laptop, and (iii) speech created with a voice conversion algorithm, also replayed with a laptop. The paper states the competition goals, describes the database and the evaluation protocol, discusses the solutions for spoofing or presentation attack detection submitted by the participants, and presents the results of the evaluation.