Prϵϵch: A System for Privacy-Preserving Speech Transcription
Related papers
Prϵϵch: A System for Privacy-Preserving Speech Transcription
2019
New advances in machine learning and the abundance of speech datasets have made Automated Speech Recognition (ASR) systems with very high accuracy a reality. ASR systems offer their users the means to transcribe speech data at scale. Unfortunately, these systems pose serious privacy threats, as speech is a rich source of sensitive acoustic and textual information. Although offline ASR eliminates the privacy risks, we find that its transcription performance is inferior to that of cloud-based ASR systems, especially for real-world recordings. In this paper, we propose Prϵϵch, an end-to-end speech transcription system which lies at an intermediate point in the privacy-utility spectrum of speech transcription. It protects the acoustic features of the speakers' voices and the privacy of the textual content while improving performance relative to offline ASR. Prϵϵch relies on cloud-based services to transcribe a speech file after applying a series of privacy-preserving operat...
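Below is a minimal, illustrative sketch of one operation of the kind this abstract alludes to: splitting a recording at silences so that each cloud request carries only a short, de-contextualized segment, transcribed out of order and reassembled locally. The `cloud_transcribe` callable and the 16 kHz / `top_db` settings are assumptions for illustration, not Prϵϵch's actual pipeline.

```python
# Illustrative sketch only: split a recording at silences so each cloud request
# carries a short, de-contextualized segment; segments are sent in shuffled
# order and reassembled locally. `cloud_transcribe` is a hypothetical callable,
# not Preech's real pipeline.
import numpy as np
import librosa

def split_at_silences(wav_path, top_db=30):
    y, sr = librosa.load(wav_path, sr=16000)
    intervals = librosa.effects.split(y, top_db=top_db)   # non-silent regions
    return [y[start:end] for start, end in intervals], sr

def transcribe_segmented(wav_path, cloud_transcribe):
    segments, sr = split_at_silences(wav_path)
    order = np.random.permutation(len(segments))           # hide segment order from the service
    texts = {int(i): cloud_transcribe(segments[int(i)], sr) for i in order}
    return " ".join(texts[i] for i in range(len(segments)))  # restore order locally
```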
Towards Privacy-Preserving Speech Data Publishing
Privacy-preserving data publishing has been an active research topic in the last decade. Numerous ingenious attacks on users' privacy and defensive measures have been proposed for the sharing of various data, ranging from relational data, social network data, and spatiotemporal data to images and videos. Speech data publishing, however, is still untouched in the literature. To fill this gap, we study the privacy risk in speech data publishing and explore the possibilities of performing data sanitization to achieve privacy protection while preserving data utility. We formulate this optimization problem in a general fashion and present thorough quantifications of privacy and utility. We analyze the sophisticated impacts of possible sanitization methods on privacy and utility, and also design a novel method, key term perturbation, for speech content sanitization. A heuristic algorithm is proposed to personalize the sanitization for speakers to restrict their privacy leak (p-leak limit) while minimizing the utility loss. Simulations of linkage attacks and sanitization on real datasets validate the necessity and feasibility of this work.
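As a rough illustration of the text-side idea, the toy sketch below swaps sensitive key terms for same-category substitutes until a per-utterance leak budget is met. The term lexicon, budget, and replacement policy are assumptions; the paper's quantification of privacy leak and its heuristic personalization are considerably more involved.

```python
# Toy sketch of key-term perturbation on a transcript: sensitive terms are
# swapped for same-category substitutes until a per-utterance leak budget is
# met. Lexicon and budget are illustrative assumptions, not the paper's method.
import random

LEXICON = {  # sensitive term -> candidate substitutes of the same category
    "chemotherapy": ["treatment", "therapy"],
    "Boston": ["the city", "downtown"],
}

def perturb(tokens, p_leak_limit=0.0, rng=random.Random(0)):
    leaked = sum(t in LEXICON for t in tokens) / max(len(tokens), 1)
    out = list(tokens)
    for i, t in enumerate(tokens):
        if leaked <= p_leak_limit:          # budget satisfied, stop perturbing
            break
        if t in LEXICON:
            out[i] = rng.choice(LEXICON[t])
            leaked -= 1 / len(tokens)
    return out

print(" ".join(perturb("she starts chemotherapy in Boston tomorrow".split())))
```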
Private Speech Characterization with Secure Multiparty Computation
2020
Deep learning in audio signal processing, such as human voice audio signal classification, is a rich application area of machine learning. Legitimate use cases include voice authentication, gunfire detection, and emotion recognition. While there are clear advantages to automated human speech classification, application developers can gain knowledge beyond the professed scope from unprotected audio signal processing. In this paper we propose the first privacy-preserving solution for deep learning-based audio classification that is provably secure. Our approach, which is based on Secure Multiparty Computation, allows the speech signal of one party (Alice) to be classified with a deep neural network of another party (Bob) without Bob ever seeing Alice's speech signal in an unencrypted manner. As threat models, we consider both passive security, i.e. with semi-honest parties who follow the instructions of the cryptographic protocols, as well as active security, i.e. with malicious parties ...
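The protocol itself is far more involved, but the toy sketch below shows the core Secure Multiparty Computation idea in its simplest form: Alice's features are additively secret-shared, Bob computes a linear layer on his share only, and the result is reconstructed without Bob ever seeing the plaintext. The prime modulus and integer encoding are assumptions for illustration, not the paper's protocol.

```python
# Toy additive secret sharing over a prime field. Real MPC protocols use
# fixed-point encodings, multiplication triples, and many more rounds; this
# only illustrates that Bob computes on shares, never on Alice's plaintext.
import random

P = 2**61 - 1  # prime modulus (toy choice)

def share(x, rng):
    r = [rng.randrange(P) for _ in x]
    return r, [(xi - ri) % P for xi, ri in zip(x, r)]

def partial_dot(w, sh):
    return sum(wi * si for wi, si in zip(w, sh)) % P

rng = random.Random(0)
x = [3, 1, 4]                        # Alice's integer-encoded features
w = [2, 5, 7]                        # one neuron of Bob's first linear layer
s_alice, s_bob = share(x, rng)
y_alice = partial_dot(w, s_alice)    # computed by Alice
y_bob = partial_dot(w, s_bob)        # computed by Bob, who never sees x
assert (y_alice + y_bob) % P == sum(wi * xi for wi, xi in zip(w, x)) % P
```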
Preserving Privacy in Speaker and Speech Characterisation
Computer Speech & Language
Speech recordings are a rich source of personal, sensitive data that can be used to support a plethora of diverse applications, from health profiling to biometric recognition. It is therefore essential that speech recordings are adequately protected so that they cannot be misused. Such protection, in the form of privacy-preserving technologies, is required to ensure that: (i) the biometric profiles of a given individual (e.g., across different biometric service operators) are unlinkable; (ii) leaked, encrypted biometric information is irreversible; and (iii) biometric references are renewable. Whereas many privacy-preserving technologies have been developed for other biometric characteristics, very few solutions have been proposed to protect privacy in the case of speech signals, even though such protection is now mandated by recent European and international data protection regulations. With the aim of fostering progress and collaboration between researchers in the speech, biometrics and applied cryptography communities, this survey article provides an introduction to the field, starting with a legal perspective on privacy preservation in the case of speech data. It then establishes the requirements for effective privacy preservation, reviews generic cryptography-based solutions, and follows with specific techniques that are applicable to speaker characterisation (biometric applications) and speech characterisation (non-biometric applications). For the latter, methods are presented to avoid function creep and prevent the exploitation of biometric information, e.g., to single out an identity in speech-assisted health care.
Proceedings of the 16th ACM International Conference on Multimedia (MM '08), 2008
Audio monitoring has many applications but also raises privacy concerns. In an attempt to help alleviate these concerns, we have developed a method for reducing the intelligibility of speech while preserving intonation and the ability to recognize most environmental sounds. The method is based on identifying vocalic regions and replacing the vocal tract transfer function of these regions with the transfer function from prerecorded vowels, where the identity of the replacement vowel is independent of the identity of the spoken syllable. The audio signal is then re-synthesized using the original pitch and energy, but with the modified vocal tract transfer function. We performed an intelligibility study which showed that environmental sounds remained recognizable while speech intelligibility was dramatically reduced, to a 7% word recognition rate.
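A hedged sketch of the kind of LPC analysis/re-synthesis the abstract describes: per frame, estimate the vocal-tract filter, keep the original excitation (which carries pitch and energy), and re-synthesize through a filter taken from a prerecorded vowel. The frame length, LPC order, vowel template, and the simple energy gate standing in for vocalic-region detection are assumptions.

```python
# Minimal LPC-swap sketch: estimate the vocal-tract filter per frame, keep the
# original excitation (pitch/energy), and re-synthesize through a filter taken
# from a prerecorded vowel. The real method first detects vocalic regions;
# here a simple energy gate stands in for that step.
import numpy as np
import librosa
from scipy.signal import lfilter

ORDER, FRAME = 16, 512

def lpc_swap(y, vowel_frame):
    a_vowel = librosa.lpc(vowel_frame, order=ORDER)       # filter from a prerecorded vowel
    out = np.zeros_like(y)
    for start in range(0, len(y) - FRAME, FRAME):
        frame = y[start:start + FRAME]
        if np.max(np.abs(frame)) < 1e-4:                  # leave near-silence untouched
            out[start:start + FRAME] = frame
            continue
        a = librosa.lpc(frame, order=ORDER)                # original vocal-tract filter
        excitation = lfilter(a, [1.0], frame)              # inverse-filter: residual keeps pitch/energy
        out[start:start + FRAME] = lfilter([1.0], a_vowel, excitation)
    return out

y, sr = librosa.load("speech.wav", sr=16000)               # hypothetical input file
vowel = y[sr:sr + FRAME]                                    # stand-in for a prerecorded vowel frame
muffled = lpc_swap(y, vowel)
```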
Privacy-preserving Query-by-Example Speech Search
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
This paper investigates a new privacy-preserving paradigm for the task of Query-by-Example Speech Search using Secure Binary Embeddings, a hashing method that converts vector data to bit strings through a combination of random projections followed by banded quantization. The proposed method allows spoken query search to be performed in an encrypted domain, by analyzing ciphered information computed from the original recordings. Unlike other hashing techniques, the embeddings allow the computation of the distance between vectors that are close enough, but are not perfect matches. This paper shows how these hashes can be combined with Dynamic Time Warping based on posterior-derived features to perform secure speech search. Experiments performed on a subset of the Speech-Dat Portuguese corpus showed that the proposed privacy-preserving system obtains results similar to those of its non-private counterpart.
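The sketch below illustrates a Secure Binary Embedding of the type the paper builds on: random Gaussian projections plus dithered (banded) quantization, keeping the lowest bit of each quantized projection so that Hamming distance tracks Euclidean distance only for nearby vectors. The projection count and band width are illustrative assumptions.

```python
# Sketch of a Secure Binary Embedding: random projections plus dithered
# (banded) quantization, keeping only the lowest bit of each quantized value.
# Hamming distance between hashes tracks distance for nearby vectors and
# saturates (reveals nothing) for distant ones.
import numpy as np

def sbe_hash(x, A, w, delta):
    # A: (m, d) Gaussian projection matrix, w: (m,) uniform dither in [0, delta)
    return (np.floor((A @ x + w) / delta).astype(int) % 2).astype(np.uint8)

rng = np.random.default_rng(0)
d, m, delta = 40, 256, 1.0            # illustrative sizes
A = rng.normal(size=(m, d))
w = rng.uniform(0, delta, size=m)

x = rng.normal(size=d)
near = x + 0.01 * rng.normal(size=d)
far = rng.normal(size=d)

hx, hnear, hfar = (sbe_hash(v, A, w, delta) for v in (x, near, far))
print("near Hamming:", np.mean(hx != hnear))   # small for close vectors
print("far  Hamming:", np.mean(hx != hfar))    # near 0.5 for distant vectors
```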
Design of Voice Privacy System using Linear Prediction
2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020
A speaker's identity is the most crucial information exploited (implicitly) by an Automatic Speaker Verification (ASV) system. Numerous attacks can be thwarted simultaneously if privacy preservation is exercised for a speaker's identity. The baseline of the VoicePrivacy Challenge 2020 (INTERSPEECH) uses the Linear Prediction (LP) model of speech and the McAdams coefficient for achieving speaker de-identification. The baseline approach focuses on altering only the pole angles using the McAdams coefficient. However, from speech acoustics and digital resonator design, the radius of the poles is associated with various energy losses. These energy losses implicitly carry speaker-specific information during speech production. To that effect, the authors have brought fine-tuned changes in both pole angle and pole radius, resulting in an 18.98% higher EER on the VCTK-test-com dataset and a 5% lower WER on the Libri-test dataset compared to the baseline. This means privacy-preservation is indeed ...
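In the spirit of the abstract, the sketch below warps the angles of the complex LP poles with a McAdams-style exponent and also nudges the pole radii, then re-synthesizes each frame from its original residual. The frame size, LPC order, exponent, and radius factor are assumptions, not the authors' tuned values.

```python
# Sketch of LP-based de-identification: per frame, warp complex LPC pole angles
# with a McAdams-style exponent, nudge the pole radii, and re-synthesize from
# the original residual. Parameters here are illustrative assumptions.
import numpy as np
import librosa
from scipy.signal import lfilter

def warp_poles(a, alpha=0.8, radius_scale=0.99):
    poles = np.roots(a)
    new = []
    for p in poles:
        if abs(p.imag) > 1e-8:                        # complex pole: warp angle, scale radius
            ang = np.abs(np.angle(p)) ** alpha
            r = np.abs(p) * radius_scale
            new.append(r * np.exp(1j * np.sign(np.angle(p)) * ang))
        else:                                          # real pole: leave unchanged
            new.append(p)
    return np.real(np.poly(new))                       # back to polynomial coefficients

def anonymize(y, order=16, frame=512, alpha=0.8):
    out = np.zeros_like(y)
    for s in range(0, len(y) - frame, frame):
        seg = y[s:s + frame]
        if np.max(np.abs(seg)) < 1e-4:                 # skip near-silence
            out[s:s + frame] = seg
            continue
        a = librosa.lpc(seg, order=order)
        residual = lfilter(a, [1.0], seg)               # excitation keeps pitch/energy
        out[s:s + frame] = lfilter([1.0], warp_poles(a, alpha), residual)
    return out
```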
Privacy preserving encrypted phonetic search of speech data
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This paper presents a strategy for enabling speech recognition to be performed in the cloud whilst preserving the privacy of users. The approach advocates a demarcation of responsibilities between the client-side and server-side components for performing the speech recognition task. On the client side resides the acoustic model, which symbolically encodes the audio and encrypts the data before uploading to the server. The server side then employs searchable encryption to enable phonetic search of the speech content. Some preliminary results for speech encoding and searchable encryption are presented.
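Searchable encryption is well beyond a short snippet, so the sketch below is a heavily simplified stand-in for the client/server split the paper advocates: the client (assumed to run an on-device acoustic model) uploads only keyed HMAC tags of phonetic n-grams, and the server matches query tags against indexed tags without seeing the phones. The key handling and phone sequences are assumptions, not the paper's scheme.

```python
# Heavily simplified stand-in for encrypted phonetic search: the client turns
# audio into phone symbols (assumed on-device acoustic model, not shown) and
# uploads only keyed HMAC tags of phone n-grams; the server matches query tags
# against indexed tags without ever seeing the phones themselves.
import hmac, hashlib

KEY = b"client-secret-key"   # held by the client only (assumption)

def tag(phones, n=3):
    grams = [" ".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    return {hmac.new(KEY, g.encode(), hashlib.sha256).hexdigest() for g in grams}

# Client side: index an utterance's phone sequence.
server_index = {"utt01": tag("HH AH L OW W ER L D".split())}

# Client side: build the query tags; server side: intersect tag sets only.
query = tag("HH AH L OW".split())
hits = [utt for utt, tags in server_index.items() if query & tags]
print(hits)   # ['utt01']
```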
VoiceMask: Anonymize and Sanitize Voice Input on Mobile Devices
Voice input has tremendously improved the user experience of mobile devices by freeing our hands from typing on the small screen. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. However, the cloud might compromise users' privacy by identifying them from their voices, learning their sensitive input content via speech recognition, and then profiling the mobile users based on that content. In this paper, we design an intermediary between users and the cloud, named VoiceMask, to sanitize users' voice data before sending it to the cloud for speech recognition. We analyze the potential privacy risks and aim to protect users' identities and sensitive input content from being disclosed to the cloud. VoiceMask adopts a carefully designed voice conversion mechanism that is resistant to several attacks. Meanwhile, it utilizes an evolution-based keyword substitution technique to sanitize the voice input content. Both sanitization phases are performed on the resource-limited mobile device while still maintaining the usability and accuracy of the cloud-supported speech recognition service. We implement the voice sanitizer on Android systems and present extensive experimental results that validate the effectiveness and efficiency of our app. It is demonstrated that we are able to reduce the chance of a user's voice being identified from among 50 people by 84% while keeping the drop in speech recognition accuracy within 14.2%.
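As a crude stand-in for the two sanitization phases described above, the sketch below applies a random per-utterance pitch shift to the audio and substitutes sensitive keywords in the outgoing text. VoiceMask's actual voice conversion and evolution-based substitution are more sophisticated; the shift range, keyword map, and input path are illustrative assumptions.

```python
# Crude stand-in for the two sanitization phases: (1) perturb the voice with a
# random per-utterance pitch shift; (2) substitute sensitive keywords in the
# text sent onward. Shift range and keyword map are illustrative assumptions.
import random
import librosa

def mask_voice(y, sr, rng):
    steps = rng.uniform(-3.0, 3.0)                      # random semitone shift
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)

SUBS = {"diabetes": "condition", "password": "code"}     # toy keyword map

def sanitize_text(text):
    return " ".join(SUBS.get(w.lower(), w) for w in text.split())

y, sr = librosa.load("voice_input.wav", sr=16000)        # hypothetical input path
masked = mask_voice(y, sr, random.Random())
print(sanitize_text("my password is apple"))
```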
Towards a Privacy Compliant Cloud Architecture for Natural Language Processing Platforms
Proceedings of the 21st International Conference on Enterprise Information Systems, 2019
Natural language processing in combination with advances in artificial intelligence is on the rise. However, compliance constraints on handling personal data in many types of documents hinder various application scenarios. We describe the challenges of working with personal and particularly sensitive data in practice through three different use cases. We present the anonymization bootstrap challenge encountered when creating a prototype in a cloud environment. Finally, we outline an architecture for privacy-compliant AI cloud applications and an anonymization tool. With these preliminary results, we describe future work in bridging privacy and AI.
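A minimal sketch of the kind of anonymization pass such an architecture would run before documents leave the trusted environment: regex-based redaction of a few obvious identifiers. A production tool would rely on named-entity recognition and domain-specific rules; the patterns below are illustrative assumptions.

```python
# Minimal regex-based redaction of a few obvious identifiers before a document
# is sent to a cloud NLP service. Patterns are illustrative assumptions; a real
# anonymization tool would use NER and domain-specific rules.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\+?\d[\d ()/-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def anonymize(text):
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(anonymize("Contact jane.doe@example.org or +49 30 1234567."))
```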