A Roadmap for Privacy Preserving Speech Processing

Privacy preserving encrypted phonetic search of speech data

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper presents a strategy for enabling speech recognition to be performed in the cloud whilst preserving the privacy of users. The approach advocates a demarcation of responsibilities between the client and server-side components for performing the speech recognition task. The acoustic model resides on the client side, where it symbolically encodes the audio and encrypts the data before uploading it to the server. The server side then employs searchable encryption to enable phonetic search of the speech content. Some preliminary results for speech encoding and searchable encryption are presented.

Privacy-preserving Query-by-Example Speech Search

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper investigates a new privacy-preserving paradigm for the task of Query-by-Example Speech Search using Secure Binary Embeddings, a hashing method that converts vector data to bit strings through random projections followed by banded quantization. The proposed method allows spoken query search to be performed in an encrypted domain, by analyzing ciphered information computed from the original recordings. Unlike other hashing techniques, the embeddings allow the distance between vectors to be computed when the vectors are close enough but are not perfect matches. This paper shows how these hashes can be combined with Dynamic Time Warping based on posterior-derived features to perform secure speech search. Experiments performed on a subset of the Speech-Dat Portuguese corpus showed that the proposed privacy-preserving system obtains results similar to those of its non-private counterpart.
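The "random projections followed by banded quantization" step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian projection matrix, the uniform dither, and the parameter values are all assumptions chosen for illustration. Each bit is obtained by projecting the vector, adding a secret dither, and alternating between 0 and 1 every `delta` units.

```python
import math
import random


def make_sbe(dim, n_bits, delta, seed=0):
    """Return a function that maps a real vector to a tuple of n_bits bits.

    Hypothetical helper: the projection matrix A and dither w act as the
    secret key, and delta is the band width of the quantizer.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    w = [rng.uniform(0.0, delta) for _ in range(n_bits)]

    def embed(x):
        bits = []
        for row, dither in zip(A, w):
            projection = sum(a * xi for a, xi in zip(row, x)) + dither
            # Banded quantization: the bit flips every `delta` units.
            bits.append(math.floor(projection / delta) % 2)
        return tuple(bits)

    return embed


def normalized_hamming(a, b):
    """Fraction of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)
```

Under these assumptions, the normalized Hamming distance between two embeddings grows with the distance between nearby vectors and saturates near one half for distant vectors, which is the property the abstract describes: distances are informative only for vectors that are close enough.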

Prϵϵch: A System for Privacy-Preserving Speech Transcription

2019

New advances in machine learning and the abundance of speech datasets have made highly accurate Automated Speech Recognition (ASR) systems a reality. ASR systems offer their users the means to transcribe speech data at scale. Unfortunately, these systems pose serious privacy threats, as speech is a rich source of sensitive acoustic and textual information. Although offline ASR eliminates the privacy risks, we find that its transcription performance is inferior to that of cloud-based ASR systems, especially for real-world recordings. In this paper, we propose Prϵϵch, an end-to-end speech transcription system which lies at an intermediate point in the privacy-utility spectrum of speech transcription. It protects the acoustic features of the speakers' voices and protects the privacy of the textual content at an improved performance relative to offline ASR. Prϵϵch relies on cloud-based services to transcribe a speech file after applying a series of privacy-preserving operat...

Developing a secure voice recognition service on Raspberry Pi

Bulletin of Electrical Engineering and Informatics, 2024

In this study, we present a novel voice recognition service developed on the Raspberry Pi 4 model B platform, leveraging the fast Fourier transform (FFT) for efficient speech-to-digital signal conversion. By integrating the hidden Markov model (HMM) and artificial neural network (ANN), our system accurately reconstructs speech input. We further fortify this service with dual-layer encryption using the Rivest–Shamir–Adleman (RSA) and advanced encryption standard (AES) methods, achieving encryption and decryption times well suited for real-time applications. Our results demonstrate the system's robustness and efficiency: speech processing within 1.2 to 1.9 seconds, RSA 2048-bit encryption in 2 to 6 milliseconds, RSA decryption in 6 to 10 milliseconds, and AES-GCM 256-bit encryption and decryption in approximately 2.6 to 3 seconds.

Towards a Privacy Compliant Cloud Architecture for Natural Language Processing Platforms

Proceedings of the 21st International Conference on Enterprise Information Systems, 2019

Natural language processing in combination with advances in artificial intelligence is on the rise. However, compliance constraints while handling personal data in many types of documents hinder various application scenarios. We describe the challenges of working with personal and particularly sensitive data in practice with three different use cases. We present the anonymization bootstrap challenge in creating a prototype in a cloud environment. Finally, we outline an architecture for privacy compliant AI cloud applications and an anonymization tool. With these preliminary results, we describe future work in bridging privacy and AI.

An Effective and Efficient Technique for Supporting Privacy-Preserving Keyword-Based Search over Encrypted Data in Clouds

Procedia Computer Science, 2020

Cloud providers now offer their clients storage for emails and files on cloud servers. To address privacy concerns, the data should be encrypted. Unlike plaintext documents, however, encrypted documents cannot be retrieved by ordinary keyword search. As keyword searches on encrypted data are in demand, this paper describes an effective and efficient technique to support privacy-preserving keyword-based search over encrypted outsourced data. With this technique, encrypted data are first searched with the keyword, support for dynamic operations is then checked, and all relevant documents are finally sorted by the number of keywords matching the user query. To evaluate the technique, precision and recall are measured. The results reveal the effectiveness and efficiency of the technique in supporting privacy-preserving keyword-based search over encrypted outsourced data.
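The search-then-rank flow described above can be sketched as follows. This is an illustrative sketch only, not the paper's scheme: it assumes deterministic HMAC-SHA256 keyword tokens as the searchable index, and the names `trapdoor`, `build_index`, and `ranked_search` are hypothetical. Encryption of the documents themselves and the check for dynamic operations are omitted.

```python
import hashlib
import hmac


def trapdoor(key: bytes, keyword: str) -> bytes:
    """Client side: derive a keyword token; the server sees only this digest."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, docs: dict) -> dict:
    """Client side: map each document id to the set of its keyword tokens."""
    return {
        doc_id: {trapdoor(key, word) for word in words}
        for doc_id, words in docs.items()
    }


def ranked_search(index: dict, query_tokens: list) -> list:
    """Server side: rank documents by the number of matching query tokens."""
    wanted = set(query_tokens)
    hits = []
    for doc_id, tokens in index.items():
        score = len(tokens & wanted)
        if score > 0:
            hits.append((doc_id, score))
    # Highest score first; ties are broken by document id for determinism.
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

Deterministic tokens keep the sketch short, but they let the server see when the same keyword is queried twice; practical searchable-encryption schemes add further protection against that kind of leakage.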

Privacy Preserving Natural Language Processing in the Cloud Supporting Similarity Based Text Retrieval through Blind Storage

In cloud computing, a fundamental application is to protect outsourced data in the cloud through gateway encryption and blind storage, and to perform multi-keyword ranked search over the encrypted data securely with the help of natural language processing (NLP). The NLP technique is used for multi-keyword search in the cloud and extracts word meaning with the WordNet tool. In this paper, we develop a searchable encryption scheme for multi-keyword ranked search over the stored data. The scheme is an efficient multi-keyword search that returns search results ranked by accuracy. Within this framework, we leverage an efficient index to further improve search efficiency, and adopt a blind storage system to conceal the access pattern of the search user. Security analysis demonstrates that our scheme achieves confidentiality of documents and index, trapdoor privacy, trapdoor unlinkability, and concealment of the search user's access pattern. Finally, using extensive simulations, we show that our proposal achieves much improved efficiency in terms of authentication and access control compared with existing proposals.

Privacy Preserving Keyword Search over Cloud Data

2018

With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial public cloud for great flexibility and economic savings. To protect data privacy, however, sensitive data has to be encrypted before outsourcing, which makes traditional data utilization based on plaintext keyword search obsolete. We present a scheme for secure rank-based keyword search over encrypted cloud data. The data to be outsourced is encrypted with a symmetric encryption algorithm for confidentiality. The index file of the keyword set to be searched is outsourced to the local trusted server, which also stores the keyword set generated from the data files. This prevents any untrusted server from learning about the data from the index. The files are listed according to certain relevance criteria. The user requests the required files from the untrusted server. The param...

Encrypted Phrase Search in the Cloud Storage

International Journal for Research in Applied Science and Engineering Technology, 2018

This study investigates solutions for searching over encrypted documents stored on cloud servers. A number of searchable encryption schemes allow secure conjunctive keyword searches over encrypted data, but they do not achieve much improved storage and computational cost. In this paper, we present a phrase search technique based on Bloom filters that is faster than existing solutions, with better storage and computational cost. Our scheme can be summarized as the use of multiple n-gram Bloom filters to provide conjunctive keyword search and phrase search.
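The idea of multiple n-gram Bloom filters can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: the `BloomFilter` class, the HMAC-SHA256 keyed hashing, the filter size, and the restriction to unigrams and bigrams are all hypothetical choices. One filter per n-gram length is built for a document; a conjunctive keyword query tests the unigram filter, and a phrase query tests every bigram of the phrase.

```python
import hashlib
import hmac


class BloomFilter:
    """Small keyed Bloom filter (hypothetical parameters, for illustration)."""

    def __init__(self, key: bytes, size: int = 1024, hashes: int = 4):
        self.key, self.size, self.hashes = key, size, hashes
        self.bits = [False] * size

    def _positions(self, item: str):
        # Derive `hashes` bit positions from keyed digests of the item.
        for i in range(self.hashes):
            msg = f"{i}:{item}".encode("utf-8")
            digest = hmac.new(self.key, msg, hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def ngrams(words, n):
    """All contiguous n-word sequences, joined with single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def index_document(key: bytes, text: str, max_n: int = 2) -> dict:
    """Build one Bloom filter per n-gram length from 1 to max_n."""
    words = text.lower().split()
    filters = {}
    for n in range(1, max_n + 1):
        bloom = BloomFilter(key)
        for gram in ngrams(words, n):
            bloom.add(gram)
        filters[n] = bloom
    return filters


def keywords_match(filters: dict, keywords) -> bool:
    """Conjunctive keyword search: every keyword must be in the unigram filter."""
    return all(word.lower() in filters[1] for word in keywords)


def phrase_match(filters: dict, phrase: str, n: int = 2) -> bool:
    """Phrase search: every n-gram of the phrase must be in the n-gram filter."""
    words = phrase.lower().split()
    if len(words) < n:
        return keywords_match(filters, words)
    return all(gram in filters[n] for gram in ngrams(words, n))
```

In this sketch the key is stored in the filter for brevity; in a deployment the client would presumably compute the bit positions itself and send only those to the server. Bloom filters can report false positives but never false negatives, so a match indicates that the phrase is probably present.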