New Directions in Efficient Privacy-Preserving Machine Learning
Related papers
Privacy-Preserving Machine Learning: Threats and Solutions
IEEE Security & Privacy
For privacy concerns to be addressed adequately in today's machine learning systems, the knowledge gap between the machine learning and privacy communities must be bridged. This article aims to provide an introduction to the intersection of both fields with special emphasis on the techniques used to protect the data.
Privacy-Preserving Machine Learning: Methods, Challenges and Directions
arXiv (Cornell University), 2021
Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. The need for and use of such huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Significant research efforts from both academia and industry are increasingly directed at PPML, aiming to integrate privacy-preserving techniques into the ML pipeline or into specific algorithms, or to design various PPML architectures. In particular, existing PPML research cross-cuts ML, systems and application design, and security and privacy; hence, there is a critical need to understand the state of the art, the related challenges, and a roadmap for future research in the PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad-based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage, as well as benefit, multiple research communities such as ML, distributed systems, security, and privacy.
Privacy-Preserving Machine Learning Techniques, Challenges And Research Directions
International Research Journal of Engineering and Technology, 2024
As machine learning models become increasingly ubiquitous, ensuring privacy protection has emerged as a critical concern. This paper presents an in-depth exploration of privacy-preserving machine learning (PPML) techniques, challenges, and future research directions. We delve into the complexities of integrating privacy-preserving methodologies into machine learning algorithms, pipelines, and architectures. Our review highlights the evolving landscape of regulatory frameworks and the pressing need for innovative solutions to mitigate privacy risks. Moreover, we propose a comprehensive framework, the Phase, Guarantee, and Utility (PGU) model, to systematically evaluate PPML solutions, providing a roadmap for researchers and practitioners. By fostering interdisciplinary collaboration among the machine learning, distributed systems, security, and privacy communities, this paper aims to accelerate progress in PPML, paving the way for robust and privacy-preserving machine learning systems.
Privacy-Preserving Machine Learning: Need, Methods, And Research Trends
INTERNATIONAL JOURNAL OF CURRENT SCIENCE, 2022
Privacy-preserving machine learning (PPML) techniques are gaining attention because most small, medium, and large companies are now shifting their data to the public cloud. The opportunity to store data in the cloud and access it from anywhere at any time creates value, but it also invites threats to data privacy. Most companies train ML models directly on their data in the public cloud. As data becomes an asset for companies, data theft aimed at stealing other companies' data and extracting insightful information from it has grown. This paper focuses mainly on the need for PPML and the various tools and techniques used to protect the data. It also explains various open challenges.
Privacy Enhancing Machine Learning via Removal of Unwanted Dependencies
IEEE Transactions on Neural Networks and Learning Systems, 2021
The rapid rise of IoT and Big Data has facilitated copious data-driven applications to enhance our quality of life. However, the omnipresent and all-encompassing nature of the data collection can generate privacy concerns. Hence, there is a strong need to develop techniques that ensure the data serve only the intended purposes, giving users control over the information they share. To this end, this paper studies new variants of supervised and adversarial learning methods, which remove the sensitive information in the data before they are sent out for a particular application. The explored methods optimize privacy-preserving feature mappings and predictive models simultaneously in an end-to-end fashion. Additionally, the models are built with an emphasis on placing little computational burden on the user side so that the data can be desensitized on device cheaply. Experimental results on mobile sensing and face datasets demonstrate that our models can successfully maintain the utility performance of predictive models while causing sensitive predictions to perform poorly.
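A minimal sketch of the adversarial idea described in this abstract: an encoder (the privacy-preserving feature mapping) is trained jointly with a utility head and a sensitive-attribute head, with a gradient-reversal layer pushing the encoder to remove information about the sensitive label. PyTorch, the layer sizes, and the toy data are illustrative assumptions, not the paper's exact models.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())   # cheap on-device mapping
utility_head = nn.Linear(16, 2)                          # intended task
sensitive_head = nn.Linear(16, 2)                        # attribute to suppress
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(utility_head.parameters())
    + list(sensitive_head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 32)                 # toy batch (stand-in for sensor features)
y_task = torch.randint(0, 2, (64,))     # utility label
y_sens = torch.randint(0, 2, (64,))     # sensitive label

for _ in range(100):
    z = encoder(x)
    task_loss = loss_fn(utility_head(z), y_task)
    # The sensitive head itself learns normally, but the reversed gradient
    # drives the encoder to erase information about y_sens from z.
    adv_loss = loss_fn(sensitive_head(GradReverse.apply(z)), y_sens)
    opt.zero_grad()
    (task_loss + adv_loss).backward()
    opt.step()
```

After training, only the encoder output would leave the device; the sketch omits the trade-off weighting between the two losses that such methods typically tune.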
Framework for Privacy Preserving Machine Learning: CM-EAM (Collect, Model, Evaluate, Alter & Measure)
Framework for Privacy Preserving Machine Learning, 2023
With the widespread use of data sharing, ensuring privacy and safeguarding sensitive information is difficult. The adoption of machine learning algorithms in various domains has increased the need for effective anonymization techniques when manipulating critical data. This article shows how to test alteration techniques and analyze their impact on models before scaling up. The objective of the article is to propose a comprehensive five-step framework for effectively preserving the privacy of machine learning models in real-world scenarios where security is mandatory. The article aims to address the need for privacy-preserving practices in machine learning by providing practical guidance. The proposed framework serves as a valuable resource for security professionals and data scientists seeking to protect sensitive information while maintaining the utility and effectiveness of machine learning models.
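A minimal sketch, in the spirit of the Alter & Measure steps, of testing an alteration technique and measuring its impact on a model before scaling up. The dataset, the perturbed column, and the Laplace noise scale are illustrative assumptions, not the framework's prescribed choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def evaluate(X_train, X_test):
    """Train on (possibly altered) data and report held-out accuracy."""
    model = LogisticRegression(max_iter=5000).fit(X_train, y_tr)
    return accuracy_score(y_te, model.predict(X_test))

baseline = evaluate(X_tr, X_te)

# Alter: perturb one quasi-identifier column with Laplace noise (scale is a knob to sweep).
rng = np.random.default_rng(0)
col, scale = 0, 1.0
X_tr_alt, X_te_alt = X_tr.copy(), X_te.copy()
X_tr_alt[:, col] += rng.laplace(0.0, scale, size=len(X_tr_alt))
X_te_alt[:, col] += rng.laplace(0.0, scale, size=len(X_te_alt))

# Measure: quantify the utility cost of the alteration before deploying it.
altered = evaluate(X_tr_alt, X_te_alt)
print(f"baseline accuracy={baseline:.3f}, altered accuracy={altered:.3f}")
```

Sweeping the noise scale (or swapping in another alteration such as generalization or suppression) gives the utility-versus-privacy curve that would inform the final decision.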
Privacy-preserving Machine Learning through Data Obfuscation
2018
As machine learning becomes a practice and commodity, numerous cloud-based services and frameworks are provided to help customers develop and deploy machine learning applications. While it is prevalent to outsource model training and serving tasks to the cloud, it is important to protect the privacy of sensitive samples in the training dataset and prevent information leakage to untrusted third parties. Past work has shown that a malicious machine learning service provider or end user can easily extract critical information about the training samples from the model parameters or even just the model outputs. In this paper, we propose a novel and generic methodology to preserve the privacy of training data in machine learning applications. Specifically, we introduce an obfuscate function and apply it to the training data before feeding them to the model training task. This function adds random noise to existing samples, or augments the dataset with new samples. By doing so sensitive infor...
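A minimal sketch of the obfuscation idea described above: a function applied to the training data before it leaves the data owner, which both perturbs existing samples with random noise and augments the dataset with interpolated samples. The noise scale and the mixup-style augmentation are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def obfuscate(X, y, noise_std=0.1, augment_ratio=0.5, seed=0):
    """Return a perturbed and augmented copy of (X, y)."""
    rng = np.random.default_rng(seed)
    # 1) Perturb existing samples with Gaussian noise.
    X_noisy = X + rng.normal(0.0, noise_std, size=X.shape)
    # 2) Augment with convex mixtures of sample pairs; the first sample
    #    dominates (lam >= 0.5), so its label is kept.
    n_aug = int(len(X) * augment_ratio)
    idx_a = rng.integers(0, len(X), size=n_aug)
    idx_b = rng.integers(0, len(X), size=n_aug)
    lam = rng.uniform(0.5, 1.0, size=(n_aug, 1))
    X_aug = lam * X[idx_a] + (1 - lam) * X[idx_b]
    y_aug = y[idx_a]
    return np.vstack([X_noisy, X_aug]), np.concatenate([y, y_aug])

# The (untrusted) model-training service only ever sees the obfuscated dataset.
X = np.random.rand(100, 8)
y = np.random.randint(0, 2, size=100)
X_obf, y_obf = obfuscate(X, y)
```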
Privacy-Preserving Machine Learning Algorithms for Big Data Systems
2015 IEEE 35th International Conference on Distributed Computing Systems, 2015
Machine learning has played an increasingly important role in big data systems due to its capability of efficiently discovering valuable knowledge and hidden information. Oftentimes big data, such as in healthcare or financial systems, may involve multiple organizations that have different privacy policies and may not explicitly share their data publicly, even when joint data processing is a must. Thus, how to share big data among distributed data processing entities while mitigating privacy concerns becomes a challenging problem. Traditional methods rely on cryptographic tools and/or randomization to preserve privacy. Unfortunately, these alone may be inadequate for emerging big data systems because they are mainly designed for traditional small-scale data sets. In this paper, we propose a novel framework to achieve privacy-preserving machine learning where the training data are distributed and each shared data portion is of large volume. Specifically, we utilize the data locality property of the Apache Hadoop architecture and only a limited number of cryptographic operations in the Reduce() procedures to achieve privacy preservation. We show that the proposed scheme is secure in the semi-honest model and use extensive simulations to demonstrate its scalability and correctness.
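A minimal sketch of the aggregation pattern this abstract describes: each data holder computes a local statistic on its own partition (the map side), and only protected values reach the reducer, so the aggregate is recovered without exposing any single contribution. Pairwise additive masking stands in here for the paper's cryptographic operations at Reduce(); it is an illustration of the pattern, not their scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
partitions = [rng.random((50, 4)) for _ in range(3)]   # one local dataset per party

# Map side: each party computes its local sum (e.g., a gradient or count vector).
local_sums = [p.sum(axis=0) for p in partitions]

# Each ordered pair of parties shares a random mask; party i adds the mask it
# shares with every j > i and subtracts the mask it shares with every j < i.
n = len(local_sums)
masks = {(i, j): rng.normal(size=4) for i in range(n) for j in range(i + 1, n)}
masked = []
for i, s in enumerate(local_sums):
    m = s.copy()
    for j in range(n):
        if i < j:
            m += masks[(i, j)]
        elif j < i:
            m -= masks[(j, i)]
    masked.append(m)

# Reduce side: summing the masked vectors cancels every mask exactly, yielding
# the global statistic while each individual contribution stays hidden.
aggregate = np.sum(masked, axis=0)
assert np.allclose(aggregate, np.sum(local_sums, axis=0))
```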
Efficient Secure Building Blocks With Application to Privacy Preserving Machine Learning Algorithms
IEEE Access, 2021
Nowadays different entities (such as hospitals, cyber security companies, banks, etc.) collect data of the same nature but often with different statistical properties. It has been shown that if these entities combine their privately collected datasets to train a machine learning model, they end up with a trained model that often outperforms human experts of the corresponding field(s) in terms of classification accuracy. However, due to judicial, privacy, and cost reasons, no entity is willing to share its data with others. The same problem arises during the classification (inference) stage: the user does not want to reveal any information about their query or its final classification, while the owner of the trained model wants to keep that model private. In this article we overcome these drawbacks by first introducing novel, efficient, general-purpose secure building blocks, which can be used to build privacy-preserving machine learning algorithms for both training and classification (inference) under strict privacy and security requirements. Our theoretical analysis and experimental results show that our building blocks (and hence the privacy-preserving algorithms built on top of them) are more efficient than most (if not all) state-of-the-art schemes in terms of computation and communication cost, as well as security characteristics, in the semi-honest model. Furthermore, and to the best of our knowledge, for the Naïve Bayes model we extend this efficiency for the first time to also handle actively malicious users who arbitrarily deviate from the protocol.
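A minimal sketch of one classic secure building block of the kind such protocols are assembled from: two-party multiplication on additive secret shares using a Beaver triple. The prime field size and the "trusted dealer" triple are illustrative assumptions; the paper's own building blocks differ in their details.

```python
import secrets

P = 2**61 - 1  # prime modulus for the additive-sharing field

def share(x):
    """Split x into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

# A dealer prepares a Beaver triple (a, b, c) with c = a*b and shares it.
a, b = secrets.randbelow(P), secrets.randbelow(P)
c = (a * b) % P
a0, a1 = share(a)
b0, b1 = share(b)
c0, c1 = share(c)

def beaver_multiply(x, y):
    """Return shares of x*y; neither share holder learns x or y."""
    x0, x1 = share(x)
    y0, y1 = share(y)
    # Only the masked differences d and e are ever opened publicly.
    d = reconstruct((x0 - a0) % P, (x1 - a1) % P)
    e = reconstruct((y0 - b0) % P, (y1 - b1) % P)
    z0 = (d * e + d * b0 + e * a0 + c0) % P   # party 0 adds the public d*e term
    z1 = (d * b1 + e * a1 + c1) % P
    return z0, z1

z0, z1 = beaver_multiply(6, 7)
assert reconstruct(z0, z1) == 42
```

Primitives like this (secure addition, multiplication, and comparison on shares) compose into privacy-preserving training and inference for models such as Naïve Bayes.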
Privacy-preserving Machine Learning in Cloud
Proceedings of the 2017 on Cloud Computing Security Workshop, 2017
Machine learning algorithms based on deep neural networks (NN) have achieved remarkable results and are being extensively used in different domains. On the other hand, with the increasing growth of cloud services, several Machine Learning as a Service (MLaaS) offerings have emerged in which training and deploying machine learning models are performed on cloud providers' infrastructure. However, machine learning algorithms require access to raw data, which is often privacy sensitive and can create potential security and privacy risks. To address this issue, we develop new techniques for applying deep neural network algorithms to encrypted data. In this paper, we show that it is feasible and practical to train neural networks using encrypted data, to make encrypted predictions, and to return the predictions in encrypted form. We demonstrate the applicability of the proposed techniques and evaluate their performance. The empirical results show that they provide accurate privacy-preserving training and classification.
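A minimal sketch of the encrypted-prediction idea using additively homomorphic (Paillier) encryption via the third-party python-paillier package (`phe`, an assumption here, not the paper's own implementation): the client encrypts its features, the server evaluates a linear layer directly on ciphertexts, and only the client can decrypt the score. Nonlinear layers require further machinery that this sketch omits.

```python
from phe import paillier

# Client side: generate keys and encrypt the query features.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
features = [0.5, -1.2, 3.3]
enc_features = [public_key.encrypt(x) for x in features]

# Server side: the model owner applies its plaintext weights to the encrypted
# query; ciphertext * plaintext scalar and ciphertext + ciphertext are supported.
weights, bias = [0.8, 0.1, -0.4], 0.25
enc_score = enc_features[0] * weights[0]
for e, w in zip(enc_features[1:], weights[1:]):
    enc_score = enc_score + e * w
enc_score = enc_score + bias

# Client side: only the private-key holder can read the prediction score.
print(private_key.decrypt(enc_score))  # ~= 0.8*0.5 + 0.1*(-1.2) - 0.4*3.3 + 0.25
```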