Privacy-Preserving Machine Learning: Need, Methods, And Research Trends

Privacy-Preserving Machine Learning: Threats and Solutions

IEEE Security & Privacy

For privacy concerns to be addressed adequately in today's machine learning systems, the knowledge gap between the machine learning and privacy communities must be bridged. This article aims to provide an introduction to the intersection of both fields with special emphasis on the techniques used to protect the data.

Privacy-Preserving Machine Learning: Methods, Challenges and Directions

arXiv (Cornell University), 2021

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. Such a need for and the use of huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Increasingly, significant research efforts from both academia and industry can be seen in PPML areas that aim toward integrating privacy-preserving techniques into the ML pipeline or specific algorithms, or designing various PPML architectures. In particular, existing PPML research cross-cuts ML, systems and applications design, as well as security and privacy areas; hence, there is a critical need to understand state-of-the-art research, related challenges and a roadmap for future research in the PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad-based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage as well as benefit multiple research communities such as ML, distributed systems, security and privacy.

Privacy-Preserving Machine Learning Techniques, Challenges And Research Directions

International Research Journal of Engineering and Technology, 2024

As machine learning models become increasingly ubiquitous, ensuring privacy protection has emerged as a critical concern. This paper presents an in-depth exploration of privacy-preserving machine learning (PPML) techniques, challenges, and future research directions. We delve into the complexities of integrating privacy-preserving methodologies into machine learning algorithms, pipelines, and architectures. Our review highlights the evolving landscape of regulatory frameworks and the pressing need for innovative solutions to mitigate privacy risks. Moreover, we propose a comprehensive framework, the Phase, Guarantee, and Utility (PGU) model, to systematically evaluate PPML solutions, providing a roadmap for researchers and practitioners. By fostering interdisciplinary collaboration among the machine learning, distributed systems, security, and privacy communities, this paper aims to accelerate progress in PPML, paving the way for robust and privacy-preserving machine learning systems.

Privacy-preserving Machine Learning in Cloud

Proceedings of the 2017 on Cloud Computing Security Workshop, 2017

Machine learning algorithms based on deep neural networks (NN) have achieved remarkable results and are being extensively used in different domains. At the same time, with the growth of cloud services, several Machine Learning as a Service (MLaaS) offerings are available in which training and deployment of machine learning models are performed on cloud providers' infrastructure. However, machine learning algorithms require access to raw data, which is often privacy sensitive and can create potential security and privacy risks. To address this issue, we develop new techniques for applying deep neural network algorithms to encrypted data. In this paper, we show that it is feasible and practical to train neural networks using encrypted data, to make encrypted predictions, and to return the predictions in an encrypted form. We demonstrate the applicability of the proposed techniques and evaluate their performance. The empirical results show that they provide accurate privacy-preserving training and classification.

New Directions in Efficient Privacy-Preserving Machine Learning

2020

Applications of machine learning have become increasingly common in recent years. For instance, navigation systems like Google Maps use machine learning to better predict traffic patterns; Facebook, LinkedIn, and other social media platforms use machine learning to customize users' news feeds. Central to all these systems is user data. However, the sensitive nature of the collected data has also led to a number of privacy concerns. Privacy-preserving machine learning enables systems that can

Framework for Privacy Preserving Machine Learning: CM-EAM (Collect, Model, Evaluate, Alter & Measure)

Framework for Privacy Preserving Machine Learning, 2023

With the widespread use of data sharing, ensuring privacy and safeguarding sensitive information is difficult. The adoption of machine learning algorithms in various domains increased the need for effective anonymization techniques when manipulating critical data. This article shows how to test alteration techniques and analyze the impact on models before scaling up. The objective of the article is to propose a comprehensive framework consisting of five steps for the effective privacy of Machine Learning models in real-world scenarios when security is mandatory. The article aims to address the need for privacy-preserving practices in machine learning by providing practical guidance. The proposed framework serves as a valuable resource for security professionals and data scientists seeking to protect sensitive information while maintaining the utility and effectiveness of the machine learning models.

Privacy-preserving Machine Learning through Data Obfuscation

2018

As machine learning becomes a practice and commodity, numerous cloud-based services and frameworks are provided to help customers develop and deploy machine learning applications. While it is prevalent to outsource model training and serving tasks in the cloud, it is important to protect the privacy of sensitive samples in the training dataset and prevent information leakage to untrusted third parties. Past work has shown that a malicious machine learning service provider or end user can easily extract critical information about the training samples from the model parameters or even just model outputs. In this paper, we propose a novel and generic methodology to preserve the privacy of training data in machine learning applications. Specifically, we introduce an obfuscate function and apply it to the training data before feeding them to the model training task. This function adds random noise to existing samples, or augments the dataset with new samples. By doing so sensitive infor...
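The obfuscation idea described in this abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' function: the name `obfuscate`, the Gaussian noise distribution, and the parameters `noise_scale` and `n_augment` are assumptions, since the abstract says only that noise is added to existing samples or that new samples are appended. Noise of this kind carries no formal privacy guarantee on its own.

```python
import random


def obfuscate(samples, noise_scale=0.1, n_augment=0, seed=None):
    """Hypothetical obfuscate function (illustrative, not the paper's).

    Perturbs every feature of every sample with zero-mean Gaussian noise and
    optionally appends `n_augment` extra samples, each a noisy copy of a
    randomly chosen original.
    """
    rng = random.Random(seed)

    def perturb(row):
        # Assumed noise model: independent Gaussian noise per feature.
        return [x + rng.gauss(0.0, noise_scale) for x in row]

    noisy = [perturb(row) for row in samples]
    extra = [perturb(rng.choice(samples)) for _ in range(n_augment)]
    return noisy + extra


# Toy training set: three samples with two numeric features each.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
obfuscated = obfuscate(data, noise_scale=0.05, n_augment=2, seed=0)
print(len(obfuscated))  # 3 perturbed originals + 2 augmented samples = 5
```

The obfuscated list, rather than `data`, is what would be handed to the outsourced training task; choosing `noise_scale` trades model utility against how much of each original sample remains recoverable.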

PRIVACY PRESERVING MACHINE LEARNING CHALLENGES AND SOLUTION APPROACH FOR TRAINING DATA IN ERP SYSTEMS

IAEME PUBLICATION, 2020

Digital transformation is ubiquitous and is pushing ERP companies to incorporate more machine learning algorithms in order to drive intelligent, real-time decision-making capabilities. ERP systems have started incorporating machine learning use cases powered by huge volumes of enterprise data and by cloud and compute capabilities. However, data privacy remains a challenge. Data privacy is at the core of any machine learning model that is trained on sensitive information. Not just for-profit businesses, but even academic endeavors in the field of medicine cannot progress if they can’t access sensitive medical information in a privacy-preserved format. Applying an ML model without fully understanding what is happening inside its hidden layers can be disastrous, and the resulting perils can lead to legal consequences. Therefore, privacy-preserving AI techniques have started evolving in the last few years. The privacy-preserving AI field is still growing, and there is an understanding gap in organizations and among individuals, which makes privacy breach or compromise a pervasive business challenge. This paper focuses on the key challenges ERP companies face in training machine learning models on their enterprise data, and on how these challenges can be overcome by applying data anonymization and differential privacy techniques.
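As a concrete illustration of the differential privacy techniques mentioned in this abstract, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is not taken from the paper: the function names, the salary figures, and the choice of `epsilon` are all hypothetical, and a production system would also need to address floating-point and privacy-budget accounting issues that this sketch ignores.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
# Hypothetical ERP payroll column; the values are made up for illustration.
salaries = [52000, 61000, 75000, 48000, 90000]
noisy = private_count(salaries, lambda s: s > 60000, epsilon=0.5, rng=rng)
print(round(noisy, 2))  # a noisy value scattered around the true count of 3
```

A smaller `epsilon` gives stronger privacy and noisier answers; averaged over many independent releases the noise cancels out, which is why repeated queries must be charged against a shared privacy budget.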

Privacy-Preserving Machine Learning Algorithms for Big Data Systems

2015 IEEE 35th International Conference on Distributed Computing Systems, 2015

Machine learning has played an increasingly important role in big data systems due to its capability of efficiently discovering valuable knowledge and hidden information. Oftentimes, big data systems such as healthcare or financial systems involve multiple organizations that may have different privacy policies and may not share their data publicly, even though joint data processing may be a must. Thus, how to share big data among distributed data processing entities while mitigating privacy concerns becomes a challenging problem. Traditional methods rely on cryptographic tools and/or randomization to preserve privacy. Unfortunately, these alone may be inadequate for emerging big data systems because they are mainly designed for traditional small-scale data sets. In this paper, we propose a novel framework to achieve privacy-preserving machine learning where the training data are distributed and each shared data portion is of large volume. Specifically, we utilize the data locality property of the Apache Hadoop architecture and only a limited number of cryptographic operations at the Reduce() procedures to achieve privacy preservation. We show that the proposed scheme is secure in the semi-honest model and use extensive simulations to demonstrate its scalability and correctness.
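The abstract does not say which cryptographic operations run in the Reduce() step, so the sketch below is a stand-in rather than the paper's protocol. It shows additive secret sharing, a common building block for letting semi-honest parties compute a joint sum while keeping each local value hidden. The modulus, the function names, and the per-organization counts are assumptions made for illustration.

```python
import random

# Assumed public modulus; every local value and the true total must be smaller.
MODULUS = 2**61 - 1


def share(value, n_parties, rng):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(local_values, rng):
    """Simulate all parties in one process: party i sends share j to party j."""
    n = len(local_values)
    all_shares = [share(v, n, rng) for v in local_values]
    # Each party publishes only the sum of the shares it received, which is
    # uniformly random on its own and reveals no individual input.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    return sum(partials) % MODULUS


rng = random.SystemRandom()  # OS-backed randomness for the shares
local_counts = [120, 45, 310]  # hypothetical per-organization statistics
total = secure_sum(local_counts, rng)
print(total)  # 475
```

In a distributed setting the mappers would compute local statistics on their own data, and only the combination of masked partial sums would need any cryptographic machinery, which matches the abstract's emphasis on keeping such operations few.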

Efficient Secure Building Blocks With Application to Privacy Preserving Machine Learning Algorithms

IEEE Access, 2021

Nowadays, different entities (such as hospitals, cyber security companies, banks, etc.) collect data of the same nature but often with different statistical properties. It has been shown that if these entities combine their privately collected datasets to train a machine learning model, they would end up with a trained model that often outperforms the human experts of the corresponding field(s) in terms of classification accuracy. However, due to judicial, privacy and cost reasons, no entity is willing to share its data with others. The same problem arises during the classification (inference) stage: the user doesn't want to reveal any information about their query or its final classification, while the owner of the trained model wants to keep this model private. In this article, we overcome these drawbacks by first introducing novel, efficient, general-purpose secure building blocks, which can also be used to build privacy-preserving machine learning algorithms for both training and classification (inference) purposes under strict privacy and security requirements. Our theoretical analysis and experimental results show that our building blocks (and hence the privacy-preserving algorithms built on top of them) are more efficient than most (if not all) of the state-of-the-art schemes in terms of computation and communication cost, as well as security characteristics in the semi-honest model. Furthermore, to the best of our knowledge, for the Naïve Bayes model we extend this efficiency for the first time to also deal with active malicious users, who may arbitrarily deviate from the protocol.