Privacy-Preserving Machine Learning Algorithms for Big Data Systems

Efficient Secure Building Blocks With Application to Privacy Preserving Machine Learning Algorithms

IEEE Access, 2021

Nowadays, different entities (such as hospitals, cybersecurity companies, and banks) collect data of the same nature but often with different statistical properties. It has been shown that if these entities combine their privately collected datasets to train a machine learning model, the resulting model often outperforms the human experts of the corresponding field(s) in terms of classification accuracy. However, due to legal, privacy, and cost reasons, no entity is willing to share its data with others. The same problem arises at the classification (inference) stage: the user does not want to reveal any information about the query or its final classification, while the owner of the trained model wants to keep the model private. In this article we overcome these drawbacks by first introducing novel, efficient, general-purpose secure building blocks, which can also be used to build privacy-preserving machine learning algorithms for both training and classification (inference) under strict privacy and security requirements. Our theoretical analysis and experimental results show that our building blocks (and hence also the privacy-preserving algorithms built on top of them) are more efficient than most, if not all, state-of-the-art schemes in terms of computation and communication cost, as well as security characteristics, in the semi-honest model. Furthermore, and to the best of our knowledge, for the Naïve Bayes model we extend this efficiency for the first time to also handle actively malicious users, who may arbitrarily deviate from the protocol.
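
As a concrete illustration of one classic building block in this family (not the authors' specific construction), the sketch below implements additive secret sharing over a prime field; the modulus P and the two-secret example are assumptions chosen for brevity.

```python
# Toy illustration of additive secret sharing, one classic
# "secure building block" (not the authors' actual protocol).
import secrets

P = 2**61 - 1  # a public prime modulus; all arithmetic is mod P

def share(x, n=2):
    """Split secret x into n additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Each party holds one share of each secret; additions can be
# done locally on shares, so x + y is computed without any
# party ever seeing x or y in the clear.
xs, ys = share(42), share(100)
zs = [(a + b) % P for a, b in zip(xs, ys)]
assert reconstruct(zs) == 142
```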

Privacy-preserving Machine Learning in Cloud

Proceedings of the 2017 on Cloud Computing Security Workshop, 2017

Machine learning algorithms based on deep neural networks (NNs) have achieved remarkable results and are being extensively used in different domains. At the same time, with the increasing growth of cloud services, several Machine Learning as a Service (MLaaS) offerings have emerged in which training and deployment of machine learning models are performed on cloud providers' infrastructure. However, machine learning algorithms require access to raw data, which is often privacy-sensitive and can create potential security and privacy risks. To address this issue, we develop new techniques for applying deep neural network algorithms to encrypted data. In this paper, we show that it is feasible and practical to train neural networks using encrypted data, to make encrypted predictions, and to return the predictions in encrypted form. We demonstrate the applicability of the proposed techniques and evaluate their performance. The empirical results show that they provide accurate privacy-preserving training and classification.
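
To illustrate the kind of additively homomorphic encryption that makes encrypted predictions possible (the paper's exact scheme may differ), here is a minimal textbook Paillier implementation; the primes p and q are deliberately tiny, insecure illustrative assumptions, and real keys use primes of roughly 1024 bits or more.

```python
# Minimal textbook Paillier cryptosystem (additively homomorphic),
# shown with insecure toy parameters to demonstrate computing on
# ciphertexts.
from math import gcd
import secrets

p, q = 293, 433                 # toy primes (illustrative assumption)
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)            # valid because g = n + 1 below

def enc(m):
    """Encrypt m in [0, n) under public key n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    """Decrypt with private key (lam, mu)."""
    return (pow(c, lam, n2) - 1) // n * mu % n

# Enc(a) * Enc(b) decrypts to a + b, and Enc(a)^k to k * a --
# exactly what an encrypted linear layer or dot product needs.
a, b = enc(20), enc(22)
assert dec(a * b % n2) == 42
assert dec(pow(a, 3, n2)) == 60
```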

A Secure Collaborative Machine Learning Framework Based on Data Locality

2015 IEEE Global Communications Conference (GLOBECOM), 2015

Advancements in big data analysis offer cost-effective opportunities to improve decision-making in numerous areas such as health care, economic productivity, crime, and resource management. Nowadays, data holders increasingly tend to share their data for better outcomes from the aggregated data. However, the current tools and technologies developed to manage big data are often not designed to incorporate adequate security or privacy measures during data sharing. In this paper, we consider a scenario where multiple data holders intend to learn predictive models from their joint data without revealing their own data to one another. The data locality property is used as an alternative to secure multi-party computation (SMC) techniques. Specifically, we distribute the centralized learning task to each data holder as local learning tasks, in such a way that each local learning task depends only on local data. Along with that, we propose an efficient and secure protocol to reassemble the local results into the final result. The correctness of our scheme is proved theoretically and numerically, and its security is analyzed from an information-theoretic perspective.
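
A hedged sketch of the "reassemble local results" step: the pairwise-masking trick below lets a coordinator learn only the sum of the holders' local values, with each individual contribution hidden behind masks that cancel. The field modulus and three-party example are assumptions for illustration, not the paper's actual protocol.

```python
# Secure aggregation by pairwise masking: party i adds mask m[i][j]
# for each peer j, with m[i][j] = -m[j][i] mod P, so all masks
# cancel in the global sum and only the total is revealed.
import secrets

P = 2**61 - 1

def masked_updates(local_values):
    k = len(local_values)
    m = [[0] * k for _ in range(k)]   # m[i][j]: mask between i and j
    for i in range(k):
        for j in range(i + 1, k):
            r = secrets.randbelow(P)
            m[i][j], m[j][i] = r, (-r) % P
    return [(v + sum(m[i])) % P for i, v in enumerate(local_values)]

local = [5, 11, 26]                 # each holder's private local result
masked = masked_updates(local)      # individually these look random
assert sum(masked) % P == sum(local) % P   # but the sum is exact
```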

New Directions in Efficient Privacy-Preserving Machine Learning

2020

Applications of machine learning have become increasingly common in recent years. For instance, navigation systems like Google Maps use machine learning to better predict traffic patterns, while Facebook, LinkedIn, and other social media platforms use machine learning to customize users' news feeds. Central to all these systems is user data. However, the sensitive nature of the collected data has also led to a number of privacy concerns. Privacy-preserving machine learning enables systems that can...

Privacy-Preserving Big Data Analytics: From Theory to Practice

2019

In the last decade, with the advent of the Internet of Things (IoT) and the Big Data phenomenon, data security and privacy have become very crucial issues. A significant portion of the problem is due to the failure to employ appropriate security and privacy measures in data and computational infrastructures. Secure multiparty computation (secure MPC) is a cryptographic tool that can be used to deal with these problems. This computational approach has attracted increasing attention, and there has been a significant amount of advancement in this domain. In this paper, we review the important theoretical foundations and practical advancements of secure multiparty computation. In particular, we briefly review three common cryptographic primitives used in secure MPC and highlight the main arithmetic operations that are performed at the core of secure MPC protocols. We also highlight the strengths and weaknesses of different secure MPC approaches, as well as the fundamental challenges in this domain. ...
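
One of those core arithmetic operations is secure multiplication on additively shared values using a Beaver triple. The toy two-party sketch below, with a trusted dealer standing in for the offline preprocessing phase, shows the idea; the modulus and party count are assumptions for brevity.

```python
# Illustrative Beaver-triple multiplication on additive shares,
# one of the core MPC arithmetic operations surveys discuss.
import secrets

P = 2**61 - 1

def share(x):
    r = secrets.randbelow(P)
    return [r, (x - r) % P]        # two-party additive shares

def open_(sh):
    return sum(sh) % P

# A trusted dealer (or offline phase) supplies a triple c = a * b.
a, b = secrets.randbelow(P), secrets.randbelow(P)
A, B, C = share(a), share(b), share(a * b % P)

x, y = 6, 7
X, Y = share(x), share(y)

# The parties open d = x - a and e = y - b; these leak nothing,
# since a and b act as uniformly random one-time pads.
d = open_([(X[i] - A[i]) % P for i in range(2)])
e = open_([(Y[i] - B[i]) % P for i in range(2)])

# Shares of x*y = c + d*b + e*a + d*e; the public constant d*e
# is added by party 0 only, so it is not counted twice.
Z = [(C[i] + d * B[i] + e * A[i] + (d * e if i == 0 else 0)) % P
     for i in range(2)]
assert open_(Z) == x * y % P
```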

An Efficient Outsourced Privacy Preserving Machine Learning Scheme With Public Verifiability

IEEE Access, 2019

Cloud computing has been widely applied in numerous applications for storage and data analytics tasks. However, cloud servers engaged through a third party cannot be fully trusted by multiple data users. Thus, security and privacy concerns become the main obstacles to using machine learning services, especially with multiple data providers. Several outsourced machine learning schemes have recently been proposed to preserve the privacy of data providers; yet, these schemes cannot satisfy the property of public verifiability. In this paper, we present an efficient privacy-preserving machine learning scheme for multiple data providers. The proposed scheme allows all participants in the system model to publicly verify the correctness of the encrypted data. Furthermore, a unidirectional proxy re-encryption (UPRE) scheme is employed to reduce the high computational costs that arise with multiple data providers. The cloud server embeds noise in the encrypted data, allowing analysts to apply machine learning techniques while preserving the privacy of data providers' information. The results and experimental tests demonstrate that the proposed scheme reduces computational costs and communication overheads.
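
The abstract does not specify the noise construction, so the sketch below illustrates one standard possibility: Laplace noise in the style of differential privacy, added to an aggregate before release. The sensitivity and epsilon parameters are assumptions for illustration, and the paper's own mechanism may differ.

```python
# Hedged illustration of noise embedding for privacy, using the
# Laplace mechanism from differential privacy as a stand-in.
import random

def laplace(scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_release(true_value, sensitivity=1.0, epsilon=0.5):
    """Perturb an aggregate with noise calibrated to sensitivity/epsilon."""
    return true_value + laplace(sensitivity / epsilon)

exact = sum([3, 1, 4, 1, 5])        # the private aggregate
print(noisy_release(exact))          # ~14, randomly perturbed
```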

Decentralized Machine Learning Models with Cryptographic Techniques

Everything from medical screening to disease outbreak detection could benefit from machine learning models trained on sensitive real-world data. And, thanks to the widespread use of mobile devices, even more detailed (and sensitive) information is becoming available. Traditional machine learning, however, involves a data pipeline that uses a central server (on-premises or in the cloud) to host the trained model and make predictions. Federated Learning (FL), in contrast, is a method of downloading the current model and computing an updated model using local data at the device itself (a.k.a. edge computing). These locally trained models are then sent back to the central server, where they are aggregated (i.e., their weights are averaged), and a single consolidated and improved global model is sent back to the devices. The exchanged parameters and the resulting model, however, may still reveal information about the training data used. Two approaches, based on Homomorphic Encryption and Secret Sharing techniques, among others, have been used in this report to address these privacy concerns. The report summarises previous research in these areas and makes recommendations for future research.
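
The aggregation step described above (weights averaged at the central server) can be sketched in a few lines. The learning rate, gradients, and two-client setup below are hypothetical; a real FL system would combine this averaging with the homomorphic-encryption or secret-sharing protections the report surveys.

```python
# Sketch of federated averaging: clients train locally, and the
# server only ever sees model weights, never the raw training data.

def local_update(weights, grads, lr=0.1):
    """One toy SGD step computed on-device with local data."""
    return [w - lr * g for w, g in zip(weights, grads)]

def fed_avg(client_weights):
    """Server-side aggregation: element-wise mean of client updates."""
    k = len(client_weights)
    return [sum(ws) / k for ws in zip(*client_weights)]

global_model = [0.5, -1.2, 3.0]
# hypothetical per-client gradients from private local data
client_grads = [[0.1, -0.2, 0.4], [0.3, 0.0, -0.1]]
updates = [local_update(global_model, g) for g in client_grads]
global_model = fed_avg(updates)     # [0.48, -1.19, 2.985]
```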

Towards Privacy Preserving Big Data Analytics

Big data is large and complex data that is difficult to process using traditional data processing systems. Big data analytics is the process of generating knowledge from large, varied datasets collected from multiple sources, using platforms such as high-performance computing clusters, Hadoop, and Spark. Because data is collected from multiple sources, the chances of a privacy breach have increased. It is difficult to apply existing privacy models (privacy-preserving techniques) in big data analytics because of the 3Vs that characterize big data: Volume (large amounts of data), Variety (structured, semi-structured, or unstructured data), and Velocity (fast generation and processing of data). This paper discusses a general architecture of big data analytics that shows the different stages of big data analytics, which can help identify the stages where privacy models can be applied. Based on a survey of existing privacy models, a summary has been prepared that shows the relation between privacy models and the 3Vs of big data.
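
As an example of one widely used privacy model that such a survey covers, the sketch below checks k-anonymity over a toy table; the quasi-identifier columns, generalized values, and choice of k are assumptions for illustration.

```python
# Toy k-anonymity check: a table is k-anonymous if every
# combination of quasi-identifier values occurs at least k times.
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier combination occurs >= k times."""
    groups = Counter(tuple(r[c] for c in quasi_ids) for r in rows)
    return all(count >= k for count in groups.values())

records = [
    {"age": "30-39", "zip": "120**", "disease": "flu"},
    {"age": "30-39", "zip": "120**", "disease": "cold"},
    {"age": "40-49", "zip": "121**", "disease": "flu"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))  # False: one group of 1
```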

Privacy-Preserving Machine Learning: Need, Methods, And Research Trends

International Journal of Current Science, 2022

Privacy-preserving machine learning (PPML) techniques are gaining more attention nowadays because most small, medium, and large companies are now shifting their data to the public cloud. The ability to store data in the cloud and access it from anywhere at any time creates value, but at the same time it also invites threats to data privacy. Most companies train ML models directly on their data in the public cloud. As data becomes an asset for companies, data theft aimed at obtaining other companies' data and extracting insightful information from it has grown accordingly. This paper mainly focuses on the need for PPML and the various tools and techniques used to protect data. It also explains several open challenges.

Privacy-Preserving Data Mining and Analytics in Big Data Environments

The exponential growth of Big Data has revolutionized numerous industries by enabling the extraction of valuable insights from vast and diverse datasets. However, this advancement is accompanied by significant privacy and security challenges that impede the full potential of data analytics. Privacy-Preserving Data Mining (PPDM) emerges as a critical approach to mitigate these challenges, ensuring individual privacy while maintaining data utility. This paper presents a comprehensive survey of state-of-the-art PPDM methodologies within Big Data environments, encompassing privacy models, data transformation techniques, privacy-preserving machine learning algorithms, and privacy economics. Through an extensive literature review and analysis of real-world applications in healthcare and finance, we identify key challenges and gaps in current practices. Additionally, we propose a cohesive privacy framework aimed at guiding researchers and practitioners in implementing robust privacy-preserving mechanisms. The study also explores emerging trends such as advanced cryptographic techniques, privacy-preserving query processing, and the integration of privacy in machine learning. By addressing the balance between data utility and privacy, this research contributes to the advancement of ethical and secure Big Data analytics, paving the way for future innovations and interdisciplinary collaborations in the field.