Private Dataset Generation Using Privacy Preserving Collaborative Learning

No Peek: A Survey of private distributed deep learning

2018

We survey distributed deep learning models for training or inference without accessing raw data from clients. These methods aim to protect confidential patterns in data while still allowing servers to train models. The distributed deep learning methods of federated learning, split learning and large batch stochastic gradient descent are compared, in addition to private and secure approaches such as differential privacy, homomorphic encryption, oblivious transfer and garbled circuits, in the context of neural networks. We study their benefits, limitations and trade-offs with regard to computational resources, data leakage and communication efficiency, and we also share our anticipated future trends.
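To make the split learning setting concrete, here is a minimal two-layer sketch in Python with NumPy. It rests on a toy setup: hypothetical shapes, an MSE loss, and labels held by the server. That is only one of several split learning configurations. What it shows is the exchange pattern. The client sends only cut-layer activations, and the server returns only the gradient at the cut layer; the raw inputs never leave the client.

```python
import numpy as np

# Toy split-learning sketch (assumed setup): the client keeps raw inputs x and
# the first layer W1; the server keeps the second layer W2 and, in this variant,
# the labels y. Only cut-layer activations and their gradients cross the wire.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 1))
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(3, 1))


def client_forward(x, W1):
    """Client side: forward pass up to the cut layer (the "smashed" activations)."""
    return np.maximum(x @ W1, 0.0)


def server_step(h, y, W2, lr):
    """Server side: finish the forward pass, compute MSE, update W2, and return
    the gradient with respect to the cut-layer activations h."""
    err = h @ W2 - y
    loss = float(np.mean(err ** 2))
    d_pred = 2.0 * err / len(y)
    grad_h = d_pred @ W2.T  # computed with W2 before its update
    new_W2 = W2 - lr * (h.T @ d_pred)
    return loss, new_W2, grad_h


def client_backward(x, W1, grad_h, lr):
    """Client side: backpropagate the received gradient through ReLU and W1."""
    grad_pre = grad_h * ((x @ W1) > 0)
    return W1 - lr * (x.T @ grad_pre)


losses = []
for _ in range(200):
    h = client_forward(x, W1)                       # client -> server: activations only
    loss, W2, grad_h = server_step(h, y, W2, lr=0.05)
    W1 = client_backward(x, W1, grad_h, lr=0.05)    # server -> client: grad_h only
    losses.append(loss)
```

Compared with federated learning, the client here uploads activations rather than full model weights. This is one reason the communication and leakage trade-offs of the two approaches differ.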

A federated deep learning framework for privacy preservation and communication efficiency

Journal of Systems Architecture, 2022

Deep learning has achieved great success in many applications. However, its deployment in practice has been hindered by two issues: the privacy of data that has to be aggregated centrally for model training, and the high communication overhead of transmitting large amounts of data that are usually geographically distributed. Addressing both issues is challenging, and most existing works do not provide an efficient solution. In this paper, we develop FedPC, a Federated Deep Learning Framework for Privacy Preservation and Communication Efficiency. The framework allows a model to be learned on multiple private datasets without revealing any information about the training data, including intermediate data. The framework also minimizes the amount of data exchanged to update the model. We formally prove the convergence of the learning model when training with FedPC, as well as its privacy-preserving property. We perform extensive experiments to evaluate FedPC in terms of how closely it approximates the upper-bound performance (when training centrally) and in terms of communication overhead. The results show that FedPC keeps model performance within 8.5% of the centrally trained models when data is distributed to 10 computing nodes. FedPC also reduces the communication overhead by up to 42.20% compared to existing works.

A Federated Learning Framework for Privacy-preserving and Parallel Training

2020

The deployment of deep learning in practice has been hindered by two issues: the computational cost of model training and the privacy of training data such as medical or healthcare records. The large size of both learning models and datasets incurs a massive computational cost, requiring efficient approaches to speed up the training phase. While parallel and distributed learning can address the computational overhead, preserving the privacy of training data and intermediate results (e.g., gradients) remains a hard problem. Enabling parallel training of deep learning models on distributed datasets while preserving data privacy is even more complex and challenging. In this paper, we develop and implement FEDF, a distributed deep learning framework for privacy-preserving and parallel training. The framework allows a model to be learned on multiple geographically distributed training datasets (which may belong to different owners) while not revealing any information o...

Hercules: Boosting the Performance of Privacy-preserving Federated Learning

arXiv (Cornell University), 2022

In this paper, we address the problem of privacy-preserving federated neural network training with N users. We present Hercules, an efficient and high-precision training framework that can tolerate collusion of up to N − 1 users. Hercules follows the POSEIDON framework proposed by Sav et al. (NDSS'21) but makes a qualitative leap in performance with the following contributions: (i) we design a novel parallel homomorphic computation method for matrix operations, which enables fast Single Instruction, Multiple Data (SIMD) operations over ciphertexts. For the multiplication of two h × h matrices, our method reduces the computational complexity from O(h^3) to O(h). This greatly improves the training efficiency of the neural network, since ciphertext computation is dominated by the convolution operations; (ii) we present an efficient approximation of the sign function based on composite polynomial approximation. It is used to approximate non-polynomial functions (i.e., ReLU and max) with optimal asymptotic complexity. Extensive experiments on various benchmark datasets (BCW, ESR, CREDIT, MNIST, SVHN, CIFAR-10 and CIFAR-100) show that, compared with POSEIDON, Hercules obtains up to a 4% increase in model accuracy and up to a 60× reduction in computation and communication cost.
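A simple odd polynomial, f(x) = (3x − x^3)/2, illustrates the idea behind composite polynomial approximation. Composing it with itself repeatedly converges to sign(x) on [−1, 1]. This polynomial is a well-known illustrative choice assumed here. It is not necessarily the one Hercules uses. The sketch runs on plaintext floats; under homomorphic encryption, the same additions and multiplications would be applied to ciphertexts. ReLU and max are then obtained from the identities ReLU(x) = x(1 + sign(x))/2 and max(a, b) = ((a + b) + (a − b)·sign(a − b))/2.

```python
def composite_step(x):
    # f(x) = (3x - x^3) / 2 uses only additions and multiplications, so it can be
    # evaluated on HE ciphertexts. It fixes -1, 0 and 1, pushes points in (0, 1]
    # toward 1, and pushes points in [-1, 0) toward -1.
    return (3.0 * x - x ** 3) / 2.0


def approx_sign(x, depth=10):
    """Approximate sign(x) for x in [-1, 1] by composing f with itself `depth` times."""
    for _ in range(depth):
        x = composite_step(x)
    return x


def approx_relu(x, depth=10):
    # ReLU(x) = x * (1 + sign(x)) / 2, valid for x in [-1, 1]
    return x * (1.0 + approx_sign(x, depth)) / 2.0


def approx_max(a, b, depth=10):
    # max(a, b) = ((a + b) + (a - b) * sign(a - b)) / 2, assuming |a - b| <= 1
    return ((a + b) + (a - b) * approx_sign(a - b, depth)) / 2.0
```

Inputs close to 0 need more compositions to approach ±1, and each composition adds multiplicative depth under HE. Optimized composite polynomials aim to improve exactly this accuracy-versus-depth trade-off.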

Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning

IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019

Federated learning, a mobile edge computing framework for deep learning, is a recent advance in privacy-preserving machine learning. In it, the model is trained in a decentralized manner by the clients (i.e., data curators), which prevents the server from directly accessing the clients' private data. This learning mechanism makes attacks from the server side significantly harder. State-of-the-art attack techniques that incorporate generative adversarial networks (GANs) can construct class representatives of the global data distribution among all clients. It is still challenging, however, to distinguishably attack a specific client (i.e., user-level privacy leakage), which is a stronger privacy threat because it precisely recovers the private data of a specific client. This paper makes the first attempt to explore user-level privacy leakage in federated learning through an attack from a malicious server. We propose a framework incorporating a GAN with a multi-task discriminator, which simultaneously discriminates the category, reality, and client identity of input samples. The novel discrimination on client identity enables the generator to recover user-specified private data. Unlike existing works that tend to interfere with the training process of federated learning, the proposed method works "invisibly" on the server side. The experimental results demonstrate the effectiveness of the proposed attack and its superiority over the state of the art.

Peer-to-peer Approach for Distributed Privacy-preserving Deep Learning

International Journal of Computer (IJC), 2021

The revolutionary advances in machine learning and artificial intelligence have enabled people to rethink how we integrate information, analyze data, and use the resulting insights to improve decision making. Deep learning is among the most effective and time- and cost-efficient supervised machine learning approaches. It is becoming popular in building today's applications such as self-driving cars, medical diagnosis systems, automatic speech recognition, machine translation, text-to-speech conversion and many others. On the other hand, the success of deep learning depends, among other things, on a large volume of data being available for training the model. Depending on the application domain, the data needed for training may contain sensitive and private information whose privacy needs to be preserved. One of the challenges that needs to be addressed in deep learning is how to preserve the privacy of training data without sacrificing model accuracy. In this work, we propose, design and implement a decentralized deep learning system using a peer-to-peer architecture. The system enables multiple data owners to jointly train deep learning models without disclosing their training data to one another, while still benefiting from each other's datasets by exchanging model parameters during training. We implemented our approach using two popular deep learning frameworks, Keras and TensorFlow. We evaluated our approach on two datasets popular in the deep learning community, MNIST and Fashion-MNIST. Using our approach, we were able to train models whose accuracy is relatively close to that of models trained in a privacy-violating setting, while preserving the privacy of the training data.
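One common way for peers to "exchange model parameters" without a central server is gossip averaging. The abstract does not specify the authors' exchange protocol, so this sketch assumes a hypothetical ring topology with equal 1/3 weights. In each round, every peer replaces its parameters with the mean of its own parameters and those of its two neighbours. Because this weighting is doubly stochastic, the network-wide average is preserved and all peers converge to it.

```python
import numpy as np

# Hypothetical decentralized averaging sketch: n peers on a ring, each holding a
# parameter vector (e.g. after a local training epoch). No central server exists,
# and raw training data never leaves a peer; only parameters are exchanged.


def gossip_round(params):
    """One synchronous averaging round on a ring; params has shape (n_peers, dim)."""
    left = np.roll(params, 1, axis=0)    # parameters of each peer's left neighbour
    right = np.roll(params, -1, axis=0)  # parameters of each peer's right neighbour
    return (params + left + right) / 3.0


rng = np.random.default_rng(0)
peer_params = rng.normal(size=(5, 4))    # 5 peers, 4 parameters each (toy sizes)
global_mean = peer_params.mean(axis=0)

for _ in range(100):
    # In a real system, each peer would run local training steps here before exchanging.
    peer_params = gossip_round(peer_params)
```

With local training interleaved between rounds, peers drift toward a shared model without ever pooling their datasets. How closely that shared model matches centralized training depends on the topology and on how often peers communicate.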

FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data

arXiv (Cornell University), 2021

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties, where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically partitioned FL setup. There, a complete feature set is formed only when all the datasets from the parties are combined, and the labels are available to only a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, which leads to lengthy training times, and they are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes, which allows it to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate its applicability to multiple types of ML models and show a 10%–70% reduction in training time and an 80%–90% reduction in data transfer with respect to state-of-the-art approaches.
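The reason vertical FL gradients can be computed without pooling features is that a linear model's prediction decomposes into per-party partial inner products. This plaintext sketch shows that decomposition for linear regression and checks it against centralized training. It deliberately leaves out the functional encryption that FedV uses. In FedV, the cross-party sums would be computed over encrypted inputs, so no party sees another party's feature block. The party sizes and data are hypothetical.

```python
import numpy as np

# Plaintext sketch of vertically partitioned linear regression. ASSUMPTION: no
# functional encryption is used here. Each party p holds a column block X_p of
# the same samples; only one party holds the labels y.
rng = np.random.default_rng(1)
n, blocks = 20, [2, 3, 1]                 # 3 parties with 2, 3 and 1 features
X_parts = [rng.normal(size=(n, d)) for d in blocks]
w_parts = [rng.normal(size=d) for d in blocks]
y = rng.normal(size=n)


def vertical_gradients(X_parts, w_parts, y):
    # The prediction decomposes as a sum of per-party partial inner products.
    pred = sum(X_p @ w_p for X_p, w_p in zip(X_parts, w_parts))
    residual = pred - y
    # Gradient of (1 / (2n)) * ||Xw - y||^2 restricted to party p's own block.
    return [X_p.T @ residual / len(y) for X_p in X_parts]


grads = vertical_gradients(X_parts, w_parts, y)

# Centralized reference: concatenate all features and weights.
X_full = np.hstack(X_parts)
w_full = np.concatenate(w_parts)
grad_full = X_full.T @ (X_full @ w_full - y) / n
```

The only cross-party quantity needed is the residual vector. This is why a scheme that securely aggregates the partial predictions can replace pairwise peer-to-peer exchanges.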

Falcon: Honest-Majority Maliciously Secure Framework for Private Deep Learning

Proceedings on Privacy Enhancing Technologies, 2021

We propose Falcon, an end-to-end 3-party protocol for efficient private training and inference of large machine learning models. Falcon presents four main advantages: (i) it is highly expressive, with support for high-capacity networks such as VGG16; (ii) it supports batch normalization, which is important for training complex networks such as AlexNet; (iii) it guarantees security with abort against malicious adversaries, assuming an honest majority; and (iv) it presents new theoretical insights for protocol design that make it highly efficient and allow it to outperform existing secure deep learning solutions. Compared to prior art for private inference, we are about 8× faster than SecureNN (PETS'19) on average and comparable to ABY3 (CCS'18). We are about 16–200× more communication efficient than either of these. For private training, we are about 6× faster than SecureNN, 4.4× faster than ABY3, and about 2–60× more communication efficient. Our experiments in the WAN setti...
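Honest-majority three-party frameworks in this line of work, such as ABY3, commonly build on 2-out-of-3 replicated secret sharing. This sketch shows only that primitive under stated assumptions. It does not show Falcon's protocols. A value is split into three additive shares over the ring Z_{2^32}, where the bit width is an assumption, and each party holds two of them. Any two parties can reconstruct the value, a single party learns nothing, and addition of shared values needs no communication.

```python
import secrets

MOD = 2 ** 32  # ring Z_{2^32}; the bit width is an assumption for this sketch


def share(x):
    """Split x into additive shares s0 + s1 + s2 = x (mod 2^32) and give party i
    the pair (s_i, s_{i+1}), so any two parties together hold all three shares."""
    s0 = secrets.randbelow(MOD)
    s1 = secrets.randbelow(MOD)
    s2 = (x - s0 - s1) % MOD
    s = [s0, s1, s2]
    return [(s[i], s[(i + 1) % 3]) for i in range(3)]


def reconstruct(party_shares, i, j):
    """Recover the secret from the pairs held by two distinct parties i and j."""
    known = {}
    for p in (i, j):
        a, b = party_shares[p]
        known[p] = a
        known[(p + 1) % 3] = b
    return sum(known.values()) % MOD


def add_shares(xs, ys):
    """Addition is local: each party adds its own pair component-wise."""
    return [((a0 + b0) % MOD, (a1 + b1) % MOD) for (a0, a1), (b0, b1) in zip(xs, ys)]
```

Multiplying two shared values, and the comparisons needed for ReLU and max pooling, require interaction between the parties. That interaction is where protocol-level optimizations like those claimed for Falcon come into play.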

An Empirical Study of Efficiency and Privacy of Federated Learning Algorithms

arXiv (Cornell University), 2023

In today's world, the rapid expansion of IoT networks and the proliferation of smart devices in our daily lives have resulted in the generation of substantial amounts of heterogeneous data. These data form a stream that requires special handling. To handle these data effectively, advanced data processing technologies are necessary to guarantee both privacy and efficiency. Federated learning emerged as a distributed learning method that trains models locally and aggregates them on a server to preserve data privacy. This paper presents two illustrative scenarios that highlight the potential of federated learning (FL) for delivering efficient and privacy-preserving machine learning within IoT networks. We first give the mathematical foundations of two key aggregation algorithms in federated learning, FedAvg and FedProx. Then, we conduct simulations using the Flower framework to show the efficiency of these algorithms by training deep neural networks on common datasets, and we compare the accuracy and loss metrics of FedAvg and FedProx. Finally, we present results highlighting the trade-off between privacy and accuracy. These come from simulations that apply the differential privacy (DP) method in the PyTorch and Opacus ML frameworks to common FL datasets and data distributions, under both the FedAvg and FedProx strategies.
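This NumPy sketch shows how FedAvg and FedProx differ on toy least-squares clients. The clients, data and hyperparameters are hypothetical, and the sketch does not reproduce the paper's Flower setup. FedAvg averages client models weighted by local dataset size. FedProx keeps that aggregation but adds a proximal term (μ/2)·||w − w_global||² to each client's local objective, which discourages client models from drifting far from the current global model.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average client models weighted by local sample count n_k."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def local_update(w_global, X, y, lr=0.1, epochs=20, mu=0.0):
    """Full-batch gradient descent on a least-squares loss. mu = 0 gives a FedAvg
    client; mu > 0 adds the FedProx proximal term (mu / 2) * ||w - w_global||^2,
    whose gradient is mu * (w - w_global)."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y) + mu * (w - w_global)
        w = w - lr * grad
    return w


def fl_round(w_global, clients, mu=0.0):
    """One communication round: local training on every client, then aggregation."""
    updates = [local_update(w_global, X, y, mu=mu) for X, y in clients]
    return fedavg(updates, [len(y) for _, y in clients])


rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for n_k in (30, 50, 80):  # toy clients with unequal dataset sizes
    X = rng.normal(size=(n_k, 3))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n_k)))

w = np.zeros(3)
for _ in range(10):
    w = fl_round(w, clients, mu=0.1)  # FedProx with mu = 0.1; set mu = 0.0 for FedAvg
```

With identically distributed toy clients like these, the two strategies behave similarly. The proximal term matters more when client data are heterogeneous, which is the setting FedProx targets.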

Efficient and Private Federated Learning with Partially Trainable Networks

arXiv, 2021

Federated learning is used for decentralized training of machine learning models on a large number (millions) of edge mobile devices. It is challenging because mobile devices often have limited communication bandwidth and local computation resources. Therefore, improving the efficiency of federated learning is critical for scalability and usability. In this paper, we propose to leverage partially trainable neural networks, which freeze a portion of the model parameters during the entire training process, to reduce the communication cost with little impact on model performance. Through extensive experiments, we empirically show that Federated learning of Partially Trainable neural networks (FedPT) can achieve superior communication-accuracy trade-offs, with up to a 46× reduction in communication cost at a small accuracy cost. Our approach also enables faster training, a smaller memory footprint, and better utility under strong differential privacy guarantees. The proposed F...
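The communication saving from partially trainable networks comes from uploading only the trainable tensors. This sketch illustrates that under assumptions: the model is a hypothetical dict of named NumPy tensors, the choice of frozen layers is arbitrary, and a constant shift stands in for local training. FedPT's actual choice of which parameters to freeze is not reproduced here.

```python
import numpy as np

# Hypothetical toy model: frozen tensors keep their initial values for the whole
# of training and are never uploaded; only TRAINABLE tensors are communicated.
rng = np.random.default_rng(0)
model = {
    "layer1": rng.normal(size=(784, 128)),
    "layer2": rng.normal(size=(128, 64)),
    "head": rng.normal(size=(64, 10)),
}
TRAINABLE = {"head"}  # assumption for this sketch


def client_payload(params, trainable):
    """Only trainable tensors are uploaded; frozen ones are never communicated."""
    return {name: params[name] for name in trainable}


def server_merge(global_params, payloads, weights):
    """Weighted average of uploaded (trainable) tensors; frozen tensors are untouched."""
    merged = dict(global_params)
    total = float(sum(weights))
    for name in payloads[0]:
        merged[name] = sum(w * p[name] for w, p in zip(weights, payloads)) / total
    return merged


def upload_size(payload):
    """Number of scalar parameters a client sends in one round."""
    return sum(t.size for t in payload.values())


client_updates = []
for k in range(2):
    local = {name: t.copy() for name, t in model.items()}
    local["head"] += 0.1 * (k + 1)  # stand-in for local training of the trainable part
    client_updates.append(client_payload(local, TRAINABLE))

new_model = server_merge(model, client_updates, weights=[1, 1])
full_size = sum(t.size for t in model.values())
```

Here each client uploads 640 of the model's 109,184 parameters per round. The accuracy cost of freezing, which the abstract reports as small for FedPT, depends on how the frozen portion is chosen, and this toy does not measure it.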