An Empirical Study of Efficiency and Privacy of Federated Learning Algorithms

A federated deep learning framework for privacy preservation and communication efficiency

Journal of Systems Architecture, 2022

Deep learning has achieved great success in many applications. However, its deployment in practice has been hindered by two issues: the privacy of data that must be aggregated centrally for model training, and the high communication overhead of transmitting large amounts of usually geographically distributed data. Addressing both issues is challenging, and most existing works do not provide an efficient solution. In this paper, we develop FedPC, a Federated Deep Learning Framework for Privacy Preservation and Communication Efficiency. The framework allows a model to be learned on multiple private datasets without revealing any information about the training data, even through intermediate data. The framework also minimizes the amount of data exchanged to update the model. We formally prove the convergence of the learning model when training with FedPC and its privacy-preserving property. We perform extensive experiments to evaluate the performance of FedPC in terms of its approximation to the upper-bound performance (when training centrally) and communication overhead. The results show that FedPC keeps model performance within 8.5% of the centrally trained models when data is distributed across 10 computing nodes. FedPC also reduces the communication overhead by up to 42.20% compared with existing works.
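FedPC's exact update scheme is not spelled out in this abstract, but the general recipe it describes (local training plus compressed model updates) can be sketched. The following is a minimal FedAvg-style round in Python; the function names and the top-k compression choice are illustrative assumptions, not FedPC's actual design.

```python
# Minimal sketch of one FedAvg-style round with top-k sparsified updates,
# illustrating how uploaded bytes per round can be reduced. This is NOT
# the FedPC algorithm; names and the compression scheme are assumptions.
import numpy as np

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries; zeros elsewhere."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(update.shape)

def local_update(global_weights, data, lr=0.1):
    """Placeholder for local training; returns a weight delta (least squares)."""
    X, y = data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return -lr * grad

def federated_round(global_weights, client_datasets, k):
    deltas = []
    for data in client_datasets:
        delta = local_update(global_weights, data)
        deltas.append(top_k_sparsify(delta, k))   # each client uploads only k values
    return global_weights + np.mean(deltas, axis=0)

# Toy usage: 10 clients, 50-dimensional linear model.
rng = np.random.default_rng(0)
w = np.zeros(50)
clients = [(rng.normal(size=(32, 50)), rng.normal(size=32)) for _ in range(10)]
for _ in range(20):
    w = federated_round(w, clients, k=10)   # 5x fewer uploaded values per round
```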

Federated Learning and Its Role in the Privacy Preservation of IoT Devices

Future Internet, 2022

Federated learning (FL) is a cutting-edge artificial intelligence approach: a decentralized technique that allows users to train models on massive amounts of data while the raw data remains local, so machine learning (ML) training proceeds without centralizing or exposing the underlying records. As researchers in the field promote ML configurations involving large amounts of private data, systems and infrastructure must be developed to improve the effectiveness of advanced learning systems. This study examines FL in depth, focusing on application and system platforms, mechanisms, real-world applications, and process contexts. FL builds robust classifiers without requiring information disclosure, enabling strong privacy policies and access-control guarantees. The article begins with an overview of FL. We then examine the technical aspects of FL, including enabling technologies, protocols, and software. Compared with other review articles, our goal is to provide a more comprehensive explanation of best-practice systems and production FL software, so that scientists can create the best privacy-preservation solutions for IoT devices. We also provide an overview of related scientific papers and a detailed analysis of the significant difficulties reported in recent publications. Furthermore, we investigate the benefits and drawbacks of FL and present comprehensive deployment scenarios to demonstrate how specific FL models could be implemented to achieve the desired results.

A Federated Learning Framework for Privacy-preserving and Parallel Training

2020

The deployment of deep learning in practice has been hindered by two issues: the computational cost of model training and the privacy of training data such as medical or healthcare records. The large size of both learning models and datasets incurs a massive computational cost, requiring efficient approaches to speed up the training phase. While parallel and distributed learning can address the issue of computational overhead, preserving the privacy of training data and intermediate results (e.g., gradients) remains a hard problem. Enabling parallel training of deep learning models on distributed datasets while preserving data privacy is even more complex and challenging. In this paper, we develop and implement FEDF, a distributed deep learning framework for privacy-preserving and parallel training. The framework allows a model to be learned on multiple geographically distributed training datasets (which may belong to different owners) while not revealing any information about the training data…

FEDERATED LEARNING: TRAINING ML MODELS COLLABORATIVELY ACROSS MULTIPLE DEVICES WITHOUT SHARING RAW DATA, PRESERVING PRIVACY

Rabindra Bharati University Journal of Economics, 2024

The importance of data security and privacy is rising in tandem with the demand for machine learning models. One potential answer that has recently arisen is federated learning, a decentralised method of machine learning that enables several entities to work together and construct models without disclosing private information. With an emphasis on its privacy-preserving features, this all-encompassing examination delves into the concepts, methods, and applications of federated learning.

Efficient Privacy-Aware Federated Learning by Elimination of Downstream Redundancy

IEEE Design & Test, 2021

Federated Learning is a distributed machine learning paradigm which advocates training on decentralized data. However, developing a model centrally involves huge communication and computation overhead and presents a bottleneck. We propose a method that overcomes this problem while maintaining both the privacy of the participants and the classification accuracy. Our method achieves significant speedups compared with existing methods that employ homomorphic encryption. Even in the pessimistic case, we achieve a speedup of 4.81x for classification on the ImageNet dataset with an AlexNet architecture, without compromising the privacy of the participants or the accuracy.

SURVEY ON FEDERATED LEARNING TOWARDS PRIVACY PRESERVING AI

One of the significant challenges of Artificial Intelligence (AI) and machine learning models is to preserve data privacy and to ensure data security. Addressing this problem has led to the application of the Federated Learning (FL) mechanism for preserving data privacy. Preserving user privacy in the European Union (EU) must comply with the General Data Protection Regulation (GDPR); therefore, exploring machine learning models that preserve data privacy has to take the GDPR into consideration. In this paper, we present a detailed understanding of Federated Machine Learning and various federated architectures, along with different privacy-preserving mechanisms. The main goal of this survey work is to highlight existing privacy techniques and to propose applications of Federated Learning in industry. Finally, we discuss how Federated Learning is an emerging area of research that could usher in a new era in AI and machine learning.

Federated Learning with Privacy-preserving and Model IP-right-protection

Machine Intelligence Research

In the past decades, artificial intelligence (AI) has achieved unprecedented success, with statistical models becoming the central entity in AI. However, the centralized training and inference paradigm for building and using these models faces growing privacy and legal challenges. To bridge the gap between data privacy and the need for data fusion, the emerging AI paradigm of federated learning (FL) has arisen as an approach for solving data-silo and data-privacy problems. Based on secure distributed AI, federated learning emphasizes data security throughout the lifecycle, which includes the following steps: data preprocessing, training, evaluation, and deployment. FL maintains data security by using methods such as secure multi-party computation (MPC), differential privacy, and hardware solutions to build and use distributed multi-party machine-learning systems and statistical models over different data sources. Besides data privacy concerns, we argue that the concept of “...
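Of the methods listed, secure aggregation via MPC admits a compact illustration: with pairwise additive masks, the server can recover the sum of client updates while each individual update stays hidden. The sketch below is in the spirit of pairwise-masking secure aggregation, not a specific protocol from this article; a real system would derive the masks from pairwise key agreement and handle client dropouts.

```python
# Minimal sketch of pairwise additive masking, a standard MPC-style building
# block for secure aggregation: the server learns only the SUM of client
# updates, never any individual update. Illustrative only.
import numpy as np

def pairwise_masks(n_clients, dim, seed=42):
    """Each pair (i, j) shares a random mask; i adds it, j subtracts it."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros(dim) for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            m = rng.normal(size=dim)   # shared via key agreement in practice
            masks[i] += m
            masks[j] -= m
    return masks

updates = [np.full(4, fill_value=c + 1.0) for c in range(3)]  # private updates
masks = pairwise_masks(n_clients=3, dim=4)
masked = [u + m for u, m in zip(updates, masks)]  # what the server actually sees
total = np.sum(masked, axis=0)                    # masks cancel pairwise
assert np.allclose(total, np.sum(updates, axis=0))
```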

Hercules: Boosting the Performance of Privacy-preserving Federated Learning

arXiv (Cornell University), 2022

In this paper, we address the problem of privacy-preserving federated neural network training with N users. We present Hercules, an efficient and high-precision training framework that can tolerate collusion of up to N − 1 users. Hercules follows the POSEIDON framework proposed by Sav et al. (NDSS'21) but makes a qualitative leap in performance with the following contributions: (i) we design a novel parallel homomorphic computation method for matrix operations, which enables fast Single Instruction, Multiple Data (SIMD) operations over ciphertexts. For the multiplication of two h × h matrices, our method reduces the computational complexity from O(h^3) to O(h). This greatly improves the training efficiency of the neural network, since the ciphertext computation is dominated by the convolution operations; (ii) we present an efficient approximation of the sign function based on composite polynomial approximation. It is used to approximate non-polynomial functions (i.e., ReLU and max) with optimal asymptotic complexity. Extensive experiments on various benchmark datasets (BCW, ESR, CREDIT, MNIST, SVHN, CIFAR-10, and CIFAR-100) show that, compared with POSEIDON, Hercules obtains up to a 4% increase in model accuracy and up to a 60× reduction in computation and communication cost.
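The composite polynomial idea is concrete enough to sketch: repeatedly applying a low-degree odd polynomial pushes inputs toward ±1, giving an HE-friendly stand-in for sign (and hence ReLU and max) that uses only additions and multiplications. The specific polynomial below, f(x) = (3x − x³)/2, and the fixed depth are a toy choice; Hercules's optimal-complexity composition is more sophisticated.

```python
# Toy illustration of composite polynomial approximation of sign(x) on
# [-1, 1]: iterating f(x) = (3x - x^3)/2 drives values toward ±1.
# HE-friendly because only additions and multiplications are used.
# Not the actual Hercules polynomials; a sketch of the technique.
import numpy as np

def sign_approx(x, depth=8):
    for _ in range(depth):
        x = 0.5 * (3.0 * x - x ** 3)   # degree-3 refinement step
    return x

def relu_approx(x):
    # ReLU(x) = x * (1 + sign(x)) / 2, with sign replaced by its approximation
    return 0.5 * x * (1.0 + sign_approx(x))

xs = np.linspace(-1, 1, 9)
print(np.round(relu_approx(xs), 3))   # close to max(x, 0) away from x = 0
```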

Differentially Private Federated Learning for Bandwidth and Energy Constrained Environments

2021

Machine learning models are increasingly used in our daily life. For instance, these models can be used for content recommendation during a purchase or to help doctors make medical decisions, etc. However, to obtain accurate and useful models, we generally need to train them on large amounts of data. Therefore, several entities with limited datasets may want to collaborate in order to improve their local model accuracy. In traditional machine learning, such collaboration requires first storing all entities' data on a centralized server before training the model on it. Such data centralization can be problematic when the data are sensitive and data privacy is required. Instead of sharing the training data, Federated Learning shares the model parameters between a server, which plays the role of aggregator, and the participating entities. More specifically, at each round the server sends the global model to some participants (downstream). These participants then update the received model with their local data and send back the updated gradient vectors to the server (upstream). The server then aggregates all the participants' updates to obtain the new global model. This operation is repeated until the global model converges. Although Federated Learning improves both privacy and accuracy, it is not perfect. In fact, sharing gradients computed by individual parties can leak information about their private training data. Several recent attacks have demonstrated that a sufficiently skilled adversary, who can capture the model updates (gradients) sent by individual parties, can infer whether a specific record or a group property is present in the dataset of a specific party. Moreover, complete training samples can also be reconstructed purely from the captured gradients. Furthermore, Federated Learning is not only vulnerable to privacy attacks; it is also vulnerable to poisoning attacks, which can drastically decrease the model accuracy. Finally, Federated Learning incurs large communication costs during the upstream/downstream exchanges between the server and the parties. This can be problematic for applications based on bandwidth- and energy-constrained devices, as is the case for mobile systems, for instance. In this thesis, we first propose three bandwidth-efficient schemes that reduce bandwidth costs by up to 99.9%. We then propose differentially private extensions of these schemes that are robust against honest-but-curious adversaries (server or participants) and protect the complete dataset of each participant (participant-level privacy). Moreover, our private solutions outperform standard privacy-preserving Federated Learning schemes in terms of accuracy and/or bandwidth efficiency. Finally, we investigate the robustness of our schemes against security attacks performed by malicious participants and discuss a possible privacy-robustness tradeoff that may spur further research.
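The participant-level protection described here follows the usual differential-privacy recipe for FL: bound each participant's influence by clipping its update to a fixed L2 norm, then add calibrated Gaussian noise to the aggregate. Below is a minimal DP-FedAvg-style sketch, not the thesis's exact schemes; `clip_norm` and `noise_multiplier` are illustrative parameters.

```python
# Minimal sketch of participant-level differential privacy in FL:
# clip each client's update, then add Gaussian noise to the aggregate.
# Illustrative of the general recipe, not this thesis's specific schemes.
import numpy as np

def clip_update(update, clip_norm=1.0):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def private_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    rng = np.random.default_rng(seed)
    clipped = [clip_update(u, clip_norm) for u in updates]
    mean = np.mean(clipped, axis=0)
    # Noise scale shrinks with more participants for a fixed privacy target.
    sigma = noise_multiplier * clip_norm / len(updates)
    return mean + rng.normal(scale=sigma, size=mean.shape)

# Toy usage: aggregate 8 participants' 16-dimensional updates privately.
updates = [np.random.default_rng(i).normal(size=16) for i in range(8)]
new_direction = private_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1)
```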

Privacy and Efficiency of Communications in Federated Split Learning

arXiv (Cornell University), 2023

Every day, large amounts of sensitive data are distributed across mobile phones, wearable devices, and other sensors. Traditionally, these enormous datasets have been processed on a single system, with complex models being trained to make valuable predictions. Distributed machine learning techniques such as Federated and Split Learning have recently been developed to better protect user data and privacy while ensuring high performance. Both of these distributed learning architectures have advantages and disadvantages. In this paper, we examine these tradeoffs and suggest a new hybrid Federated Split Learning architecture that combines the efficiency and privacy benefits of both. Our evaluation demonstrates how our hybrid Federated Split Learning approach can lower the amount of processing power required by each client running a distributed learning system and reduce training and inference time while maintaining similar accuracy. We also discuss the resiliency of our approach to deep learning privacy inference attacks and compare our solution to other recently proposed benchmarks.
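The split-learning half of the hybrid is straightforward to sketch: the client computes activations up to a cut layer and transmits only those "smashed" activations, never the raw inputs. Below is a minimal numpy sketch under assumed layer shapes; the paper's hybrid additionally federates the client-side weights across clients, which is omitted here.

```python
# Minimal sketch of the split-learning forward pass: raw data stays on the
# client; only cut-layer activations cross the network to the server.
# Layer shapes and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
W_client = rng.normal(size=(784, 128)) * 0.01   # client-side layer(s)
W_server = rng.normal(size=(128, 10)) * 0.01    # server-side layer(s)

def client_forward(x):
    """Compute activations up to the cut layer (stays on-device until sent)."""
    return np.maximum(x @ W_client, 0.0)        # ReLU activations at the cut

def server_forward(smashed):
    """Complete the forward pass on the server from the smashed activations."""
    logits = smashed @ W_server
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)     # softmax class probabilities

x = rng.normal(size=(4, 784))        # a private mini-batch on the client
smashed = client_forward(x)          # only this tensor crosses the network
probs = server_forward(smashed)
```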