Comparative assessment of federated and centralized machine learning
Related papers
Rabindra Bharati University Journal of Economics, 2024
The importance of data security and privacy is rising in tandem with the demand for machine learning models. One potential answer that has recently emerged is federated learning, a decentralised method of machine learning that enables several entities to collaborate and construct models without disclosing private information. With an emphasis on its privacy-preserving features, this all-encompassing examination delves into the concepts, methods, and uses of federated learning.
An Empirical Study of Efficiency and Privacy of Federated Learning Algorithms
arXiv (Cornell University), 2023
In today's world, the rapid expansion of IoT networks and the proliferation of smart devices in our daily lives have resulted in the generation of substantial amounts of heterogeneous data. These data form a stream that requires special handling. To handle this data effectively, advanced data processing technologies are necessary to guarantee the preservation of both privacy and efficiency. Federated learning emerged as a distributed learning method that trains models locally and aggregates them on a server to preserve data privacy. This paper showcases two illustrative scenarios that highlight the potential of federated learning (FL) as a key to delivering efficient and privacy-preserving machine learning within IoT networks. We first give the mathematical foundations for key aggregation algorithms in federated learning, i.e., FedAvg and FedProx. Then, we conduct simulations, using the Flower framework, to show the efficiency of these algorithms by training deep neural networks on common datasets, and compare the accuracy and loss metrics of FedAvg and FedProx. Finally, we present results highlighting the trade-off between maintaining privacy and accuracy, via simulations that implement the differential privacy (DP) method in the PyTorch and Opacus ML frameworks, on common FL datasets and data distributions for both the FedAvg and FedProx strategies.
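For orientation, here is a minimal sketch (not the paper's code) of the two aggregation strategies compared above, assuming PyTorch state dicts: FedAvg averages client weights in proportion to local dataset sizes, while FedProx augments each client's local loss with a proximal term that penalizes drift from the global model.

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """FedAvg: average client state_dicts, weighted by local dataset size."""
    total = sum(client_sizes)
    return {
        key: sum(state[key].float() * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

def fedprox_local_loss(model, global_params, task_loss, mu=0.01):
    """FedProx: task loss plus (mu/2) * ||w - w_global||^2, which keeps each
    client's local update anchored to the current global model."""
    prox = sum(((p - g.detach()) ** 2).sum()
               for p, g in zip(model.parameters(), global_params))
    return task_loss + 0.5 * mu * prox
```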
A Federated Learning Framework for Privacy-preserving and Parallel Training
2020
The deployment of deep learning in practice has been hindered by two issues: the computational cost of model training and the privacy of training data such as medical or healthcare records. The large size of both learning models and datasets incurs a massive computational cost, requiring efficient approaches to speed up the training phase. While parallel and distributed learning can address the issue of computational overhead, preserving the privacy of training data and intermediate results (e.g., gradients) remains a hard problem. Enabling parallel training of deep learning models on distributed datasets while preserving data privacy is even more complex and challenging. In this paper, we develop and implement FEDF, a distributed deep learning framework for privacy-preserving and parallel training. The framework allows a model to be learned on multiple geographically-distributed training datasets (which may belong to different owners) while not revealing any information o...
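As an illustration of the parallel-training idea (a generic sketch, not the FEDF framework itself), each data owner can train a private copy of the model concurrently, so that only parameters, never raw records, leave a silo:

```python
from concurrent.futures import ThreadPoolExecutor
import copy
import torch

def local_train(model, loader, epochs=1, lr=0.01):
    """Train a private copy locally; only its weights ever leave the silo."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def parallel_round(global_model, loaders):
    """One round: every data owner trains in parallel on its own dataset."""
    with ThreadPoolExecutor(max_workers=len(loaders)) as pool:
        futures = [pool.submit(local_train, global_model, dl) for dl in loaders]
        return [f.result() for f in futures]  # aggregate these, e.g., via FedAvg
```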
The Principle, Communication Efficiency and Privacy Issues of Federated Learning
viXra, 2019
Standard machine learning approaches require a huge amount of training data to be stored centrally in order to feed the learning algorithms. Keeping and using data centrally brings many drawbacks: inefficient communication between the centralized data center and the clients producing the data, privacy risks, and delays before the results of training become usable. Google's approach, federated learning, tackles all of these problems. The training data is kept decentralized on the clients' devices, and only small updates to the common model are communicated. This method allows communication to be optimized, preserves the privacy of the users involved, and makes the model's improvements quickly usable. In this paper I will explain how the federated learning principle works. Further on, I will give a small insight into optimization possibilities for communication efficiency as well as into privacy issues inv...
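To make the "small updates" idea concrete, here is one common way to shrink communication (illustrative, not necessarily the optimization this paper describes): send only the largest entries of the weight change, assuming PyTorch tensors.

```python
import torch

def sparse_update(new_state, old_state, k_frac=0.01):
    """Client side: send only the top-k largest weight changes per tensor."""
    update = {}
    for name in new_state:
        delta = (new_state[name] - old_state[name]).flatten()
        k = max(1, int(k_frac * delta.numel()))
        _, idx = delta.abs().topk(k)
        update[name] = (idx, delta[idx])  # indices + values: ~k_frac of the size
    return update

def apply_sparse_update(state, update):
    """Server side: add the sparse deltas back into the global model."""
    for name, (idx, vals) in update.items():
        flat = state[name].flatten()
        flat[idx] += vals
        state[name] = flat.view_as(state[name])
    return state
```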
Privacy and Efficiency of Communications in Federated Split Learning
arXiv (Cornell University), 2023
Every day, large amounts of sensitive data are distributed across mobile phones, wearable devices, and other sensors. Traditionally, these enormous datasets have been processed on a single system, with complex models being trained to make valuable predictions. Distributed machine learning techniques such as Federated and Split Learning have recently been developed to better protect user data and privacy while ensuring high performance. Both of these distributed learning architectures have advantages and disadvantages. In this paper, we examine these trade-offs and propose a new hybrid Federated Split Learning architecture that combines the efficiency and privacy benefits of both. Our evaluation demonstrates how our hybrid Federated Split Learning approach can lower the amount of processing power required by each client running a distributed learning system and reduce training and inference time while maintaining comparable accuracy. We also discuss the resiliency of our approach to deep learning privacy inference attacks and compare our solution to other recently proposed benchmarks.
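A minimal sketch of the split-learning half of such a hybrid (illustrative; the paper's architecture combines this with federated aggregation): the network is cut at an intermediate layer, and only that layer's activations and gradients cross the client-server boundary.

```python
import torch
import torch.nn as nn

# The model is cut at an intermediate layer: the client holds the front,
# the server holds the rest. (Layer sizes here are arbitrary.)
client_net = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
server_net = nn.Sequential(nn.Linear(256, 10))

c_opt = torch.optim.SGD(client_net.parameters(), lr=0.1)
s_opt = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def split_step(x, y):
    """One training step: only cut-layer activations and their gradients
    cross the client/server boundary, never the raw inputs."""
    c_opt.zero_grad(); s_opt.zero_grad()
    smashed = client_net(x)                       # client forward
    sent = smashed.detach().requires_grad_(True)  # "transmitted" to server
    loss = loss_fn(server_net(sent), y)           # server forward + loss
    loss.backward()                               # server backward
    smashed.backward(sent.grad)                   # gradient "returned" to client
    c_opt.step(); s_opt.step()
    return loss.item()

loss = split_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
```

In a real deployment the `sent` activations and `sent.grad` would be serialized over the network; here the handoff is simulated in-process.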
Efficient and Private Federated Learning with Partially Trainable Networks
ArXiv, 2021
Federated learning is used for decentralized training of machine learning models on a large number (millions) of edge mobile devices. It is challenging because mobile devices often have limited communication bandwidth and local computation resources. Therefore, improving the efficiency of federated learning is critical for scalability and usability. In this paper, we propose to leverage partially trainable neural networks, which freeze a portion of the model parameters during the entire training process, to reduce the communication cost with little impact on model performance. Through extensive experiments, we empirically show that Federated learning of Partially Trainable neural networks (FedPT) can result in superior communication-accuracy trade-offs, with up to a 46× reduction in communication cost, at a small accuracy cost. Our approach also enables faster training, with a smaller memory footprint, and better utility for strong differential privacy guarantees. The proposed F...
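The core mechanism, freezing parameters so they never need to be retrained or re-uploaded, can be sketched as follows (which submodules FedPT actually freezes, and the names used here, are illustrative assumptions):

```python
import torch.nn as nn

def freeze_all_but(model, trainable_prefixes=("classifier",)):
    """Freeze every parameter except those under the named submodules; frozen
    weights stay at a shared initialization and are never communicated."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)

def trainable_state(model):
    """Only the trainable subset is uploaded each round."""
    return {n: p.detach().clone()
            for n, p in model.named_parameters() if p.requires_grad}

net = nn.Sequential()
net.add_module("features", nn.Linear(32, 32))    # frozen
net.add_module("classifier", nn.Linear(32, 10))  # trainable
freeze_all_but(net)
print(list(trainable_state(net)))  # ['classifier.weight', 'classifier.bias']
```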
Trading Off Privacy, Utility and Efficiency in Federated Learning
ACM Transactions on Intelligent Systems and Technology
Federated learning (FL) enables participating parties to collaboratively build a global model with boosted utility without disclosing private data. Appropriate protection mechanisms have to be adopted to fulfill the opposing requirements of preserving privacy and maintaining high model utility. In addition, a federated learning system must achieve high efficiency in order to enable large-scale model training and deployment. We propose a unified federated learning framework that reconciles horizontal and vertical federated learning. Based on this framework, we formulate and quantify the trade-offs between privacy leakage, utility loss, and efficiency reduction, which leads us to the No-Free-Lunch (NFL) theorem for the federated learning system. NFL indicates that it is unrealistic to expect an FL algorithm to simultaneously provide excellent privacy, utility, and efficiency in certain scenarios. We then analyze the lower bounds for the privacy leakage, ...
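To give the flavor of the tension the NFL theorem formalizes (using the textbook Gaussian mechanism as a stand-in, not the paper's own quantification): raising the noise scale lowers the privacy leakage ε but adds distortion to whatever is released.

```python
import numpy as np

def tradeoff_table(sigmas, sensitivity=1.0, delta=1e-5):
    """Illustrative only: textbook Gaussian-mechanism accounting. Larger noise
    sigma -> smaller epsilon (less privacy leakage) but more distortion of the
    released update (utility loss). The bound is tight only for epsilon < 1."""
    rows = []
    for sigma in sigmas:
        eps = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / sigma
        rows.append((sigma, eps, sigma ** 2))  # sigma^2 = noise variance added
    return rows

for sigma, eps, var in tradeoff_table([0.5, 1.0, 2.0, 4.0]):
    print(f"sigma={sigma:4.1f}  epsilon~{eps:6.2f}  per-coord noise var={var:5.2f}")
```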
Papaya: Practical, Private, and Scalable Federated Learning
ArXiv, 2021
Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning; variability in the system characteristics of each device and millions of clients coordinating with a central server are primary ones. Most FL systems described in the literature are synchronous: they perform a synchronized aggregation of model updates from individual clients. Scaling synchronous FL is challenging, since increasing the number of clients training in parallel leads to diminishing returns in training speed, analogous to large-batch training. Moreover, stragglers hinder synchronous FL training. In this work, we outline a production asynchronous FL system design. Our work tackles the aforementioned issues, sketches some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients. Empirically, we demonstrate that asynch...
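A simplified sketch of the asynchronous-aggregation idea (the paper's production design handles security, pacing, and scale well beyond this): updates are applied as they arrive, down-weighted by staleness, with no synchronization barrier, so stragglers cannot stall a round.

```python
class AsyncAggregator:
    """Minimal asynchronous aggregator: applies each client update on arrival,
    discounted by how stale the client's base model version is."""
    def __init__(self, global_state):
        self.state = {k: v.clone().float() for k, v in global_state.items()}
        self.version = 0  # bumped on every applied update

    def receive(self, client_delta, client_version):
        staleness = self.version - client_version
        weight = 1.0 / (1.0 + staleness)   # stale updates count for less
        for k in self.state:
            self.state[k] += weight * client_delta[k].float()
        self.version += 1
```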
Federated Learning: Collaborative Machine Learning without Centralized Training Data
International Journal of Engineering Technology and Management Sciences
Federated learning (also known as collaborative learning) is a machine learning technique that trains an algorithm across numerous decentralized edge devices or servers holding local data samples, without transferring those samples. This strategy differs from standard centralized machine learning techniques, in which all local datasets are uploaded to a single server, as well as from more traditional decentralized alternatives, which frequently presume that local data samples are identically distributed. Federated learning allows several actors to collaborate on the development of a single, robust machine learning model without sharing data, addressing crucial issues such as data privacy, data security, data access rights, and access to heterogeneous data. The defence, telecommunications, Internet of Things, and pharmaceutical industries are just a few of the sectors where it has applications.
FedPerf: A Practitioners’ Guide to Performance of Federated Learning Algorithms
Federated Learning (FL) enables edge devices to collaboratively train a joint model without sharing their local data. This decentralised and distributed approach improves user privacy, security, and trust. Different variants of FL algorithms have presented promising results on both IID and skewed non-IID data. However, the performance of FL algorithms is found to be sensitive to the FL system parameters and the hyperparameters of the model used. In practice, tuning the right set of parameter settings for an FL algorithm is an expensive task. In this preregistered paper, we propose an empirical investigation of five prominent FL algorithms to discover the relation between the FL System Parameters (FLSPs) and their performance. The FLSPs add extra complexity to FL algorithms over a traditional ML system. We hypothesise that choosing the best FL algorithm for the given FLSPs is not a trivial problem. Further, we endeavour to formulate a single easy-to-use metric which can describe the pe...
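A practitioner-style sketch of such an investigation (the FLSP names and ranges here are illustrative assumptions, and `run_fn` stands in for any FL training loop):

```python
import itertools

# Hypothetical FL system parameters (FLSPs) a practitioner might sweep;
# the paper's exact parameter set may differ.
FLSP_GRID = {
    "num_clients":        [10, 100],
    "participation_frac": [0.1, 0.5],
    "local_epochs":       [1, 5],
    "label_skew":         [0.0, 0.8],   # 0 = IID, higher = more non-IID
}

def sweep(run_fn):
    """Run one FL experiment per FLSP combination; run_fn is any FL training
    loop that returns a final accuracy for the given configuration."""
    results = []
    for combo in itertools.product(*FLSP_GRID.values()):
        cfg = dict(zip(FLSP_GRID, combo))
        results.append((cfg, run_fn(**cfg)))
    return sorted(results, key=lambda r: r[1], reverse=True)
```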