Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data

Multi-modal AsynDGAN: Learn From Distributed Medical Image Data without Sharing Private Information

2020

As deep learning technologies advance, ever more data is needed to build general and robust models for various tasks. In the medical domain, however, large-scale, multi-party data training and analysis are infeasible due to privacy and data-security concerns. In this paper, we propose an extendable and elastic learning framework that preserves privacy and security while enabling collaborative learning with efficient communication. The proposed framework, named distributed Asynchronized Discriminator Generative Adversarial Networks (AsynDGAN), consists of a centralized generator and multiple distributed discriminators. The advantages of our proposed framework are five-fold: 1) the central generator can learn the real data distribution from multiple datasets implicitly, without sharing the image data; 2) the framework is applicable to single-modality or multi-modality data; 3) the learned generator can be used to synthesize samples for down-stream learnin...
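The communication pattern described in the abstract (a central generator that never sees real images, and per-site discriminators that return only feedback on synthetic samples) can be sketched roughly as follows. All names are ours, and the "discriminator" here is a trivial scalar stand-in, not the paper's adversarial network:

```python
# Illustrative sketch of the AsynDGAN message flow: each hospital node
# keeps its real data local and only returns feedback on synthetic batches.

class DiscriminatorNode:
    def __init__(self, real_data):
        self._real = real_data          # stays local, never transmitted

    def feedback(self, fake_batch):
        # Toy stand-in for a discriminator: signed difference between the
        # local real-data mean and the fake-batch mean. A real system would
        # return gradients of an adversarial loss instead.
        real_mean = sum(self._real) / len(self._real)
        fake_mean = sum(fake_batch) / len(fake_batch)
        return real_mean - fake_mean

def generator_round(gen_param, nodes, lr=0.5):
    fake_batch = [gen_param] * 4        # trivial "generator" output
    # Only synthetic samples go out; only scalar feedback comes back.
    correction = sum(n.feedback(fake_batch) for n in nodes) / len(nodes)
    return gen_param + lr * correction

nodes = [DiscriminatorNode([1.0, 2.0]), DiscriminatorNode([3.0, 5.0])]
g = 0.0
for _ in range(20):
    g = generator_round(g, nodes)
# g approaches the average of the two local means, (1.5 + 4.0) / 2 = 2.75
```

The point of the sketch is the information flow: the only messages are synthetic batches outward and feedback signals inward, which is what lets the central generator learn the joint distribution without any site sharing images.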

A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

ArXiv, 2021

Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX® (secukinumab) Ankylosing Spondylitis clinical study. We apply an Auxiliary Classifier GAN to generate synthetic MRIs of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic, and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity, and dataset privacy.

Comparison of Privacy-Preserving Distributed Deep Learning Methods in Healthcare

Medical Image Understanding and Analysis

In this paper, we compare three privacy-preserving distributed learning techniques: federated learning, split learning, and SplitFed. We use these techniques to develop binary classification models for detecting tuberculosis from chest X-rays and compare them in terms of classification performance, communication and computational costs, and training time. We propose a novel distributed learning architecture called SplitFedv3, which performs better than split learning and SplitFedv2 in our experiments. We also propose alternate mini-batch training, a new training technique for split learning, that performs better than alternate client training, where clients take turns to train a model.

MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019

A recent technical breakthrough in the domain of machine learning is the advent and the many applications of Generative Adversarial Networks (GANs). These generative models are computationally demanding, as a GAN is composed of two deep neural networks and trains on large datasets. A GAN is generally trained on a single server. In this paper, we address the problem of distributing GANs so that they are able to train over datasets that are spread across multiple workers. We propose MD-GAN as the first solution to this problem: a novel learning procedure for GANs so that they fit this distributed setup. We then compare the performance of MD-GAN to an adaptation of federated learning to GANs, using the MNIST and CIFAR10 datasets. MD-GAN exhibits a reduction by a factor of two in the learning complexity on each worker node, while providing better performance than federated learning on both datasets. We finally discuss the practical implications of distributing GANs.

A Privacy Preserved Image-to-Image Translation Model in MRI: Distributed Learning of WGANs

2019

In this project, we introduce a distributed training approach for Generative Adversarial Networks (GANs) on Magnetic Resonance Imaging (MRI) tasks. In our distributed framework, we have n discriminators and a single generator. We first generate fake images via the generator, which are fed to the discriminators. In addition to the fake images, we uniformly distribute the real images across the n discriminators. Each discriminator first computes a gradient using the local data revealed to itself. Then, we update a global parameter via the average of the computed gradients. With this approach, since we distribute both the data and the processing across several discriminators, we reduce the computational complexity and storage demands of each discriminator. Moreover, we preserve privacy thanks to our novel training approach that only needs local data.
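The gradient-averaging step described above can be sketched in a few lines. The function names are ours, and the per-discriminator gradients are placeholder numbers, not outputs of a real WGAN:

```python
# Sketch of the update rule: each of n discriminators computes a gradient
# on its local data shard, and the shared discriminator parameters are
# updated with the element-wise average of those gradients.

def average_gradients(local_grads):
    """Element-wise mean of per-discriminator gradient vectors."""
    n = len(local_grads)
    dim = len(local_grads[0])
    return [sum(g[i] for g in local_grads) / n for i in range(dim)]

def sgd_step(params, grad, lr=0.1):
    """One gradient-descent step on the shared global parameters."""
    return [p - lr * gi for p, gi in zip(params, grad)]

grads = [[0.2, -0.4], [0.6, 0.0], [0.1, 0.4]]   # from 3 discriminators
params = [1.0, 1.0]
avg = average_gradients(grads)                   # ~[0.3, 0.0]
params = sgd_step(params, avg)                   # ~[0.97, 1.0]
```

Because only gradients (not image shards) cross node boundaries, the averaging step is also where the privacy claim of the abstract rests.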

A New Distributed Method for Training Generative Adversarial Networks

ArXiv, 2021

Generative adversarial networks (GANs) are emerging machine learning models for generating synthesized data similar to real data by jointly training a generator and a discriminator. In many applications, data and computational resources are distributed over many devices, so centralized computation with all data in one location is infeasible due to privacy and/or communication constraints. This paper proposes a new framework for training GANs in a distributed fashion: Each device computes a local discriminator using local data; a single server aggregates their results and computes a global GAN. Specifically, in each iteration, the server sends the global GAN to the devices, which then update their local discriminators; the devices send their results to the server, which then computes their average as the global discriminator and updates the global generator accordingly. Two different update schedules are designed with different levels of parallelism between the devices and the server...
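The per-iteration schedule in the abstract (server broadcasts the global model, devices refine local discriminators, server averages the results) can be sketched as below. Function names and the toy local update are ours, not the paper's:

```python
# Hedged sketch of one synchronization round: broadcast, local update,
# then server-side averaging of the local discriminator parameters.

def average_params(param_sets):
    """Element-wise mean of the local discriminator parameter vectors."""
    n = len(param_sets)
    return [sum(ps[i] for ps in param_sets) / n
            for i in range(len(param_sets[0]))]

def sync_round(global_disc, device_data, local_update):
    # 1) server broadcasts the current global discriminator to every device
    # 2) each device refines its own copy on local data only
    locals_ = [local_update(list(global_disc), data) for data in device_data]
    # 3) server averages the local results into the new global discriminator
    return average_params(locals_)

def toy_update(params, data):
    """Placeholder local step: nudge each parameter toward the data mean."""
    m = sum(data) / len(data)
    return [p + 0.1 * (m - p) for p in params]

device_data = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]  # private per-device data
disc = [0.0, 0.0]
disc = sync_round(disc, device_data, toy_update)
# each parameter moves toward the mean of the local means (here 4.0)
```

The paper's two schedules differ in when the generator update overlaps with the device updates; the round above shows only the common aggregation skeleton.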

Federated Synthetic Learning from Multi-institutional and Heterogeneous Medical Data

2021

Statistically and information-wise adequate data plays a critical role in training a robust deep learning model. However, collecting sufficient medical data to train a centralized model is still challenging due to various constraints such as privacy regulations and security. In this work, we develop a novel privacy-preserving federated-discriminator GAN, named FedD-GAN, that can learn and synthesize high-quality medical images of various types from heterogeneous datasets residing in multiple data centers whose data cannot be transferred or shared. We trained and evaluated FedD-GAN on three essential classes of medical data, each involving a different type of medical image: cardiac CTA, brain MRI, and histopathology. We show that the images synthesized with our method have better quality than those from a standard federated learning method and are realistic and accurate enough to train segmentation models in downstream tasks. The segmentation model trained o...

Differentially private synthetic medical data generation using convolutional GANs

Information Sciences, 2022

Deep learning models have demonstrated superior performance in several real-world application problems such as image classification and speech processing. However, creating these models in sensitive domains like healthcare typically requires addressing certain privacy challenges that bring unique concerns. One effective way to handle such private data concerns is to generate realistic synthetic data that can provide practically acceptable data quality as well as be used to improve model performance. To tackle this challenge, we develop a differentially private framework for synthetic data generation using Rényi differential privacy. Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve critical characteristics of the generated synthetic data. In addition, our model can capture the temporal information and feature correlations present in the original data. We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget using several publicly available benchmark medical datasets in both supervised and unsupervised settings. The source code of this work is available at https://github.com/astorfi/differentially-private-cgan.
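The differential-privacy mechanism behind frameworks like this one typically rests on a per-step gradient sanitization: clip each gradient to a fixed L2 norm, then add Gaussian noise calibrated to that norm. The sketch below shows that generic recipe only; the paper's Rényi-DP accounting and convolutional architecture are not reproduced, and all names and constants are illustrative:

```python
import math
import random

def sanitize_gradient(grad, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip a gradient to L2 norm clip_norm, then add Gaussian noise with
    standard deviation noise_mult * clip_norm (the Gaussian mechanism)."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in grad]          # L2-norm clipping
    sigma = noise_mult * clip_norm               # noise scale
    return [g + rng.gauss(0.0, sigma) for g in clipped]

# Raw L2 norm of [3, 4] is 5, so it is clipped to norm 1 before noise.
safe = sanitize_gradient([3.0, 4.0])
```

Clipping bounds each sample's influence on the update, which is what lets the added noise translate into a formal privacy guarantee under an accountant such as Rényi DP.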

Peer-to-peer Approach for Distributed Privacy-preserving Deep Learning

International Journal of Computer (IJC), 2021

The revolutionary advances in machine learning and Artificial Intelligence have enabled people to rethink how we integrate information, analyze data, and use the resulting insights to improve decision making. Deep learning is an effective, time- and cost-efficient supervised machine learning approach that is becoming popular in building today's applications such as self-driving cars, medical diagnosis systems, automatic speech recognition, machine translation, text-to-speech conversion, and many others. On the other hand, the success of deep learning depends, among other factors, on the large volume of data available for training the model. Depending on the domain of application, the data needed for training the model may contain sensitive and private information whose privacy needs to be preserved. One of the challenges that need to be addressed in deep learning is how to ensure that the privacy of training data is preserved without sacrificing the accuracy of the model. In this work, we propose, design, and implement a decentralized deep learning system using a peer-to-peer architecture that enables multiple data owners to jointly train deep learning models without disclosing their training data to one another, while still benefiting from each other's datasets through exchanging model parameters during training. We implemented our approach using two popular deep learning frameworks, namely Keras and TensorFlow. We evaluated our approach on two datasets popular in the deep learning community, namely MNIST and Fashion-MNIST. Using our approach, we were able to train models whose accuracy is close to that of models trained in a privacy-violating setting, while preserving the privacy of the training data.
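The parameter exchange described above, with no central server, can be illustrated with a simple pairwise (gossip-style) averaging scheme. This is our own minimal stand-in for the paper's peer-to-peer protocol, with toy parameter vectors in place of trained networks:

```python
# Sketch of peer-to-peer parameter exchange: peers never share training
# data, only model parameters, which they repeatedly average pairwise.

def gossip_exchange(params_a, params_b):
    """Both peers move to the average of their parameter vectors."""
    avg = [(a + b) / 2 for a, b in zip(params_a, params_b)]
    return list(avg), list(avg)

# Three peers, each with a locally trained parameter vector.
peers = [[0.0, 3.0], [6.0, 3.0], [3.0, 0.0]]
for _ in range(25):                  # repeated rounds around the ring
    for i in range(len(peers)):
        j = (i + 1) % len(peers)
        peers[i], peers[j] = gossip_exchange(peers[i], peers[j])
# all peers converge toward the global average, [3.0, 2.0]
```

Pairwise averaging preserves the sum of all parameters, so on a connected topology every peer converges to the global mean, mirroring how each data owner benefits from the others' datasets without ever seeing them.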

Protecting GANs against privacy attacks by preventing overfitting

ArXiv, 2020

Generative Adversarial Networks (GANs) have made the release of synthetic images a viable approach to sharing data without releasing the original dataset. It has been shown that such synthetic data can be used for a variety of downstream tasks, such as training classifiers that would otherwise require the original dataset to be shared. However, recent work has shown that the GAN models and their synthetically generated data can be used to infer training-set membership by an adversary who has access to the entire dataset and some auxiliary information. Current approaches to mitigating this problem (such as DPGAN) lead to dramatically poorer generated sample quality than the original non-private GANs. Here we develop a new GAN architecture (privGAN), where the generator is trained not only to cheat the discriminator but also to defend against membership inference attacks. The new mechanism provides protection against this mode of attack while leading to negligible loss in downstream performances...