Erman Ayday - Academia.edu

Papers by Erman Ayday

Privacy-Related Consequences of Turkish Citizen Database Leak

Cornell University - arXiv, May 19, 2016

Personal data is collected and stored more than ever by governments and companies in the digital age. Even though data is released only after anonymization, deanonymization is possible by joining different datasets, which puts the privacy of individuals in jeopardy. Furthermore, data leaks can unveil personal identifiers of individuals when security is breached, and processing a leaked dataset can reveal even more information than what is visible to the naked eye. In this work, we report the results of our analyses of the recent "Turkish citizen database leak", which revealed the national identifier numbers of close to fifty million voters, along with personal information such as date of birth, birth place, and full address. We show that, with automated processing of the data, one can uniquely identify (i) the mother's maiden name and (ii) the landline number for a significant portion of these people. This is a serious privacy and security threat because (i) the risk of identity theft is now higher, and (ii) scammers can access more information about individuals. The sole goal of this work is to point out these security risks and to suggest stricter measures to the relevant companies and agencies so that the security and privacy of individuals are protected.

Privacy-Preserving Link Prediction

Cornell University - arXiv, Oct 3, 2022

Consider two data holders, ABC and XYZ, with graph data (e.g., social networks, e-commerce, telecommunication, or bio-informatics). ABC can see that node A is linked to node B, and XYZ can see that node B is linked to node C. Node B is a common neighbour of A and C, but neither network can discover this fact on its own. In this paper, we provide a two-party computation that ABC and XYZ can run to discover the common neighbours in the union of their graph data without either party revealing its plaintext graph to the other. Based on private set intersection, we implement our solution, provide measurements, and quantify partial leaks of privacy. We also propose a heavyweight solution, based on additively homomorphic encryption, that leaks zero information.
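The core primitive here, private set intersection over neighbor sets, can be sketched with a toy Diffie-Hellman-style commutative-masking protocol. This is a minimal sketch, not the protocol measured in the paper: real PSI deployments use elliptic-curve groups or OPRF-based constructions, and the neighbor sets below are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1  # prime modulus for a toy multiplicative group

def hash_to_group(item: str) -> int:
    """Hash an item into the group (avoid the zero element)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1

def mask(values, key):
    """Exponentiate each group element with a party's secret key."""
    return {pow(v, key, P) for v in values}

# Hypothetical inputs: nodes adjacent to A in ABC's graph and nodes
# adjacent to C in XYZ's graph; the PSI output is their overlap.
abc_neighbors = {"B", "D", "E"}
xyz_neighbors = {"B", "E", "F"}

key_abc = secrets.randbelow(P - 2) + 1  # ABC's secret exponent
key_xyz = secrets.randbelow(P - 2) + 1  # XYZ's secret exponent

# Round 1: each party hashes and masks its own set with its key.
abc_once = mask({hash_to_group(x) for x in abc_neighbors}, key_abc)
xyz_once = mask({hash_to_group(x) for x in xyz_neighbors}, key_xyz)

# Round 2: each party masks the other's already-masked set. Since
# pow(pow(h, a, P), b, P) == pow(pow(h, b, P), a, P), doubly masked
# values of equal items collide, revealing only the intersection.
abc_twice = mask(xyz_once, key_abc)
xyz_twice = mask(abc_once, key_xyz)

common = abc_twice & xyz_twice  # doubly masked images of "B" and "E"
```

Because neither party ever sees the other's singly-hashed items in the clear, only intersection membership (here, common neighbours "B" and "E") is learned.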

Collusion-Resilient Probabilistic Fingerprinting Scheme for Correlated Data

Cornell University - arXiv, Jan 26, 2020

In order to receive personalized services, individuals share their personal data (e.g., location patterns, healthcare data, or financial data) with a wide range of service providers, hoping that their data will remain confidential. Thus, in case of an unauthorized distribution of their personal data by these service providers (or in case of a data breach), data owners want to identify the source of the leakage. Digital fingerprinting schemes have been developed to embed a hidden and unique fingerprint into shared digital content, especially multimedia, to provide such liability guarantees. However, existing techniques rely on the high redundancy in the content, which is typically absent from personal data (such as location patterns or genomic data). In this work, we propose a probabilistic fingerprinting scheme that efficiently generates the fingerprint by considering a fingerprinting probability (to keep the data utility high) and publicly known inherent correlations between data points. To improve the robustness of the proposed scheme against colluding malicious service providers, we also utilize Boneh-Shaw fingerprinting codes as part of the scheme. Furthermore, observing similarities between privacy-preserving data sharing techniques (which add controlled noise to the shared data) and the proposed fingerprinting scheme, we make a first attempt to develop a data sharing scheme that provides both privacy and fingerprint robustness at the same time. We experimentally show that fingerprint robustness and privacy are conflicting objectives, and we propose a hybrid approach that controls this trade-off with a design parameter. Using the proposed hybrid approach, we show that individuals can improve their level of privacy by slightly compromising fingerprint robustness. We implement and evaluate the proposed scheme on real genomic data, and our experimental results show its efficiency and robustness.
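The idea of probabilistic fingerprint embedding and leak attribution can be illustrated with a toy sketch. This is not the paper's scheme: it ignores correlations between data points and the Boneh-Shaw collusion coding, and the names (`prf_bit`, the secret, the SP identifiers) are hypothetical.

```python
import hashlib
import random

def prf_bit(secret: str, sp_id: str, index: int) -> int:
    """Pseudorandom bit tied to one service provider and data position."""
    digest = hashlib.sha256(f"{secret}|{sp_id}|{index}".encode()).digest()
    return digest[0] & 1

def fingerprint(data, secret, sp_id, p, rng):
    """Copy for sp_id: each binary point is a fingerprint candidate with
    probability p (low p keeps utility high) and is flipped where the
    SP-specific pseudorandom bit is 1."""
    out = list(data)
    for i in range(len(out)):
        if rng.random() < p and prf_bit(secret, sp_id, i):
            out[i] ^= 1
    return out

def accuse(original, leaked, secret, sp_ids):
    """Blame the SP whose pseudorandom pattern covers the most flips
    observed in the leaked copy."""
    def score(sp):
        return sum(prf_bit(secret, sp, i)
                   for i, (o, l) in enumerate(zip(original, leaked)) if o != l)
    return max(sp_ids, key=score)

rng = random.Random(7)
data = [rng.randint(0, 1) for _ in range(200)]
copies = {sp: fingerprint(data, "owner-secret", sp, 0.3, rng)
          for sp in ("sp1", "sp2", "sp3")}
accused = accuse(data, copies["sp2"], "owner-secret", ["sp1", "sp2", "sp3"])
```

Every flip in sp2's copy sits at a position where sp2's pseudorandom bit is 1, while any other SP's bits match only about half of those positions, so the scoring singles out the leaking SP with high probability.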

GenShare: Sharing Accurate Differentially-Private Statistics for Genomic Datasets with Dependent Tuples

Cornell University - arXiv, Dec 30, 2021

Motivation: Cutting the cost of DNA sequencing technology has led to a quantum leap in the availability of genomic data. While sharing genomic data across researchers is an essential driver of advances in health and biomedical research, sharing is often infeasible due to data privacy concerns. Differential privacy is one of the rigorous mechanisms used to facilitate the sharing of aggregate statistics from genomic datasets without disclosing any private individual-level data. However, differential privacy can still divulge sensitive information about dataset participants due to correlations between dataset tuples. Results: Here, we propose GenShare, a model built upon Laplace-perturbation-mechanism-based DP, which provides a privacy-preserving query-answering model for statistical genomic datasets whose tuples are dependent due to inherent correlations between the genomes of individuals (i.e., family ties). We demonstrate our privacy improvement over state-of-the-art approaches for a range of practical queries, including cohort discovery, minor allele frequency, and χ² association tests. With a fine-grained analysis of sensitivity in the Laplace perturbation mechanism and by considering joint distributions, GenShare nearly achieves the formal privacy guarantees permitted by the theory of differential privacy as if the queries were computed over independent tuples (within only 6%). GenShare ensures that query results are as accurate as theoretically guaranteed by differential privacy. By empowering advances in different scientific and medical research areas, GenShare presents a path toward an interactive genomic data sharing system for datasets that include participants with familial relationships.
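The Laplace-mechanism baseline that GenShare refines can be sketched for a minor allele frequency (MAF) query. This sketch assumes independent tuples, which is exactly the assumption the paper's fine-grained sensitivity analysis relaxes for correlated (familial) records; the function names and cohort are illustrative only.

```python
import math
import random

def laplace_sample(scale: float, rng: random.Random) -> float:
    """Inverse-CDF sampling from Laplace(0, scale)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_maf(genotypes, epsilon, rng):
    """epsilon-DP minor allele frequency, assuming independent tuples.

    genotypes holds each individual's minor-allele count (0, 1, or 2).
    Substituting one individual's record changes the MAF by at most
    2/(2n) = 1/n, so Laplace noise with scale (1/n)/epsilon suffices.
    """
    n = len(genotypes)
    true_maf = sum(genotypes) / (2 * n)
    return true_maf + laplace_sample((1 / n) / epsilon, rng)

rng = random.Random(0)
cohort = [rng.choice([0, 0, 0, 1, 1, 2]) for _ in range(1000)]
noisy = private_maf(cohort, epsilon=1.0, rng=rng)
```

With n = 1000 and epsilon = 1, the noise scale is 0.001, so the released MAF stays close to the true value; when tuples are correlated, the effective sensitivity is larger than 1/n, which is the gap GenShare's analysis addresses.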

Towards Robust Fingerprinting of Relational Databases by Mitigating Correlation Attacks

IEEE Transactions on Dependable and Secure Computing

Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies

Proceedings on Privacy Enhancing Technologies

Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. In this work, we propose a framework that verifies the correctness of the aggregate statistics obtained as a result of a genome-wide association study (GWAS) conducted by a researcher while protecting the privacy of individuals in the researcher's dataset. In GWAS, the goal of the researcher is to identify point mutations (variants) that are highly associated with a given phenotype. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and the results of another GWAS (conducted using publicly available datasets) to distinguish correct statistics from incorrect ones. For evaluation, we use real genomic ...

Method for privacy-preserving medical risk test

A privacy-preserving method for performing a disease susceptibility test on a patient, said method comprising: (I) performing homomorphic computations, (J) obtaining a test result which is partly decrypted with a first part (prk1, resp. prk2) of a private key, (L) decrypting said partly decrypted result with a second part (prk2, resp. prk1) of said private key, wherein said homomorphic computations are based on encrypted genomic markers of the patient, on encrypted clinical and/or environmental markers, and on encrypted ancestry markers of the patient. The invention also relates to a method for inferring ancestry in the encrypted domain.

A Privacy-Preserving Framework for Conducting Genome-Wide Association Studies Over Outsourced Patient Data

IEEE Transactions on Dependable and Secure Computing

Genomic Data Sharing under Dependent Local Differential Privacy

Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy

Privacy-preserving genomic data sharing is key to increasing the pace of genomic research, and hence to paving the way towards personalized genomic medicine. In this paper, we introduce a notion of dependent local differential privacy (LDP) for privacy-preserving sharing of correlated data and propose a genomic data sharing mechanism under this privacy definition. We first show that the original definition of LDP is not suitable for genomic data sharing, and we then propose a new mechanism to share genomic data. The proposed mechanism considers the correlations in the data during sharing, eliminates statistically unlikely data values beforehand, and adjusts the probability distribution for each shared data point accordingly. By doing so, we show that we can prevent an attacker from inferring the correct values of the shared data points by exploiting the correlations in the data. By adjusting the probability distributions of the shared states of each data point, we also improve the utility of the shared data for the data collector. Furthermore, we develop a greedy algorithm that strategically identifies the processing order of the shared data points with the aim of maximizing the utility of the shared data. Considering the interdependent privacy risks of sharing genomic data, we also analyze the information an attacker gains about the genomes of a donor's family members by observing the perturbed data of the genome donor, and we propose a mechanism to select the donor's privacy budget (i.e., the parameter of LDP) by also considering the privacy preferences of her family members. Our evaluation results on a real-life genomic dataset show the superiority of the proposed mechanism over the randomized response mechanism (a widely used technique to achieve LDP).
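The randomized response baseline that the paper compares against can be sketched as follows. This is only the standard k-ary LDP mechanism; the paper's contribution, pruning states ruled out by correlations and adjusting per-point distributions, is deliberately omitted here.

```python
import math
import random

def randomized_response(value: int, k: int, epsilon: float, rng) -> int:
    """k-ary randomized response, a standard epsilon-LDP baseline: report
    the true state with probability e^eps / (e^eps + k - 1), otherwise a
    uniformly random other state. For a SNP, the states are minor-allele
    counts 0, 1, 2 (so k = 3)."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    return rng.choice([s for s in range(k) if s != value])

rng = random.Random(1)
reports = [randomized_response(2, k=3, epsilon=5.0, rng=rng)
           for _ in range(10000)]
```

Because the mechanism treats all k states as equally plausible, it wastes probability mass on genotype values that correlations with neighboring SNPs already rule out; the paper's dependent-LDP mechanism redistributes that mass to improve utility at the same privacy level.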

ShareTrace: Contact Tracing with the Actor Model

Cornell University - arXiv, Mar 23, 2022

Proximity-based contact tracing relies on mobile-device interaction to estimate the spread of disease. ShareTrace is one such approach that improves the efficacy of tracking disease spread by considering both direct and indirect forms of contact. In this work, we utilize the actor model to provide an efficient and scalable formulation of ShareTrace with asynchronous, concurrent message passing on a temporal contact network. We also introduce message reachability, an extension of temporal reachability that accounts for network topology and message-passing semantics. Our evaluation on both synthetic and real-world contact networks indicates that correct parameter values optimize for algorithmic accuracy and efficiency. In addition, we demonstrate that message reachability can accurately estimate the risk a user poses to their contacts.
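The temporal-reachability base that message reachability extends can be sketched with a single time-ordered pass over the contact edges. This is a simplified sketch: it ignores the message-passing semantics (risk decay, thresholds) that message reachability adds, and the contact list is hypothetical.

```python
def temporal_reachable(contacts, source, t0):
    """Nodes reachable from `source` via contacts whose timestamps are
    non-decreasing along the path, starting at time t0.

    contacts: iterable of undirected timestamped edges (u, v, t).
    """
    arrival = {source: t0}  # earliest time a message can reach each node
    for u, v, t in sorted(contacts, key=lambda e: e[2]):
        for a, b in ((u, v), (v, u)):
            # Relax the edge if `a` was reached by time t and this
            # contact improves `b`'s earliest arrival.
            if a in arrival and arrival[a] <= t and t < arrival.get(b, float("inf")):
                arrival[b] = t
    return set(arrival)

contacts = [("A", "B", 1), ("B", "C", 2), ("C", "D", 0)]
reached = temporal_reachable(contacts, "A", 0)  # {"A", "B", "C"}
```

Note that D is unreachable even though C-D is an edge: that contact happened at time 0, before the message could have reached C at time 2, which is exactly the temporal constraint that distinguishes this from plain graph reachability.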

Key Protected Classification for GAN Attack Resilient Collaborative Learning

Genome Privacy

Collusion-Secure Watermarking For Sequential Data

In this work, we address the liability issues that may arise due to unauthorized sharing of personal data. We consider a scenario in which an individual shares his sequential data (such as genomic data or location patterns) with several service providers (SPs). In such a scenario, if his data is shared with other third parties without his consent, the individual wants to determine the service provider responsible for the unauthorized sharing. To provide this functionality, we propose a novel optimization-based watermarking scheme for sharing sequential data. Thus, in the case of an unauthorized sharing of sensitive data, the proposed scheme can find the source of the leakage by checking the watermark inside the leaked data. In particular, the proposed scheme guarantees with high probability that (i) a malicious SP that receives the data cannot identify the watermarked data points, (ii) when more than one malicious SP aggregates its data, they still cannot determ...

Real-time privacy risk quantification in online social networks

Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2021

Matching the anonymous profile of an individual in an online social network (OSN) to their real identity raises serious privacy concerns, as one can obtain sensitive information about that individual. Previous work has formulated the profile matching risk in several different ways and has shown that there exists a non-negligible risk of matching user profiles across OSNs. However, these formulations are not practical for conveying the risk to OSN users in real time. In this work, using the output of such a formulation, we model, via machine learning, the profile characteristics of users who are vulnerable to profile matching, and we make probabilistic inferences about how users' vulnerability changes as they share new content in OSNs (or as their graph connectivity changes). We evaluate the generated models on real data. Our results show that the generated models determine with high accuracy whether a user profile is vulnerable to profile matching by analyzing only the publicly available information in the anonymous OSN. In addition, we develop optimization-based countermeasures to preserve users' privacy as they share their OSN profiles with third parties. We believe this work will be crucial for OSN users to understand the privacy risks of their public sharing and to be more conscious about their online privacy.

Introduction and Background Proposed Solution Evaluation

In order to support large-scale genomic studies, an increasing number of medical units (pharmaceutical companies or hospitals) are willing to outsource

Tracking the Invisible: Privacy-Preserving Contact Tracing to Control the Spread of a Virus

Lecture Notes in Computer Science, 2020

Today, tracking and controlling the spread of a virus is a crucial need for almost all countries. Doing this early would save millions of lives and help countries keep a stable economy. The easiest way to control the spread of a virus is to immediately inform individuals who recently had close contact with diagnosed patients. However, to achieve this, a centralized authority (e.g., a health authority) needs detailed location information from both healthy individuals and diagnosed patients. Such an approach, although beneficial for controlling the spread of a virus, results in serious privacy concerns, and hence privacy-preserving solutions are required. Previous works on this topic either (i) compromise privacy (especially the privacy of diagnosed patients) for better efficiency or (ii) provide unscalable solutions. In this work, we propose a technique based on private set intersection between the physical contact histories of individuals (recorded using smartphones) and a centralized database (run by a health authority) that keeps the identities of patients diagnosed with the disease. The proposed solution protects the location privacy of both healthy individuals and diagnosed patients, and it guarantees that the identities of the diagnosed patients remain hidden from other individuals. Notably, the proposed scheme allows individuals to receive warning messages indicating previous contact with a diagnosed patient. Such warning messages help them realize the risk and isolate themselves from other people. We make sure that the warning messages are observed only by the corresponding individuals and not by the health authority. We also implement the proposed scheme and show its efficiency and scalability via simulations.

Differentially private binary- and matrix-valued data query

Proceedings of the VLDB Endowment, 2021

Differential privacy has been widely adopted to release continuous- and scalar-valued information on a database without compromising the privacy of the individual data records in it. The problem of querying binary- and matrix-valued information on a database in a differentially private manner has rarely been studied. However, binary- and matrix-valued data are ubiquitous in real-world applications, and their privacy concerns may arise under a variety of circumstances. In this paper, we devise an exclusive-or (XOR) mechanism that perturbs a binary- or matrix-valued query result by XORing it with calibrated noise drawn from a matrix-valued Bernoulli distribution. We first rigorously analyze the privacy and utility guarantees of the proposed XOR mechanism. Then, to generate the parameters of the matrix-valued Bernoulli distribution, we develop a heuristic approach that minimizes the expected squared query error rate under the ϵ-differential privacy constraint. ...
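A scalar special case of XOR perturbation can be sketched with independent bit flips. This is a simplification: the paper's matrix-valued Bernoulli noise can correlate flips across matrix entries, whereas independence is assumed here for brevity.

```python
import math
import random

def xor_perturb(bits, epsilon, rng):
    """Flip each bit independently with probability 1 / (1 + e^eps).

    The keep/flip probability ratio is exactly e^eps, so each released
    bit (equivalently, XOR with Bernoulli(1/(1+e^eps)) noise) satisfies
    eps-differential privacy for that bit.
    """
    p_flip = 1.0 / (1.0 + math.exp(epsilon))
    return [b ^ (rng.random() < p_flip) for b in bits]

rng = random.Random(3)
secret = [1, 0, 1, 1] * 2500           # a binary-valued query result
released = xor_perturb(secret, epsilon=1.0, rng=rng)
```

At epsilon = 1 the flip probability is about 0.269, so roughly a quarter of the bits are flipped; larger epsilon drives the flip probability toward zero, trading privacy for utility.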

Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation

Computer Security – ESORICS 2020, 2020

Many individuals share their opinions (e.g., on political issues) or sensitive information about themselves (e.g., health status) on the internet anonymously to protect their privacy. However, anonymous data sharing has become more challenging in today's interconnected digital world, especially for individuals who have both anonymous and identified online activities. The most prominent examples of such data sharing platforms today are online social networks (OSNs). Many individuals have multiple profiles in different OSNs, including anonymous and identified ones (depending on the nature of the OSN). Here, the privacy threat is profile matching: if an attacker links the anonymous profiles of individuals to their real identities, it can obtain privacy-sensitive information which may have serious consequences, such as discrimination or blackmailing. Therefore, it is very important to quantify this privacy risk and show its extent to OSN users. Existing attempts to model profile matching in OSNs are inadequate and computationally inefficient for real-time risk quantification. Thus, in this work, we develop algorithms to efficiently model and quantify profile matching attacks in OSNs as a step towards real-time privacy risk quantification. For this, we model the profile matching problem using a graph and develop a belief propagation (BP)-based algorithm to solve it significantly more efficiently and accurately than the state-of-the-art. We evaluate the proposed framework on three real-life datasets (including data from four different social networks) and show how users' profiles in different OSNs can be matched efficiently and with high probability. We show that the proposed model generation has linear complexity in the number of user pairs, which is significantly more efficient than the state-of-the-art (which has cubic complexity), while providing comparable accuracy, precision, and recall. Thanks to the algorithms developed in this work, individuals will be more conscious when sharing data on online platforms. We anticipate that this work will also drive the technology so that new privacy-centered products can be offered by OSNs.

Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons

Proceedings on Privacy Enhancing Technologies, 2021

Sharing genome data in a privacy-preserving way remains a major bottleneck for the scientific progress promised by the big-data era in genomics. A community-driven protocol named the genomic data-sharing beacon protocol has been widely adopted for sharing genomic data. The system aims to provide a secure, easy-to-implement, and standardized interface for data sharing by allowing only yes/no queries on the presence of specific alleles in the dataset. However, the beacon protocol was recently shown to be vulnerable to membership inference attacks. In this paper, we show that privacy threats against genomic data-sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of the genome of a victim when the attacker knows the victim has been added to the beacon in a recent update. In particular, we show how an...

Tracking and Controlling the Spread of a Virus in a Privacy-Preserving Way

ArXiv, 2020

Today, tracking and controlling the spread of a virus is a crucial need for almost all countries. Doing this early would save millions of lives and help countries keep a stable economy. The easiest way to control the spread of a virus is to immediately inform individuals who recently had close contact with diagnosed patients. However, to achieve this, a centralized authority (e.g., a health authority) needs detailed location information from both healthy individuals and diagnosed patients. Such an approach, although beneficial for controlling the spread of a virus, results in serious privacy concerns, and hence privacy-preserving solutions are required. Previous works on this topic either (i) compromise privacy (especially the privacy of diagnosed patients) for better efficiency or (ii) provide unscalable solutions. In this work, we propose a technique based on private set intersection between the physical contact histories of individuals (that are recorde...

Research paper thumbnail of Privacy-Related Consequences of Turkish Citizen Database Leak

Cornell University - arXiv, May 19, 2016

Personal data is collected and stored more than ever by the governments and companies in the digi... more Personal data is collected and stored more than ever by the governments and companies in the digital age. Even though the data is only released after anonymization, deanonymization is possible by joining different datasets. This puts the privacy of individuals in jeopardy. Furthermore, data leaks can unveil personal identifiers of individuals when security is breached. Processing the leaked dataset can provide even more information than what is visible to naked eye. In this work, we report the results of our analyses on the recent "Turkish citizen database leak", which revealed the national identifier numbers of close to fifty million voters, along with personal information such as date of birth, birth place, and full address. We show that with automated processing of the data, one can uniquely identify (i) mother's maiden name of individuals and (ii) landline numbers, for a significant portion of people. This is a serious privacy and security threat because (i) identity theft risk is now higher, and (ii) scammers are able to access more information about individuals. The only and utmost goal of this work is to point out to the security risks and suggest stricter measures to related companies and agencies to protect the security and privacy of individuals.

Research paper thumbnail of Privacy-Preserving Link Prediction

Cornell University - arXiv, Oct 3, 2022

Consider two data holders, ABC and XYZ, with graph data (e.g., social networks, e-commerce, telec... more Consider two data holders, ABC and XYZ, with graph data (e.g., social networks, e-commerce, telecommunication, and bio-informatics). ABC can see that node A is linked to node B, and XYZ can see node B is linked to node C. Node B is the common neighbour of A and C but neither network can discover this fact on their own. In this paper, we provide a two party computation that ABC and XYZ can run to discover the common neighbours in the union of their graph data, however neither party has to reveal their plaintext graph to the other. Based on private set intersection, we implement our solution, provide measurements, and quantify partial leaks of privacy. We also propose a heavyweight solution that leaks zero information based on additively homomorphic encryption.

Research paper thumbnail of Collusion-Resilient Probabilistic Fingerprinting Scheme for Correlated Data

Cornell University - arXiv, Jan 26, 2020

In order to receive personalized services, individuals share their personal data (e.g., location ... more In order to receive personalized services, individuals share their personal data (e.g., location patterns, healthcare data, or financial data) with a wide range of service providers, hoping that their data will remain confidential. Thus, in case of an unauthorized distribution of their personal data by these service providers (or in case of a data breach) data owners want to identify the source of such data leakage. Digital fingerprinting schemes have been developed to embed a hidden and unique fingerprint into shared digital content, especially multimedia, to provide such liability guarantees. However, existing techniques utilize the high redundancy in the content, which is typically not included in personal data (such as location patters or genomic data). In this work, we propose a probabilistic fingerprinting scheme that efficiently generates the fingerprint by considering a fingerprinting probability (to keep the data utility high) and publicly known inherent correlations between data points. To improve the robustness of the proposed scheme against colluding malicious service providers, we also utilize the Boneh-Shaw fingerprinting codes as a part of the proposed scheme. Furthermore, observing similarities between privacy-preserving data sharing techniques (that add controlled noise to the shared data) and the proposed fingerprinting scheme, we make a first attempt to develop a data sharing scheme that provides both privacy and fingerprint robustness at the same time. We experimentally show that fingerprint robustness and privacy have conflicting objectives and we propose a hybrid approach to control such a trade-off with a design parameter. Using the proposed hybrid approach, we show that individuals can improve their level of privacy by slightly compromising from the fingerprint robustness. 
We implement and evaluate the performance of the proposed scheme on real genomic data. Our experimental results show the efficiency and robustness of the proposed scheme.

Research paper thumbnail of GenShare: Sharing Accurate Differentially-Private Statistics for Genomic Datasets with Dependent Tuples

Cornell University - arXiv, Dec 30, 2021

Motivation: Cutting the cost of DNA sequencing technology led to a quantum leap in the availabili... more Motivation: Cutting the cost of DNA sequencing technology led to a quantum leap in the availability of genomic data. While sharing genomic data across researchers is an essential driver of advances in health and biomedical research, the sharing process is often infeasible due to data privacy concerns. Differential privacy is one of the rigorous mechanisms utilized to facilitate the sharing of aggregate statistics from genomic datasets without disclosing any private individual-level data. However, differential privacy can still divulge sensitive information about the dataset participants due to the correlation between dataset tuples. Results: Here, we propose GenShare model built upon Laplace-perturbation-mechanism-based DP to introduce a privacy-preserving query-answering sharing model for statistical genomic datasets that include dependency due to the inherent correlations between genomes of individuals (i.e., family ties). We demonstrate our privacy improvement over the state-of-the-art approaches for a range of practical queries including cohort discovery, minor allele frequency, and χ 2 association tests. With a fine-grained analysis of sensitivity in the Laplace perturbation mechanism and considering joint distributions, GenShare results near-achieve the formal privacy guarantees permitted by the theory of differential privacy as the queries that computed over independent tuples (only up to 6% differences). GenShare ensures that query results are as accurate as theoretically guaranteed by differential privacy. For empowering the advances in different scientific and medical research areas, GenShare presents a path toward an interactive genomic data sharing system when the datasets include participants with familial relationships.

Research paper thumbnail of Towards Robust Fingerprinting of Relational Databases by Mitigating Correlation Attacks

IEEE Transactions on Dependable and Secure Computing

Research paper thumbnail of Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies

Proceedings on Privacy Enhancing Technologies

Providing provenance in scientific workflows is essential for reproducibility and auditability pu... more Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. In this work, we propose a framework that verifies the correctness of the aggregate statistics obtained as a result of a genome-wide association study (GWAS) conducted by a researcher while protecting individuals’ privacy in the researcher’s dataset. In GWAS, the goal of the researcher is to identify highly associated point mutations (variants) with a given phenotype. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another GWAS (conducted using publicly available datasets) to distinguish between correct statistics and incorrect ones. For evaluation, we use real genomic ...

Research paper thumbnail of Method for privacy-preserving medical risk test

A privacy-preserving method for performing a disease susceptibility test on a patient, said method comprising: (I) performing homomorphic computations, (J) obtaining a test result which is partly decrypted with a first part (prk1 resp. prk2) of a private key, (L) decrypting said partly decrypted result with a second part (prk2 resp. prk1) of said private key, wherein said homomorphic computations are based on encrypted genomic markers of the patient, on encrypted clinical and/or environmental markers, and on encrypted ancestry markers of the patient. The invention is also related to a method for inferring ancestry in the encrypted domain.

Research paper thumbnail of A Privacy-Preserving Framework for Conducting Genome-Wide Association Studies Over Outsourced Patient Data

IEEE Transactions on Dependable and Secure Computing

Research paper thumbnail of Genomic Data Sharing under Dependent Local Differential Privacy

Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy

Privacy-preserving genomic data sharing is prominent to increase the pace of genomic research, and hence to pave the way towards personalized genomic medicine. In this paper, we introduce a dependent local differential privacy (LDP) notion for privacy-preserving sharing of correlated data and propose a genomic data sharing mechanism under this privacy definition. We first show that the original definition of LDP is not suitable for genomic data sharing, and then we propose a new mechanism to share genomic data. The proposed mechanism considers the correlations in data during data sharing, eliminates statistically unlikely data values beforehand, and adjusts the probability distributions for each shared data point accordingly. By doing so, we show that we can prevent an attacker from inferring the correct values of the shared data points by utilizing the correlations in the data. By adjusting the probability distributions of the shared states of each data point, we also improve the utility of the shared data for the data collector. Furthermore, we develop a greedy algorithm that strategically identifies the processing order of the shared data points with the aim of maximizing the utility of the shared data. Considering the interdependent privacy risks in sharing genomic data, we also analyze the information gain of an attacker about the genomes of a donor's family members from observing the perturbed data of the genome donor, and we propose a mechanism to select the privacy budget (i.e., the privacy parameter of LDP) of the donor by also considering the privacy preferences of her family members. Our evaluation results on a real-life genomic dataset show the superiority of the proposed mechanism compared to the randomized response mechanism (a widely used technique to achieve LDP).
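For context, the randomized response baseline that the proposed mechanism is compared against can be sketched as follows for a single data point with k possible states (e.g., minor allele counts 0, 1, 2). This is generic k-ary randomized response, not the paper's correlation-aware mechanism, and the function name is an illustrative assumption:

```python
import math
import random

def randomized_response(value, states, epsilon):
    """k-ary randomized response satisfying epsilon-LDP for one data point.

    Report the true state with probability e^eps / (e^eps + k - 1);
    otherwise report one of the k - 1 other states uniformly at random.
    The keep/report-other probability ratio is exactly e^eps, which is
    what local differential privacy requires.
    """
    k = len(states)
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return value
    others = [s for s in states if s != value]
    return random.choice(others)
```

Because each data point is perturbed independently of its neighbors, this baseline ignores exactly the genomic correlations (e.g., family ties) that the paper's mechanism exploits.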

Research paper thumbnail of ShareTrace: Contact Tracing with the Actor Model

Cornell University - arXiv, Mar 23, 2022

Proximity-based contact tracing relies on mobile-device interaction to estimate the spread of disease. ShareTrace is one such approach that improves the efficacy of tracking disease spread by considering direct and indirect forms of contact. In this work, we utilize the actor model to provide an efficient and scalable formulation of ShareTrace with asynchronous, concurrent message passing on a temporal contact network. We also introduce message reachability, an extension of temporal reachability that accounts for network topology and message-passing semantics. Our evaluation on both synthetic and real-world contact networks indicates that correct parameter values optimize for algorithmic accuracy and efficiency. In addition, we demonstrate that message reachability can accurately estimate the risk a user poses to their contacts.

Research paper thumbnail of Key Protected Classification for GAN Attack Resilient Collaborative Learning

Research paper thumbnail of Genome Privacy

Research paper thumbnail of Collusion-Secure Watermarking For Sequential Data

In this work, we address the liability issues that may arise due to unauthorized sharing of personal data. We consider a scenario in which an individual shares his sequential data (such as genomic data or location patterns) with several service providers (SPs). In such a scenario, if his data is shared with other third parties without his consent, the individual wants to determine the service provider that is responsible for this unauthorized sharing. To provide this functionality, we propose a novel optimization-based watermarking scheme for the sharing of sequential data. Thus, in the case of an unauthorized sharing of sensitive data, the proposed scheme can find the source of the leakage by checking the watermark inside the leaked data. In particular, the proposed scheme guarantees with a high probability that (i) the malicious SP that receives the data cannot understand the watermarked data points, and (ii) when more than one malicious SPs aggregate their data, they still cannot determ...
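To illustrate the general idea of tracing a leak back to a specific SP via embedded marks, here is a toy keyed bit-flipping sketch. It is not the paper's optimization-based scheme (in particular, it does not address collusion), and all names and parameters in it are hypothetical:

```python
import hashlib
import random

def embed_watermark(sequence, sp_id, secret_key, flip_rate=0.05):
    """Embed an SP-specific watermark into a binary sequence.

    The flipped positions are derived deterministically from the owner's
    secret key and the SP's identity, so a leaked copy can later be traced
    by re-deriving each SP's pattern and scoring the match.
    """
    seed = hashlib.sha256(f"{secret_key}:{sp_id}".encode()).hexdigest()
    rng = random.Random(seed)
    marked = list(sequence)
    for i in range(len(marked)):
        if rng.random() < flip_rate:
            marked[i] ^= 1  # flip this binary data point
    return marked

def trace_leak(leaked, original, sp_ids, secret_key, flip_rate=0.05):
    """Return the SP whose keyed flip pattern best explains the leak."""
    def score(sp_id):
        marked = embed_watermark(original, sp_id, secret_key, flip_rate)
        return sum(1 for a, b in zip(marked, leaked) if a == b)
    return max(sp_ids, key=score)
```

Because each SP receives a copy with a different keyed pattern, an exact leak of one copy agrees almost everywhere with that SP's pattern and disagrees at roughly 2 × flip_rate of positions with every other SP's pattern.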

Research paper thumbnail of Real-time privacy risk quantification in online social networks

Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2021

Matching the anonymous profile of an individual in an online social network (OSN) to their real identity raises serious privacy concerns, as one can obtain sensitive information about that individual. Previous work has formulated the profile matching risk in several different ways and has shown that there exists a non-negligible risk of matching user profiles across OSNs. However, these formulations are not practical for conveying the risk to OSN users in real-time. In this work, using the output of such a formulation, we model the profile characteristics of users that are vulnerable to profile matching via machine learning and make probabilistic inferences about how the vulnerabilities of users change as they share new content in OSNs (or as their graph connectivity changes). We evaluate the generated models on real data. Our results show that the generated models determine with high accuracy whether a user profile is vulnerable to the profile matching risk by only analyzing their publicly available information in the anonymous OSN. In addition, we develop optimization-based countermeasures to preserve the user's privacy as they share their OSN profile with third parties. We believe that this work will be crucial for OSN users to understand their privacy risks due to their public sharing and to be more conscious about their online privacy.

Research paper thumbnail of Introduction and Background Proposed Solution Evaluation

In order to support large-scale genomic studies, an increasing number of medical units (pharmaceutical companies or hospitals) are willing to outsource

Research paper thumbnail of Tracking the Invisible: Privacy-Preserving Contact Tracing to Control the Spread of a Virus

Lecture Notes in Computer Science, 2020

Today, tracking and controlling the spread of a virus is a crucial need for almost all countries. Doing this early would save millions of lives and help countries keep a stable economy. The easiest way to control the spread of a virus is to immediately inform the individuals who recently had close contact with diagnosed patients. However, to achieve this, a centralized authority (e.g., a health authority) needs detailed location information from both healthy individuals and diagnosed patients. Thus, such an approach, although beneficial to control the spread of a virus, results in serious privacy concerns, and hence privacy-preserving solutions are required to solve this problem. Previous works on this topic either (i) compromise privacy (especially the privacy of diagnosed patients) to achieve better efficiency or (ii) provide unscalable solutions. In this work, we propose a technique based on private set intersection between the physical contact histories of individuals (that are recorded using smartphones) and a centralized database (run by a health authority) that keeps the identities of the patients who have tested positive for the disease. The proposed solution protects the location privacy of both healthy individuals and diagnosed patients, and it guarantees that the identities of the diagnosed patients remain hidden from other individuals. Notably, the proposed scheme allows individuals to receive warning messages indicating their previous contacts with a positively diagnosed patient. Such warning messages will help them realize the risk and isolate themselves from other people. We make sure that the warning messages are only observed by the corresponding individuals and not by the health authority. We also implement the proposed scheme and show its efficiency and scalability via simulations.
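A toy Diffie-Hellman-style private set intersection over hashed contact tokens illustrates the building block: neither side sees the other's plaintext set, yet they learn which (or here, how many) tokens they share. The modulus, token format, and two-round flow are simplifying assumptions for exposition, not the paper's exact protocol:

```python
import hashlib
import random

# Toy commutative-blinding PSI over a prime-order group.
P = 2**127 - 1  # Mersenne prime used as a toy modulus (not production-grade)

def h(item):
    # Hash a contact token into the group.
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, key):
    return {pow(h(x), key, P) for x in items}

def reblind(blinded, key):
    return {pow(v, key, P) for v in blinded}

def psi(client_items, server_items):
    """Return the size of the intersection without revealing either set."""
    ck = random.randrange(2, P - 1)  # client's secret exponent
    sk = random.randrange(2, P - 1)  # server's secret exponent
    c_blind = blind(client_items, ck)    # client -> server
    c_double = reblind(c_blind, sk)      # server re-blinds client tokens
    s_blind = blind(server_items, sk)    # server -> client
    s_double = reblind(s_blind, ck)      # client re-blinds server tokens
    # Because exponentiation commutes, h(x)^(ck*sk) matches iff x is shared.
    return len(c_double & s_double)
```

In the contact-tracing setting, the client's set would be the hashed contact history and the server's set the health authority's diagnosed-patient tokens; only intersection membership, not either full set, is revealed.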

Research paper thumbnail of Differentially private binary- and matrix-valued data query

Proceedings of the VLDB Endowment, 2021

Differential privacy has been widely adopted to release continuous- and scalar-valued information on a database without compromising the privacy of the individual data records in it. The problem of querying binary- and matrix-valued information on a database in a differentially private manner has rarely been studied. However, binary- and matrix-valued data are ubiquitous in real-world applications, and their privacy concerns may arise under a variety of circumstances. In this paper, we devise an exclusive-or (XOR) mechanism that perturbs a binary- or matrix-valued query result by conducting an XOR operation on the query result with calibrated noise drawn from a matrix-valued Bernoulli distribution. We first rigorously analyze the privacy and utility guarantees of the proposed XOR mechanism. Then, to generate the parameters of the matrix-valued Bernoulli distribution, we develop a heuristic approach to minimize the expected square query error rate under the ε-differential privacy constraint. ...
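The scalar special case of such a mechanism, with independent Bernoulli noise per bit, can be sketched as follows. The paper's contribution lies in calibrating a matrix-valued Bernoulli distribution that correlates the noise across entries, which this i.i.d. sketch omits:

```python
import math
import random

def xor_mechanism(bits, epsilon):
    """Perturb a binary vector by XOR-ing each bit with Bernoulli noise.

    Flipping each bit independently with probability
    p = 1 / (1 + e^epsilon) satisfies epsilon-DP per bit: the ratio of
    the keep probability to the flip probability is exactly e^epsilon.
    """
    p_flip = 1.0 / (1.0 + math.exp(epsilon))
    return [b ^ (1 if random.random() < p_flip else 0) for b in bits]
```

At large ε the flip probability vanishes and the query result passes through unchanged; at ε = 0 each bit is flipped with probability 1/2 and the output carries no information.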

Research paper thumbnail of Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation

Computer Security – ESORICS 2020, 2020

Many individuals share their opinions (e.g., on political issues) or sensitive information about themselves (e.g., health status) on the internet in an anonymous way to protect their privacy. However, anonymous data sharing has been becoming more challenging in today's interconnected digital world, especially for individuals that have both anonymous and identified online activities. The most prominent examples of such data sharing platforms today are online social networks (OSNs). Many individuals have multiple profiles in different OSNs, including anonymous and identified ones (depending on the nature of the OSN). Here, the privacy threat is profile matching: if an attacker links the anonymous profiles of individuals to their real identities, it can obtain privacy-sensitive information, which may have serious consequences such as discrimination or blackmailing. Therefore, it is very important to quantify this privacy risk and show its extent to OSN users. Existing attempts to model profile matching in OSNs are inadequate and computationally inefficient for real-time risk quantification. Thus, in this work, we develop algorithms to efficiently model and quantify profile matching attacks in OSNs as a step towards real-time privacy risk quantification. For this, we model the profile matching problem using a graph and develop a belief propagation (BP)-based algorithm to solve this problem in a significantly more efficient and accurate way compared to the state-of-the-art. We evaluate the proposed framework on three real-life datasets (including data from four different social networks) and show how users' profiles in different OSNs can be matched efficiently and with high probability. We show that the proposed model generation has linear complexity in terms of the number of user pairs, which is significantly more efficient than the state-of-the-art (which has cubic complexity). Furthermore, it provides comparable accuracy, precision, and recall to the state-of-the-art. Thanks to the algorithms developed in this work, individuals will be more conscious when sharing data on online platforms. We anticipate that this work will also drive the technology so that new privacy-centered products can be offered by the OSNs.

Research paper thumbnail of Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons

Proceedings on Privacy Enhancing Technologies, 2021

Sharing genome data in a privacy-preserving way stands as a major bottleneck in front of the scientific progress promised by the big data era in genomics. A community-driven protocol named the genomic data-sharing beacon protocol has been widely adopted for sharing genomic data. The system aims to provide a secure, easy-to-implement, and standardized interface for data sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. However, the beacon protocol was recently shown to be vulnerable to membership inference attacks. In this paper, we show that privacy threats against genomic data-sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of the genome of a victim when the attacker knows the victim has been added to the beacon in a recent update. In particular, we show how an...

Research paper thumbnail of Tracking and Controlling the Spread of a Virus in a Privacy-Preserving Way

ArXiv, 2020

Today, tracking and controlling the spread of a virus is a crucial need for almost all countries. Doing this early would save millions of lives and help countries keep a stable economy. The easiest way to control the spread of a virus is to immediately inform the individuals who recently had close contact with diagnosed patients. However, to achieve this, a centralized authority (e.g., a health authority) needs detailed location information from both healthy individuals and diagnosed patients. Thus, such an approach, although beneficial to control the spread of a virus, results in serious privacy concerns, and hence privacy-preserving solutions are required to solve this problem. Previous works on this topic either (i) compromise privacy (especially the privacy of diagnosed patients) to achieve better efficiency or (ii) provide unscalable solutions. In this work, we propose a technique based on private set intersection between the physical contact histories of individuals (that are recorde...