Diversity, Topology, and the Risk of Node Re-identification in Labeled Social Graphs

2018

Real network datasets provide significant benefits for understanding phenomena such as information diffusion or network evolution. Yet the privacy risks raised by sharing real graph datasets, even when stripped of user identity information, are significant. When nodes have associated attributes, the privacy risks increase. In this paper we quantitatively study the impact of binary node attributes on node privacy by employing machine-learning-based re-identification attacks and exploring the interplay between graph topology and attribute placement. Our experiments show that the population's diversity on the binary attribute consistently degrades anonymity.

Diversity, Homophily and the Risk of Node Re-identification in Labeled Social Graphs

Studies in computational intelligence, 2018

Real network datasets provide significant benefits for understanding phenomena such as information diffusion or network evolution. Yet the privacy risks raised by sharing real graph datasets, even when stripped of user identity information, are significant. When nodes have associated attributes, the privacy risks increase. In this paper we quantitatively study the impact of binary node attributes on node privacy by employing machine-learning-based re-identification attacks and exploring the interplay between graph topology and attribute placement. Our experiments show that the population's diversity on the binary attribute consistently degrades anonymity.

Behind the Mask: Understanding the Structural Forces That Make Social Graphs Vulnerable to Deanonymization

IEEE Transactions on Computational Social Systems, 2019

The tradeoff between anonymity and utility in the context of the anonymization of graph data sets is well acknowledged; for better privacy, some of the graph structural properties must be lost. What is not well understood, however, is what forces shape this tradeoff. Specifically, for the data practitioner who wants to publish an anonymized graph data set, it is unclear what graph structural properties can be preserved and what are the anonymity costs associated with preserving them. This article proposes a framework that examines the interplay between graph properties and the vulnerability to deanonymization attacks. We demonstrate its applicability via extensive experiments on thousands of graphs with controlled properties generated from real data sets. In addition, we show empirically that there are structural properties that affect graph vulnerability to reidentification attacks independent of degree distribution.

Delineating social network data anonymization via random edge perturbation

2012

Social network data analysis raises concerns about the privacy of related entities or individuals. To address this issue, organizations can publish data after simply replacing the identities of individuals with pseudonyms, leaving the overall structure of the social network unchanged. However, it has been shown that attacks based on structural identification (e.g., a walk-based attack) enable an adversary to re-identify selected individuals in an anonymized network. In this paper we explore the capacity of techniques based on random edge perturbation to thwart such attacks. We theoretically establish that any kind of structural identification attack can effectively be prevented using random edge perturbation and show that, surprisingly, important properties of the whole network, as well as of subgraphs thereof, can be accurately calculated and hence data analysis tasks performed on the perturbed data, given that the legitimate data recipient knows the perturbation probability as we...
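
The core mechanism can be sketched in a few lines. The following is an illustrative Python sketch, not the paper's algorithm or notation: each potential edge is flipped independently with probability p, and a recipient who knows p can still recover an aggregate statistic such as the true edge count from the expectation E[m'] = m(1 - p) + (N - m)p, where N = n(n - 1)/2 is the number of potential edges.

```python
import random

def perturb_edges(n, edges, p, rng):
    # Flip each potential edge independently with probability p:
    # present edges may be removed and absent edges may be added.
    perturbed = set()
    for i in range(n):
        for j in range(i + 1, n):
            present = (i, j) in edges
            if rng.random() < p:
                present = not present
            if present:
                perturbed.add((i, j))
    return perturbed

def estimate_edge_count(n, m_perturbed, p):
    # Since E[m'] = m*(1 - p) + (N - m)*p with N = n*(n-1)/2,
    # solving for m gives an unbiased estimate of the original count.
    N = n * (n - 1) // 2
    return (m_perturbed - N * p) / (1 - 2 * p)
```

Individual edges in the perturbed graph are uncertain (which is what thwarts structural attacks), yet the estimator recovers the original edge count in expectation.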

Protecting Sensitive Labels in Social Network Data

International journal of computer applications, 2014

Privacy is one of the major concerns when publishing or sharing social network data for social science research and business analysis. Recently, researchers have developed privacy models similar to k-anonymity to prevent node re-identification through structure information. However, even when these privacy models are enforced, an attacker may still be able to infer one's private information if a group of nodes largely share the same sensitive labels (i.e., attributes). In other words, the label-node relationship is not well protected by pure structure anonymization methods. Furthermore, existing approaches, which rely on edge editing or node clustering, may significantly alter key graph properties. In this paper, we define a k-degree-l-diversity anonymity model that protects structural information as well as the sensitive labels of individuals. We propose a novel anonymization methodology based on adding noise nodes, and implement an algorithm that adds noise nodes to the original graph while introducing the least distortion to graph properties. We further propose a novel approach that reduces the number of noise nodes needed, decreasing the complexity of the published network. We also implement this protection model in a distributed environment, where different publishers publish their data independently. Most importantly, we provide a rigorous analysis of the theoretical bounds on the number of noise nodes added and their impacts on an important graph property. We conduct extensive experiments to evaluate the effectiveness of the proposed technique.
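
The property being enforced is easy to state concretely. Below is a minimal, illustrative Python check (the function name and input shapes are my own, not the paper's): k-degree anonymity requires every degree value to be shared by at least k nodes, and l-diversity additionally requires each such degree group to carry at least l distinct sensitive labels.

```python
from collections import defaultdict

def satisfies_k_degree_l_diversity(degrees, labels, k, l):
    # degrees[v] -> degree of node v, labels[v] -> sensitive label of v.
    # Group nodes by degree, then check both conditions per group.
    groups = defaultdict(list)
    for v, d in degrees.items():
        groups[d].append(v)
    for members in groups.values():
        if len(members) < k:
            return False  # degree value identifies fewer than k candidates
        if len({labels[v] for v in members}) < l:
            return False  # group leaks the label despite k-anonymity
    return True
```

The second check is what distinguishes this model from plain k-degree anonymity: an attacker who narrows a target down to a degree group still cannot infer the label when the group contains at least l distinct label values.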

Social Network De-anonymization: More Adversarial Knowledge, More Users Re-Identified

Following the trend of data trading and data publishing, many online social networks have enabled potentially sensitive data to be exchanged or shared on the web. As a result, users' privacy could be exposed to malicious third parties since they are extremely vulnerable to de-anonymization attacks, i.e., the attacker links the anonymous nodes in the social network to their real identities with the help of background knowledge. Previous work in social network de-anonymization mostly focuses on designing accurate and efficient de-anonymization methods. We study this topic from a different perspective and attempt to investigate the intrinsic relation between the attacker's knowledge and the expected de-anonymization gain. One common intuition is that the more auxiliary information the attacker has, the more accurate de-anonymization becomes. However, their relation is much more sophisticated than that. To simplify the problem, we attempt to quantify background knowledge and de-anonymization gain under several assumptions. Our theoretical analysis and simulations on synthetic and real network data show that more background knowledge may not necessarily lead to more de-anonymization gain in certain cases. Though our analysis is based on a few assumptions, the findings still leave intriguing implications for the attacker to make better use of the background knowledge when performing de-anonymization, and for the data owners to better measure the privacy risk when releasing their data to third parties.

Protecting Sensitive Labels in Social Network Data Anonymization

IEEE Transactions on Knowledge and Data Engineering, 2013

Privacy preservation is a major problem when sharing data in social networks. Various privacy models have been developed to avoid node re-identification. Yet an attacker may still infer users' private information if nodes largely share the same type of sensitive labels. In this paper, we propose a K-degree-L-diversity model for preserving privacy in published social network data. Our anonymization methodology is based on adding noise nodes to the original social graph, which reduces the error rate and yields better results.

Resisting structural re-identification in anonymized social networks

2008

We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual's network context can be used to identify them even if other identifying information is removed.

Neighbourhood-Pair Attack in Social Network Data Publishing

Vertex re-identification is one of the significant and challenging problems in social networks. In this paper, we present a new type of vertex re-identification attack called the neighbourhood-pair attack. This attack utilizes the neighbourhood topologies of two connected vertices. We show both theoretically and empirically that this attack is possible on anonymized social networks and has a higher re-identification rate than existing structural attacks.
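
To give a feel for why pairing two neighbourhoods is more discriminating than a single-vertex signature, here is a simplified Python sketch (my own simplification, not the paper's attack): each edge gets a signature built from its endpoints' degrees and their common-neighbour count, and edges whose signature is unique in the graph pinpoint their endpoints.

```python
from collections import Counter

def unique_edge_signatures(adj):
    # adj: node -> set of neighbours (undirected graph).
    # Edge signature: sorted endpoint degrees plus the number of common
    # neighbours -- a coarse stand-in for the pair of neighbourhood
    # topologies that the attack exploits.
    sigs = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                deg_pair = tuple(sorted((len(adj[u]), len(adj[v]))))
                sigs[(u, v)] = (deg_pair, len(adj[u] & adj[v]))
    counts = Counter(sigs.values())
    # Edges whose signature occurs exactly once are re-identifiable
    # under this simplified model.
    return {e for e, s in sigs.items() if counts[s] == 1}
```

Even in graphs where no single degree is unique, a pair of adjacent neighbourhoods often is, which is the intuition behind the higher re-identification rate.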

Social Network Privacy for Attribute Disclosure Attacks

Increasing research on social networks stresses the urgency of producing effective means of ensuring user privacy. Although social networks, ubiquitously represented as graphs, have a myriad of recently developed techniques to prevent identity disclosure, the equally important attribute disclosure attacks have been neglected. To address this gap, we introduce an approach to anonymize social networks that have labeled nodes, alpha-nearness, which requires that the label distribution in every neighbourhood of the graph be close to that throughout the entire network. We present an effective greedy algorithm to achieve alpha-nearness and experimentally validate the quality of the solutions it derives.
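
One plausible way to make "close to" concrete is total-variation distance; the sketch below is my own reading, and the paper's exact metric may differ. It checks whether every node's neighbourhood label distribution lies within distance alpha of the global label distribution.

```python
from collections import Counter

def is_alpha_near(adj, labels, alpha):
    # adj: node -> set of neighbours; labels: node -> label.
    # Verify that each neighbourhood's label distribution is within
    # total-variation distance alpha of the global distribution.
    n = len(labels)
    global_freq = Counter(labels.values())
    label_set = set(labels.values())
    for v, nbrs in adj.items():
        if not nbrs:
            continue
        local = Counter(labels[u] for u in nbrs)
        tv = 0.5 * sum(abs(local[c] / len(nbrs) - global_freq[c] / n)
                       for c in label_set)
        if tv > alpha:
            return False
    return True
```

When this holds, observing a target's neighbourhood labels tells an attacker little more than the public global distribution already does, which is the defence against attribute disclosure.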