Erez Shmueli | Tel Aviv University

Journal Papers by Erez Shmueli

Are You Your Friends' Friend? Poor Perception of Friendship Ties Limits The Ability to Promote Behavioral Change

Persuasion is at the core of norm creation, emergence of collective action, and solutions to 'tragedy of the commons' problems. In this paper, we show that the directionality of friendship ties affects the extent to which individuals can influence each other's behavior. Moreover, we find that people are typically poor at perceiving the directionality of their friendship ties and that this can significantly limit their ability to engage in cooperative arrangements. This could lead to failures in establishing compatible norms, acting together, finding compromise solutions, and persuading others to act. We then suggest strategies to overcome this limitation by using two topological characteristics of the perceived friendship network. The findings of this paper have significant consequences for designing interventions that seek to harness social influence for collective action.

If It Looks Like a Spammer and Behaves Like a Spammer, It Must Be a Spammer: Analysis and Detection of Microblogging Spam Accounts

Spam in Online Social Networks (OSNs) is a systemic problem that poses a threat to these services by undermining their value to advertisers and potential investors, as well as negatively affecting users' engagement. As spammers keep creating new accounts and evasive techniques upon being caught, a deeper understanding of their spamming strategies is vital to the design of future social media defense mechanisms. In this work, we present a unique analysis of spam accounts in OSNs viewed through the lens of their behavioral characteristics. Our analysis includes over 100 million messages collected from Twitter over the course of one month. We show that there exist two behaviorally distinct categories of spammers and that they employ different spamming strategies. Then, we illustrate how users in these two categories demonstrate different individual properties as well as social interaction patterns. Finally, we analyze the detectability of spam accounts with respect to three categories of features, namely, content attributes, social interactions, and profile properties.

Behavioral Study of Users When Interacting with Active Honeytokens

Active honeytokens are fake digital data objects planted among real data objects and used in an attempt to detect data misuse by insiders. In this paper we are interested in understanding how users (e.g., employees) behave when interacting with honeytokens, specifically addressing the following questions: (1) Can users distinguish genuine data objects from honeytokens? and (2) How does the user's behavior and tendency to misuse data change when he/she is aware of the use of honeytokens? First, we present an automated and generic method for generating the honeytokens that are used in the subsequent behavioral studies. The results of the first study indicate that it is possible to automatically generate honeytokens that are difficult for users to distinguish from real tokens. The results of the second study unexpectedly show that users did not behave differently when informed in advance that honeytokens were planted in the database and that these honeytokens would be monitored to detect illegitimate behavior. These results can inform security system designers about the type of environmental variables that affect people's data misuse behavior and how to generate honeytokens that evade detection.

The Role of Personality in Shaping Social Networks and Mediating Behavioral Change

In this paper, we exploit different facets of the Friends and Family study to deal with two personality-related tasks of paramount importance for the user modeling and ubiquitous computing fields. First, we propose and validate an approach for automatic classification of personality traits based on the structural characteristics of ego-networks. Our classification results show that i) mobile phone-based behavioral data can be superior to survey data for the purposes of personality classification from structural network properties and ii) particular feature set/network type combinations promise to perform better with given personality traits. Then, we investigate the mediating role played by personality in the context of inducing behavioral change, specifically increasing daily physical activity using social strategies (social comparison and peer pressure). Our results confirm the role played by Extraversion and Neuroticism. Extroverts exposed to a social comparison strategy are positively associated with an increase in physical activity level, while they tend to decrease their physical activity level if they are exposed to a peer pressure intervention strategy. Regarding the Neuroticism dimension, neurotic people tend to increase their daily physical activity level if they are exposed to a social comparison strategy. Our findings may have implications for designing personality-based behavioral change strategies and suggest incorporating users' personality models in the implementation of persuasive systems.

Campaign Optimization through Behavioral Modeling and Mobile Network Analysis

Optimizing the use of available resources is one of the key challenges in activities that consist of interactions with a large number of "target individuals", with the ultimate goal of "winning" as many of them as possible, such as in marketing, service provision, political campaigns, or homeland security. Typically, the cost of interactions is monotonically increasing such that a method for maximizing the performance of these campaigns is required. In this paper we propose a mathematical model to compute an optimized campaign by automatically determining the number of interacting units and their type, and how they should be allocated to different geographical regions in order to maximize the campaign's performance. We validate our proposed model using real world mobility data.

Privacy by Diversity in Sequential Releases of Databases

Information Sciences, 2014

We study the problem of privacy preservation in sequential releases of databases. In that scenario, several releases of the same table are published over a period of time, where each release contains a different set of the table attributes, as dictated by the purposes of the release. The goal is to protect the private information from adversaries who examine the entire sequential release. That scenario was studied, and further investigated, in earlier work. We revisit the privacy definitions of that work, and suggest a significantly stronger adversarial assumption and privacy definition. We then present a sequential anonymization algorithm that achieves ℓ-diversity. The algorithm exploits the fact that different releases may include different attributes in order to reduce the information loss that the anonymization entails. Unlike the previous algorithms, ours is perfectly scalable as the runtime to compute the anonymization of each release is independent of the number of previous releases. In addition, we consider here the fully dynamic setting in which the different releases differ in the set of attributes as well as in the set of tuples. The advantages of our approach are demonstrated by extensive experimentation.
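
For readers unfamiliar with the diversity-based privacy notion this abstract refers to (ℓ-diversity), the following is a minimal sketch of its simplest variant, distinct ℓ-diversity: every group of records sharing the same quasi-identifier values must contain at least ℓ distinct sensitive values. This is not the paper's sequential anonymization algorithm; the function name, column names, and toy table are assumptions made purely for illustration.

```python
from collections import defaultdict


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Return True if every quasi-identifier group holds >= l distinct sensitive values."""
    groups = defaultdict(set)
    for row in rows:
        # Records with identical quasi-identifier values fall into the same group.
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical generalized table (illustrative data only).
table = [
    {"age": "30-39", "zip": "123**", "disease": "flu"},
    {"age": "30-39", "zip": "123**", "disease": "asthma"},
    {"age": "40-49", "zip": "456**", "disease": "flu"},
    {"age": "40-49", "zip": "456**", "disease": "flu"},
]
```

In this toy table the second group contains only one distinct sensitive value, so the table would pass the check for ℓ = 1 but not for ℓ = 2.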

Constrained Obfuscation of Relational Databases

Information Sciences, 2014

The need to share data often conflicts with privacy preservation. Data obfuscation attempts to overcome this conflict by modifying the original data while optimizing both privacy and utility measures. In this paper we introduce the concept of Constrained Obfuscation Problems (COPs), which formulate the task of obfuscating data stored in relational databases. The main idea behind COPs is that many obfuscation scenarios can be modeled as a data generation process which is constrained by a predefined set of rules. We demonstrate the flexibility of the COP definition by modeling several different obfuscation scenarios: Production Data Obfuscation for Application Testing (PDOAT), anonymization of relational data, and anonymization of social networks. We then suggest a general approach for solving COPs by reducing them into a set of Constraint Satisfaction Problems (CSPs). Such a reduction enables the use of the well-studied CSP framework to solve a wide range of complex rules. Some of the resulting CSPs may contain a large number of variables, which may make them intractable. To overcome such intractability issues, we present two useful heuristics that decompose such large CSPs into smaller tractable sub-CSPs. We also show how the well-known ℓ-diversity privacy measure can be incorporated into the COP framework in order to evaluate the privacy level of COP solutions. Finally, we evaluate the new method in terms of privacy, utility and execution time.

The Strength of the Strongest Ties in Collaborative Problem Solving

Complex problem solving in science, engineering, and business has become a highly collaborative endeavor. Teams of scientists or engineers collaborate on projects using their social networks to gather new ideas and feedback. Here we bridge the literature on team performance and information networks by studying teams' problem solving abilities as a function of both their within-team networks and their members' extended networks. We show that, while an assigned team's performance is strongly correlated with its networks of expressive and instrumental ties, only the strongest ties in both networks have an effect on performance.

openPDS: Protecting the Privacy of Metadata through SafeAnswers

The rise of smartphones and web services has made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold:

Implementing a Database Encryption Solution, Design and Implementation Issues

In this paper, we analyze and compare five traditional architectures for database encryption. We show that existing architectures may provide a high level of security, but have a significant impact on performance and impose major changes to the application layer, or may be transparent to the application layer and provide high performance, but have several fundamental security weaknesses. We suggest a sixth novel architecture that was not considered before. The new architecture is based on placing the encryption module inside the database management software (DBMS), just above the database cache, and using a dedicated technique to encrypt each database value together with its coordinates. These two properties allow our new architecture to achieve a high level of data security while offering enhanced performance and total transparency to the application layer. We also explain how each architecture can be implemented in a commercial, open source DBMS. We evaluate the performance of the various architectures both analytically and through extensive experimentation. Our performance evaluation results demonstrate that in most realistic scenarios, i.e., where only a part of the database content is stored in the database cache, the suggested architecture outperforms the others.

Sensing, Understanding, and Shaping Social Behavior

An ability to understand social systems through the aid of computational tools is central to the emerging field of Computational Social Systems. Such understanding can answer epistemological questions on human behavior in a data-driven manner, and provide prescriptive guidelines for persuading humans to undertake certain actions in real world social scenarios. The growing number of works in this sub-field has the potential to impact multiple walks of human life including health, wellness, productivity, mobility, transportation, education, shopping, and sustenance.

The Social Amplifier – Reaction of Human Communities to Emergencies

This paper develops a methodology to aggregate signals in a network regarding some hidden state of the world. We argue that focusing on edges around hubs will under certain circumstances amplify the faint signals disseminating in a network, allowing for more efficient detection of that hidden state. We apply this method to detecting emergencies in mobile phone data, demonstrating that under a broad range of cases and a constraint in how many edges can be observed at a time, focusing on the egocentric networks around key hubs will be more effective than sampling random edges. We support this conclusion analytically, through simulations, and with analysis of a dataset containing the call log data from a major mobile carrier in a European nation.

Improving Accuracy of Classification Models Induced from Anonymized Datasets

The performance of classifiers and other data mining models can be significantly enhanced using the large repositories of digital data collected nowadays by public and private organizations. However, the original records stored in those repositories cannot be released to the data miners as they frequently contain sensitive information. The emerging field of Privacy Preserving Data Publishing (PPDP) deals with this important challenge. In this paper, we present NSVDist (Non-homogeneous generalization with Sensitive Value Distributions), a new anonymization algorithm that, given minimal anonymity and diversity parameters along with an information loss measure, issues corresponding non-homogeneous anonymizations where the sensitive attribute is published as frequency distributions over the sensitive domain rather than in the usual form of exact sensitive values. In our experiments with eight datasets and four different classification algorithms, we show that classifiers induced from data generalized by NSVDist tend to be more accurate than classifiers induced using state-of-the-art anonymization algorithms.
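
As a rough illustration of what publishing a sensitive attribute "as frequency distributions over the sensitive domain" can look like, the sketch below turns the exact sensitive values of one group of records into relative frequencies. It illustrates the output format only and is not NSVDist itself; the function name and the sample values are assumptions.

```python
from collections import Counter


def sensitive_distribution(values):
    """Map each sensitive value in a group of records to its relative frequency."""
    counts = Counter(values)
    total = len(values)
    # An empty group yields an empty distribution (no division takes place).
    return {value: count / total for value, count in counts.items()}


# Hypothetical sensitive values of four records placed in the same group.
group = ["flu", "flu", "asthma", "diabetes"]
distribution = sensitive_distribution(group)
```

Each record in the group would then carry `distribution` in place of its exact sensitive value, which is what lets a classifier still learn from the sensitive attribute without seeing any individual's true value.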

Limiting Disclosure of Sensitive Data in Sequential Releases of Databases

Privacy Preserving Data Publishing (PPDP) is a research field that deals with the development of methods to enable publishing of data while minimizing distortion, for maintaining usability on one hand, and respecting privacy on the other hand. Sequential release is a scenario of data publishing where multiple releases of the same underlying table are published over a period of time. A violation of privacy, in this case, may emerge from any one of the releases, or as a result of joining information from different releases. Similarly to [37], our privacy definitions limit the ability of an adversary who combines information from all releases to link values of the quasi-identifiers to sensitive values. We extend the framework that was considered in [37] in three ways: we allow a greater number of releases, we consider the more flexible local recoding model of "cell generalization" (as opposed to the global recoding model of "cut generalization" in [37]), and we include the case where records may be added to the underlying table from time to time. Our extension of the framework also requires modifying the manner in which privacy is evaluated. We show that while [37] based their privacy evaluation on the notion of the Match Join between the releases, it is no longer suitable for the extended framework considered here. We define more restrictive types of join between the published releases (the Full Match Join and the Kernel Match Join) that are more suitable for privacy evaluation in this context. We then present a top-down algorithm for anonymizing sequential releases in the cell generalization model that is based on our modified privacy evaluations.
Our theoretical study is followed by experimentation that demonstrates a staggering improvement in terms of utility due to the adoption of the cell generalization model, and exemplifies the correction in the privacy evaluation as offered by using the Full or Kernel Match Joins instead of the Match Join.

Database encryption: an overview of contemporary challenges and design considerations

This article describes the major challenges and design considerations pertaining to database encryption. The article first presents an attack model and the main relevant challenges of data security, encryption overhead, key management, and integration footprint. Next, the article reviews related academic work on alternative encryption configurations pertaining to encryption locus; indexing encrypted data; and key management. Finally, the article concludes with a benchmark using the following design criteria: encryption configuration, encryption granularity and keys storage.

Conference Proceedings by Erez Shmueli

Secure Multi-Party Protocols for Item-Based Collaborative Filtering

Recommender systems have become extremely common in recent years, and are utilized in a variety of domains such as movies, music, news, products, restaurants, etc. While a typical recommender system bases its recommendations solely on users' preference data collected by the system itself, the quality of recommendations can be significantly improved if several recommender systems (or vendors) share their data. However, such data sharing poses significant privacy and security challenges, both to the vendors and the users. In this paper we propose secure protocols for distributed item-based Collaborative Filtering. Our protocols allow computing both the predicted ratings of items and their predicted rankings, without compromising privacy or prediction accuracy. Unlike previous solutions in which the secure protocols are executed solely by the vendors, our protocols assume the existence of a mediator that performs intermediate computations on encrypted data supplied by the vendors. Such a mediated setting is advantageous over the non-mediated one since it enables each vendor to communicate solely with the mediator. This yields reduced communication costs and allows each vendor to issue recommendations to its clients without being dependent on the availability and willingness of the other vendors to collaborate.

Improving Information Spread through a Scheduled Seeding Approach

One highly studied aspect of social networks is the identification of influential nodes that can spread ideas in a highly efficient way. The vast majority of works in this field have investigated the problem of identifying a set of nodes that, if "seeded" simultaneously, would maximize the information spread in the network. Yet the timing aspect, namely, finding not only which nodes should be seeded but also when to seed them, has not been sufficiently addressed. In this work, we revisit the problem of network seeding and demonstrate by simulations how an approach that takes the timing aspect into account can improve the rates of spread by over 23% compared to existing seeding methods. Such an approach has a wide range of applications, especially in cases where the network topology is easily accessible.

Ride Sharing: A Network Perspective

SBP, 2015

Ride sharing's potential to reduce traffic congestion, CO2 emissions, and fuel consumption was recently demonstrated by works such as [1]. Furthermore, it was shown that ride sharing can be implemented within a sound economic regime, providing value for all participants (e.g., Uber). Better understanding the utilization of ride sharing can help policy makers and urban planners in modifying existing urban transportation systems to increase their "ride sharing friendliness" as well as in designing new ride sharing oriented ones. In this paper, we systematically study the relationship between properties of the dynamic transportation network (implied by the aggregated rides) and the potential benefit of ride sharing. By analyzing a dataset of over 14 million taxi trips taken in New York City during January 2013, we predict the potential benefit of ride sharing using topological properties of the rides network only. Such prediction can ease the analysis of urban areas, with respect to the potential efficiency of ride sharing for their inhabitants, without the need to carry out expensive and time consuming surveys, data collection and analysis operations.

Twitter: Who gets Caught?

WebSci, 2014

Spam in Online Social Networks (OSNs) is a systemic problem that poses a threat to these services by undermining their value to advertisers and potential investors, as well as negatively affecting users' engagement. In this work, we present a unique analysis of spam accounts in OSNs viewed through the lens of their behavioral characteristics (i.e., profile properties and social interactions). Our analysis includes over 100 million tweets collected over the course of one month, generated by approximately 30 million distinct user accounts, of which over 7% are suspended or removed due to abusive behaviors and other violations. We show that there exist two behaviorally distinct categories of Twitter spammers and that they employ different spamming strategies. The users in these two categories demonstrate different individual properties as well as social interaction patterns. As Twitter spammers keep creating new accounts upon being caught, a behavioral understanding of their spamming behavior will be vital in the design of future social media defense mechanisms.

Social Information Leakage: Effects of Awareness and Peer Pressure on User Behavior

HCII, 2014

Today, users share large amounts of information about themselves on their online social networks. Besides the intended information, this sharing process often also "leaks" sensitive information about the users and, by proxy, about their peers. This study investigates the effect of awareness about such leakage of information on user behavior. In particular, taking inspiration from "second-hand smoke" campaigns, this study creates a "social awareness" campaign where users are reminded of the information they are leaking about themselves and their friends.

Research paper thumbnail of Are You Your Friends' Friend? Poor Perception of Friendship Ties Limits The Ability to Promote Behavioral Change

Persuasion is at the core of norm creation, emergence of collective action, and solutions to 'tra... more Persuasion is at the core of norm creation, emergence of collective action, and solutions to 'tragedy of the commons' problems. In this paper, we show that the directionality of friendship ties affect the extent to which individuals can influence the behavior of each other. Moreover, we find that people are typically poor at perceiving the directionality of their friendship ties and that this can significantly limit their ability to engage in cooperative arrangements. This could lead to failures in establishing compatible norms, acting together, finding compromise solutions, and persuading others to act. We then suggest strategies to overcome this limitation by using two topological characteristics of the perceived friendship network. The findings of this paper have significant consequences for designing interventions that seek to harness social influence for collective action.

Research paper thumbnail of If It Looks Like a Spammer and Behaves Like a Spammer, It Must Be a Spammer Analysis and Detection of Microblogging Spam Accounts

Spam in Online Social Networks (OSNs) is a sys-temic problem that imposes a threat to these servi... more Spam in Online Social Networks (OSNs) is a sys-temic problem that imposes a threat to these services in terms of undermining their value to advertisers and potential investors , as well as negatively affecting users' engagement. As spammers continuously keep creating newer accounts and evasive techniques upon being caught, a deeper understanding of their spamming strategies is vital to the design of future social media defense mechanisms. In this work, we present a unique analysis of spam accounts in OSNs viewed through the lens of their behavioral characteristics. Our analysis includes over 100 million messages collected from Twitter over the course of one month. We show that there exist two behaviorally distinct categories of spammers and that they employ different spamming strategies. Then, we illustrate how users in these two categories demonstrate different individual properties as well as social interaction patterns. Finally, we analyze the detectability of spam accounts with respect to three categories of features, namely, content attributes, social interactions, and profile properties.

Research paper thumbnail of Behavioral Study of Users When Interacting with Active Honeytokens

Active honeytokens are fake digital data objects planted among real data objects and used in an a... more Active honeytokens are fake digital data objects planted among real data objects and used in an attempt to detect data misuse by insiders. In this paper we are interested in understanding how users (e.g., employees) behave when interacting with honeytokens, specifically addressing the following questions: (1) Can users distinguish genuine data objects from honeytokens? and (2) How does the user's behavior and tendency to misuse data change when he/she is aware of the use of honeytokens? First, we present an automated and generic method for generating the honeytokens that are used in the subsequent behavioral studies. The results of the first study indicate that it is possible to automatically generate honeytokens that are difficult for users to distinguish from real tokens. The results of the second study unexpectedly show that users did not behave differently when informed in advance that honeytokens were planted in the database and that these honeytokens would be monitored to detect illegitimate behavior. These results can inform security system designers about the type of environmental variables that affect people's data misuse behavior and how to generate honeytokens that evade detection.

Research paper thumbnail of The Role of Personality in Shaping Social Networks and Mediating Behavioral Change

In this paper, we exploit different facets of the Friends and Family study to deal with two perso... more In this paper, we exploit different facets of the Friends and Family study to deal with two personality-related tasks of paramount importance for the user modeling and ubiquitous computing fields. 2 Bruno Lepri et al. First, we propose and validate an approach for automatic classification of personality traits based on the ego-networks' structural characteristics. Our classification results show that i) mobile phones-based behavioral data can be superior to survey ones for the purposes of personality classification from structural network properties and ii) particular feature set/network type combinations promise to perform better with given personality traits. Then, we investigate the mediating role played by personality in the context of inducing behavioral change, specifically increasing daily physical activity using social strategies (social comparison and peer pressure). Our results confirm the role played by Extraversion and Neuroticism. Extroverts exposed to a social comparison strategy are positively associated with an increase in physical activity level, while they tend to decrease physical activity level if they are exposed to a peer pressure intervention strategy. Regarding Neuroticism dimension, neurotic people tend to increase their physical daily activity level if they are exposed to a social comparison strategy. Our findings may have implications in designing personality-based behavioral change strategies and suggest to incorporate users' personality models in the implementation of persuasive systems.

Research paper thumbnail of Campaign Optimization through Behavioral Modeling and Mobile Network Analysis

Optimizing the use of available resources is one of the key challenges in activities that consist... more Optimizing the use of available resources is one of the key challenges in activities that consist of interactions with a large number of " target individuals " , with the ultimate goal of "winning" as many of them as possible, such as in marketing, service provision, political campaigns, or homeland security. Typically, the cost of interactions is monotonically increasing such that a method for maximizing the performance of these campaigns is required. In this paper we propose a mathematical model to compute an optimized campaign by automatically determining the number of interacting units and their type, and how they should be allocated to different geographical regions in order to maximize the campaign's performance. We validate our proposed model using real world mobility data.

Privacy by Diversity in Sequential Releases of Databases

Information Sciences, 2014

We study the problem of privacy preservation in sequential releases of databases. In that scenario, several releases of the same table are published over a period of time, where each release contains a different set of the table attributes, as dictated by the purposes of the release. The goal is to protect the private information from adversaries who examine the entire sequential release. That scenario was studied and further investigated in earlier work. We revisit the privacy definitions of that work and suggest a significantly stronger adversarial assumption and privacy definition. We then present a sequential anonymization algorithm that achieves ℓ-diversity. The algorithm exploits the fact that different releases may include different attributes in order to reduce the information loss that the anonymization entails. Unlike the previous algorithms, ours is perfectly scalable, as the runtime to compute the anonymization of each release is independent of the number of previous releases. In addition, we consider here the fully dynamic setting in which the different releases differ in the set of attributes as well as in the set of tuples. The advantages of our approach are demonstrated by extensive experimentation.

Constrained Obfuscation of Relational Databases

Information Sciences, 2014

The need to share data often conflicts with privacy preservation. Data obfuscation attempts to overcome this conflict by modifying the original data while optimizing both privacy and utility measures. In this paper we introduce the concept of Constrained Obfuscation Problems (COPs), which formulate the task of obfuscating data stored in relational databases. The main idea behind COPs is that many obfuscation scenarios can be modeled as a data generation process which is constrained by a predefined set of rules. We demonstrate the flexibility of the COP definition by modeling several different obfuscation scenarios: Production Data Obfuscation for Application Testing (PDOAT), anonymization of relational data, and anonymization of social networks. We then suggest a general approach for solving COPs by reducing them to a set of Constraint Satisfaction Problems (CSPs). Such a reduction enables the employment of the well-studied CSP framework in order to solve a wide range of complex rules. Some of the resulting CSPs may contain a large number of variables, which may make them intractable. In order to overcome such intractability issues, we present two useful heuristics that decompose such large CSPs into smaller tractable sub-CSPs. We also show how the well-known ℓ-diversity privacy measure can be incorporated into the COP framework in order to evaluate the privacy level of COP solutions. Finally, we evaluate the new method in terms of privacy, utility and execution time.
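
As a rough illustration of the ℓ-diversity measure mentioned above, the following sketch checks the simplest variant (distinct ℓ-diversity): every group of records that share the same quasi-identifier values must contain at least ℓ distinct sensitive values. The abstract does not say which variant the paper uses, and the function name, column names, and sample table are all hypothetical.

```python
from collections import defaultdict

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Distinct l-diversity check (an assumed variant, not the paper's code).

    Returns True if every group of records sharing the same quasi-identifier
    values holds at least `l` distinct values of the sensitive attribute.
    """
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].add(rec[sensitive])
    return all(len(values) >= l for values in groups.values())

# Hypothetical generalized table with two equivalence classes.
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

print(is_l_diverse(table, ["age", "zip"], "disease", 1))
print(is_l_diverse(table, ["age", "zip"], "disease", 2))
```

In this sample the second group contains a single distinct disease, so the table fails the check for ℓ = 2 while passing it for ℓ = 1.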

The Strength of the Strongest Ties in Collaborative Problem Solving

Complex problem solving in science, engineering, and business has become a highly collaborative endeavor. Teams of scientists or engineers collaborate on projects using their social networks to gather new ideas and feedback. Here we bridge the literature on team performance and information networks by studying teams' problem solving abilities as a function of both their within-team networks and their members' extended networks. We show that, while an assigned team's performance is strongly correlated with its networks of expressive and instrumental ties, only the strongest ties in both networks have an effect on performance.

openPDS: Protecting the Privacy of Metadata through SafeAnswers

The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' location, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold:

Implementing a Database Encryption Solution, Design and Implementation Issues

In this paper, we analyze and compare five traditional architectures for database encryption. We show that existing architectures may provide a high level of security, but have a significant impact on performance and impose major changes to the application layer, or may be transparent to the application layer and provide high performance, but have several fundamental security weaknesses. We suggest a sixth novel architecture that was not considered before. The new architecture is based on placing the encryption module inside the database management software (DBMS), just above the database cache, and using a dedicated technique to encrypt each database value together with its coordinates. These two properties allow our new architecture to achieve a high level of data security while offering enhanced performance and total transparency to the application layer. We also explain how each architecture can be implemented in a commercial, open source DBMS. We evaluate the performance of the various architectures both analytically and through extensive experimentation. Our performance evaluation results demonstrate that in most realistic scenarios, i.e., where only a part of the database content is stored in the database cache, the suggested architecture outperforms the others.

Sensing, Understanding, and Shaping Social Behavior

An ability to understand social systems through the aid of computational tools is central to the emerging field of Computational Social Systems. Such understanding can answer epistemological questions on human behavior in a data-driven manner, and provide prescriptive guidelines for persuading humans to undertake certain actions in real world social scenarios. The growing number of works in this sub-field has the potential to impact multiple walks of human life including health, wellness, productivity, mobility, transportation, education, shopping, and sustenance.

The Social Amplifier – Reaction of Human Communities to Emergencies

This paper develops a methodology to aggregate signals in a network regarding some hidden state of the world. We argue that focusing on edges around hubs will, under certain circumstances, amplify the faint signals disseminating in a network, allowing for more efficient detection of that hidden state. We apply this method to detecting emergencies in mobile phone data, demonstrating that under a broad range of cases and a constraint on how many edges can be observed at a time, focusing on the egocentric networks around key hubs will be more effective than sampling random edges. We support this conclusion analytically, through simulations, and with analysis of a dataset containing the call log data from a major mobile carrier in a European nation.

Improving Accuracy of Classification Models Induced from Anonymized Datasets

The performance of classifiers and other data mining models can be significantly enhanced using the large repositories of digital data collected nowadays by public and private organizations. However, the original records stored in those repositories cannot be released to the data miners as they frequently contain sensitive information. The emerging field of Privacy Preserving Data Publishing (PPDP) deals with this important challenge. In this paper, we present NSVDist (Non-homogeneous generalization with Sensitive Value Distributions), a new anonymization algorithm that, given minimal anonymity and diversity parameters along with an information loss measure, issues corresponding non-homogeneous anonymizations where the sensitive attribute is published as frequency distributions over the sensitive domain rather than in the usual form of exact sensitive values. In our experiments with eight datasets and four different classification algorithms, we show that classifiers induced from data generalized by NSVDist tend to be more accurate than classifiers induced using state-of-the-art anonymization algorithms.

Limiting Disclosure of Sensitive Data in Sequential Releases of Databases

Privacy Preserving Data Publishing (PPDP) is a research field that deals with the development of methods to enable publishing of data while minimizing distortion, for maintaining usability on one hand, and respecting privacy on the other hand. Sequential release is a scenario of data publishing where multiple releases of the same underlying table are published over a period of time. A violation of privacy, in this case, may emerge from any one of the releases, or as a result of joining information from different releases. Similarly to [37], our privacy definitions limit the ability of an adversary who combines information from all releases to link values of the quasi-identifiers to sensitive values. We extend the framework that was considered in [37] in three ways: we allow a greater number of releases, we consider the more flexible local recoding model of "cell generalization" (as opposed to the global recoding model of "cut generalization" in [37]), and we include the case where records may be added to the underlying table from time to time. Our extension of the framework also requires modifying the manner in which privacy is evaluated. We show that while [37] based their privacy evaluation on the notion of the Match Join between the releases, it is no longer suitable for the extended framework considered here. We define more restrictive types of join between the published releases (the Full Match Join and the Kernel Match Join) that are more suitable for privacy evaluation in this context. We then present a top-down algorithm for anonymizing sequential releases in the cell generalization model that is based on our modified privacy evaluations. Our theoretical study is followed by experimentation that demonstrates a staggering improvement in terms of utility due to the adoption of the cell generalization model, and exemplifies the correction in the privacy evaluation as offered by using the Full or Kernel Match Joins instead of the Match Join.

Database encryption: an overview of contemporary challenges and design considerations

This article describes the major challenges and design considerations pertaining to database encryption. The article first presents an attack model and the main relevant challenges of data security, encryption overhead, key management, and integration footprint. Next, the article reviews related academic work on alternative encryption configurations pertaining to encryption locus, indexing encrypted data, and key management. Finally, the article concludes with a benchmark using the following design criteria: encryption configuration, encryption granularity, and key storage.

Secure Multi-Party Protocols for Item-Based Collaborative Filtering

Recommender systems have become extremely common in recent years, and are utilized in a variety of domains such as movies, music, news, products, restaurants, etc. While a typical recommender system bases its recommendations solely on users' preference data collected by the system itself, the quality of recommendations can be significantly improved if several recommender systems (or vendors) share their data. However, such data sharing poses significant privacy and security challenges, both to the vendors and the users. In this paper we propose secure protocols for distributed item-based Collaborative Filtering. Our protocols allow computing both the predicted ratings of items and their predicted rankings, without compromising privacy or prediction accuracy. Unlike previous solutions in which the secure protocols are executed solely by the vendors, our protocols assume the existence of a mediator that performs intermediate computations on encrypted data supplied by the vendors. Such a mediated setting is advantageous over the non-mediated one since it enables each vendor to communicate solely with the mediator. This yields reduced communication costs and allows each vendor to issue recommendations to its clients without being dependent on the availability and willingness of the other vendors to collaborate.

Improving Information Spread through a Scheduled Seeding Approach

One highly studied aspect of social networks is the identification of influential nodes that can spread ideas in a highly efficient way. The vast majority of works in this field have investigated the problem of identifying a set of nodes that, if "seeded" simultaneously, would maximize the information spread in the network. Yet the timing aspect, namely, finding not only which nodes should be seeded but also when to seed them, has not been sufficiently addressed. In this work, we revisit the problem of network seeding and demonstrate by simulations how an approach that takes the timing aspect into account can improve the rates of spread by over 23% compared to existing seeding methods. Such an approach has a wide range of applications, especially in cases where the network topology is easily accessible.

Ride Sharing: A Network Perspective

SBP, 2015

Ride sharing's potential to improve traffic congestion as well as assist in reducing CO2 emission and fuel consumption was recently demonstrated by works such as [1]. Furthermore, it was shown that ride sharing can be implemented within a sound economic regime, providing value for all participants (e.g., Uber). Better understanding the utilization of ride sharing can help policy makers and urban planners in modifying existing urban transportation systems to increase their "ride sharing friendliness" as well as in designing new ride sharing oriented ones. In this paper, we systematically study the relationship between properties of the dynamic transportation network (implied by the aggregated rides) and the potential benefit of ride sharing. By analyzing a dataset of over 14 million taxi trips taken in New York City during January 2013, we predict the potential benefit of ride sharing using topological properties of the rides network only. Such prediction can ease the analysis of urban areas, with respect to the potential efficiency of ride sharing for their inhabitants, without the need to carry out expensive and time-consuming surveys, data collection and analysis operations.

Twitter: Who gets Caught?

WebSci, 2014

Spam in Online Social Networks (OSNs) is a systemic problem that imposes a threat to these services in terms of undermining their value to advertisers and potential investors, as well as negatively affecting users' engagement. In this work, we present a unique analysis of spam accounts in OSNs viewed through the lens of their behavioral characteristics (i.e., profile properties and social interactions). Our analysis includes over 100 million tweets collected over the course of one month, generated by approximately 30 million distinct user accounts, of which over 7% are suspended or removed due to abusive behaviors and other violations. We show that there exist two behaviorally distinct categories of Twitter spammers and that they employ different spamming strategies. The users in these two categories demonstrate different individual properties as well as social interaction patterns. As Twitter spammers continuously keep creating newer accounts upon being caught, a behavioral understanding of their spamming behavior will be vital in the design of future social media defense mechanisms.

Social Information Leakage: Effects of Awareness and Peer Pressure on User Behavior

HCII, 2014

Today, users share large amounts of information about themselves on their online social networks. Besides the intended information, this sharing process often also "leaks" sensitive information about the users and, by proxy, about their peers. This study investigates the effect of awareness about such leakage of information on user behavior. In particular, taking inspiration from "second-hand smoke" campaigns, this study creates a "social awareness" campaign where users are reminded of the information they are leaking about themselves and their friends.

Temporal Dynamics of Scale-Free Networks

SBP, 2014

Many social, biological, and technological networks display substantial non-trivial topological features. One well-known and much studied feature of such networks is the scale-free power-law distribution of nodes' degrees. Several works further suggest models for generating complex networks which comply with one or more of these topological features. For example, the known Barabasi-Albert "preferential attachment" model tells us how to create scale-free networks.
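
The preferential attachment idea can be sketched in a few lines: each new node links to `m` existing nodes chosen with probability proportional to their current degree. This is a minimal textbook-style sketch and not the paper's own model; the function name, parameters, and seeding convention (the first new node connects to `m` initial nodes) are assumptions made for illustration.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph to n nodes by preferential attachment.

    Each new node connects to m distinct existing nodes, sampled with
    probability proportional to degree. Returns a list of (new, old) edges.
    """
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))   # the first new node links to the m seed nodes
    endpoint_pool = []         # every node appears once per incident edge
    for new_node in range(m, n):
        edges.extend((new_node, t) for t in targets)
        endpoint_pool.extend(targets)
        endpoint_pool.extend([new_node] * m)
        # A uniform draw from the pool is a degree-proportional draw of a node.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoint_pool))
        targets = list(chosen)
    return edges

edges = barabasi_albert(n=200, m=2, seed=42)
degree = Counter(node for edge in edges for node in edge)
print(len(edges), max(degree.values()))
```

Keeping one pool entry per edge endpoint avoids recomputing degrees at every step, which is the usual trick for making the degree-proportional draw cheap.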

Detecting Anomalous Behaviors Using Structural Properties of Social Networks

SBP, 2013

In this paper we discuss the analysis of mobile network communication patterns in the presence of some anomalous "real world event". We argue that given limited analysis resources (namely, a limited number of network edges we can analyze), it is best to select edges that are located around 'hubs' in the network, resulting in an improved ability to detect such events. We demonstrate this method using a dataset containing the call log data of 3 years from a major mobile carrier in a developed European nation.
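
One way to picture the hub-centered selection described above is to rank edges by the degree of their better-connected endpoint and keep only as many as the analysis budget allows. The abstract does not give the paper's actual selection rule, so the ranking criterion, function name, and toy call graph below are assumptions for illustration only.

```python
from collections import Counter

def edges_around_hubs(edges, budget):
    """Pick up to `budget` edges, preferring edges that touch high-degree nodes.

    Assumed criterion: an edge is ranked by the larger degree of its two
    endpoints, so edges incident to hubs come first.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(
        edges,
        key=lambda e: max(degree[e[0]], degree[e[1]]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical call graph: node 0 is a hub with four contacts,
# while nodes 5 and 6 only call each other.
calls = [(5, 6), (0, 1), (0, 2), (0, 3), (0, 4)]
picked = edges_around_hubs(calls, budget=3)
print(picked)
```

With a budget of three, every selected edge is incident to the hub, whereas a uniform random sample would sometimes spend part of the budget on the isolated pair.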

Care to Comment? Recommendations for Commenting on News Stories

WWW, 2012

Many websites provide commenting facilities for users to express their opinions or sentiments with regard to content items such as videos, news stories, blog posts, etc. Previous studies have shown that user comments contain valuable information that can provide insight on Web documents and may be utilized for various tasks. This work presents a model that predicts, for a given user, suitable news stories for commenting. The model achieves encouraging results regarding the ability to connect users with stories they are likely to comment on. This provides grounds for personalized recommendations of stories to users who may want to take part in their discussion. We combine a content-based approach with a collaborative-filtering approach (utilizing users' co-commenting patterns) in a latent factor modeling framework. We experiment with several variations of the model's loss function in order to adjust it to the problem domain. We evaluate the results on two datasets and show that employing co-commenting patterns improves upon using content features alone, even with as few as two available comments per story. Finally, we try to incorporate available social network data into the model. Interestingly, the social data does not lead to substantial performance gains, suggesting that the value of social data for this task is quite negligible.

Tracking End-Users in Web Databases

NSS, 2011

When a database is accessed via a web application, users usually receive a pooled connection to the database. From a database point of view, such a connection is always established by the same user (i.e., the web application) and specific data on the end user is not available. As a consequence, users' specific transactions cannot be audited and fine-grained access control cannot be enforced at the database level. In this paper we propose a method and a system which provide the ability to track the end users in web databases. The new method can be applied to legacy web applications without requiring any changes in their existing infrastructure. Furthermore, the new user-tracking ability provides a basis for native database protection mechanisms and intrusion detection systems.

Mining Roles from Web Application Usage Patterns

TrustBus, 2011

Role mining refers to the problem of discovering an optimal set of roles from existing user permissions. In most role mining algorithms, the full set of user-permission assignments (UPA) is given as input. The challenge we are facing in the current paper is mining roles from actual web-application usage information. This information is collected by monitoring users' access to the application during a period of time. We analyze the actual permissions required to access the application in each user's session, and construct a set of user-permission assignments, which results in an incomplete UPA. We propose an algorithm that uses the session permission information to overcome the deficient data. We show by example how each step of the algorithm uses heuristics to overcome instances of higher uncertainty. We demonstrate by simulation the efficiency of our algorithm in handling different levels of deficient data.

Constrained Anonymization of Production Data: A Constraint Satisfaction Problem Approach

SDM, 2010

The use of production data which contains sensitive information in application testing requires that the production data be anonymized first. The task of anonymizing production data becomes difficult since it usually consists of constraints which must also be satisfied in the anonymized data. We propose a novel approach to anonymize constrained production data based on the concept of constraint satisfaction problems. Due to the generality of the constraint satisfaction framework, our approach can support a wide variety of mandatory integrity constraints as well as constraints which ensure the similarity of the anonymized data to the production data. Our approach decomposes the constrained anonymization problem into independent sub-problems which can be represented and solved as constraint satisfaction problems (CSPs). Since production databases may contain many records that are associated by vertical constraints, the resulting CSPs may become very large. Such CSPs are further decomposed into dependent sub-problems that are solved iteratively by applying local modifications to the production data. Simulations on synthetic production databases demonstrate the feasibility of our method.

An Attentive Digital Signage System

Digital Signage mit Interaktiven Displays Workshop, 2009

The conceptual architecture and prototype presented in this article aim to transform standard digital signage networks into more flexible, customer-attentive advertising systems by rapidly adjusting the content displayed on each signage to online contextual data such as environment (i.e., store location, date, and time) or customer characteristics (i.e., gender, behavior). The proposed architecture encompasses a knowledge discoverer in order to reveal hidden patterns in customers' reaction to advertisements. This mechanism enhances the fit between the content presented on each signage and the interests of individual customers.

Designing Secure Indexes for Encrypted Databases

DBSec, 2005

The conventional way to speed up query execution is by using indexes. Designing secure indexes for an encrypted database environment raises the question of how to construct the index so that no information about the database content is exposed. In this paper, the challenges raised when designing a secure index for an encrypted database are outlined; the attacker model is described; possible attacks against secure indexes are discussed; the difficulty posed by multiple users sharing the same index is presented; and the design considerations regarding key storage and encryption granularity are illustrated. Finally, a secure database-indexing scheme is suggested. In this scheme, protection against information leakage and unauthorized modifications is provided by using encryption, dummy values and pooling. Furthermore, the new scheme supports discretionary access control in a multi-user environment.

A Structure Preserving Database Encryption Scheme

SDM, 2004

A new simple and efficient database encryption scheme is presented. The new scheme enables encrypting the entire content of the database without changing its structure. In addition, the scheme suggests how to convert the conventional database index to a secure index on the encrypted database so that the time complexity of all queries is maintained. No one with access to the encrypted database can learn anything about its content without having the encryption key.