Michael Fire | Ben Gurion University
Journal Articles by Michael Fire
Nature Humanities and Social Sciences Communications, 2020
Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women's roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular, albeit flawed, measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies.
GigaScience, 2020
Background
COVID-19 is the most rapidly expanding coronavirus outbreak in the past 2 decades. To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential.
Results
Here, we study the volume of research conducted on previous coronavirus outbreaks, specifically SARS and MERS, relative to other infectious diseases by analyzing >35 million articles from the past 20 years. Our results demonstrate that previous coronavirus outbreaks have been understudied compared with other viruses. We also show that the research volume of emerging infectious diseases is very high after an outbreak and decreases drastically upon the containment of the disease. This can yield inadequate research and limited investment in gaining a full understanding of novel coronavirus management and prevention.
Conclusions
Independent of the outcome of the current COVID-19 outbreak, we believe that measures should be taken to encourage sustained research in the field.
GigaScience, 2019
Abstract
Background
The academic publishing world is changing significantly, with ever-growing numbers of publications each year and shifting publishing patterns. However, the metrics used to measure academic success, such as the number of publications, citation number, and impact factor, have not changed for decades. Moreover, recent studies indicate that these metrics have become targets and follow Goodhart’s Law, according to which, “when a measure becomes a target, it ceases to be a good measure.”
Results
In this study, we analyzed >120 million papers to examine how the academic publishing world has evolved over the last century, with a deeper look into the specific field of biology. Our study shows that the validity of citation-based measures is being compromised and their usefulness is lessening. In particular, the number of publications has ceased to be a good metric as a result of longer author lists, shorter papers, and surging publication numbers. Citation-based metrics, such as citation number and h-index, are likewise affected by the flood of papers, self-citations, and lengthy reference lists. Measures such as a journal’s impact factor have also ceased to be good metrics due to the soaring numbers of papers that are published in top journals, particularly from the same pool of authors. Moreover, by analyzing properties of >2,600 research fields, we observed that citation-based metrics are not beneficial for comparing researchers in different fields, or even in the same department.
Conclusions
Academic publishing has changed considerably; now we need to reconsider how we measure success.
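The h-index mentioned in the abstract above has a simple standard definition: the largest h such that at least h of an author's papers have at least h citations each. The following minimal sketch computes it; the citation counts are hypothetical and are not taken from the study. It illustrates how two very different publication records can receive the same score.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for two authors (illustrative only).
few_strong_papers = [50, 40, 30]
many_weak_papers = [3, 3, 3, 3, 3, 3, 3, 3]

print(h_index(few_strong_papers), h_index(many_weak_papers))
```

Both hypothetical authors end up with the same h-index even though one has 120 citations over three papers and the other has 24 citations over eight.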
Elsevier Information Processing & Management, 2020
Trends change rapidly in today's world and are readily observed in lists of most important people, rankings of global companies, infectious disease patterns, political opinions, and popularities of online social networks. A key question arises: What is the mechanism behind the emergence of new trends? To answer this question, we can model real-world dynamic systems as networks, where a network is represented by a set of vertices and their corresponding links. The features and topology of these networks can then be analyzed, including how they evolve over a long period of time. However, the actual mechanisms behind these dynamic systems remain difficult to understand. Here we show the construction of the largest publicly available network evolution dataset to date, which we utilized to reveal how key entities in a network gain power. We employed state-of-the-art data science tools and extensive cloud computing resources to create this massive corpus, which contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations emerged: first, links are most prevalent among vertices that join a network at a similar time; second, the rate at which new vertices join a network is a central factor in molding a network's topology; and third, the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied these findings to develop a simple network-generation model, a flexible model based on large-scale, real-world data. Our results are applicable to dynamic systems in nature and society, and deliver a better understanding of how stars within these networks rise and fall.
Springer Journal of Social Network Analysis and Mining (SNAM), 2018
In the past decade, graph-based structures have penetrated nearly every aspect of our lives. The detection of anomalies in these networks has become increasingly important, such as in exposing infected endpoints in computer networks or identifying socialbots. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by utilizing topology-based features. Following the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous, we applied our method on 10 networks of various scales, from a network of several dozen students to online networks with millions of vertices. In every scenario, we succeeded in identifying anomalous vertices with lower false positive rates and higher AUCs compared to other prevalent methods. Moreover, we demonstrated that the presented algorithm is generic, and efficient both in revealing fake users and in disclosing the influential people in social networks.
Online Social Networks (OSNs), such as Facebook and Twitter, have become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus in that each offers particular services and functionalities. Recent studies show that many OSN users create several accounts on multiple OSNs using the same or different personal information. Collecting all the available data of an individual from several OSNs and fusing it into a single profile can be useful for many purposes. In this paper, we introduce novel machine learning based methods for solving Entity Resolution (ER), a problem for matching user profiles across multiple OSNs. The presented methods are able to match two user profiles from two different OSNs based on supervised learning techniques, which use features extracted from each one of the user profiles. By using the extracted features and supervised learning techniques, we developed classifiers which can perform entity matching between two profiles for the following scenarios: (a) matching entities across two OSNs; (b) searching for a user by similar name; and (c) de-anonymizing a user's identity. The constructed classifiers were tested by using data collected from two popular OSNs, Facebook and Xing. We then evaluated the classifiers' performances using various evaluation measures, such as true and false positive rates, accuracy, and the Area Under the receiver operator Curve (AUC). The constructed classifiers were evaluated and their classification performance measured by AUC was quite remarkable, with an AUC of up to 0.982 and an accuracy of up to 95.9% in identifying user profiles across two OSNs.
IEEE Communications Surveys & Tutorials
Many online social network (OSN) users are unaware of the numerous security risks that exist in these networks, including privacy violations, identity theft, and sexual harassment, just to name a few. According to recent studies, OSN users readily expose personal and private details about themselves, such as relationship status, date of birth, school name, email address, phone number, and even home address. This information, if put into the wrong hands, can be used to harm users both in the virtual world and in the real world. These risks become even more severe when the users are children.
In this paper we present a thorough review of the different security and privacy risks which threaten the well-being of OSN users in general, and children in particular. In addition, we present an overview of existing solutions that can provide better protection, security, and privacy for OSN users.
We also offer simple-to-implement recommendations for OSN users which can improve their security and privacy when using these platforms. Furthermore, we suggest future research directions.
Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can assist in identifying various patterns in human population.
In this study, we present methods and algorithms which can assist in identifying variations in lifespan distributions of human population in the past centuries, in detecting social and genetic features which correlate with human lifespan, and in constructing predictive models of human lifespan based on various features which can easily be extracted from genealogy datasets.
We have evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 8.8 million connections, all of which were collected from the WikiTree website.
Our findings indicate that significant but small positive correlations exist between the parents' lifespan and their children's lifespan. Additionally, we found slightly higher and significant correlations between the lifespans of spouses. We also discovered a very small positive and significant correlation between longevity and reproductive success in males, and a small and significant negative correlation between longevity and reproductive success in females. Moreover, our machine learning algorithms presented better than random classification results in predicting which people who outlive the age of 50 will also outlive the age of 80.
We believe that this study will be the first of many studies which utilize the wealth of data on human populations, existing in online genealogy datasets, to better understand factors which influence human lifespan. Understanding these factors can assist scientists in providing solutions for successful aging.
Springer Networks and Spatial Economics (NETS)
Complementing the formal organizational structure of a business are the informal connections among employees. These relationships help identify knowledge hubs, working groups, and shortcuts through the organizational structure. They carry valuable information on how a company functions de facto. In the past, eliciting the informal social networks within an organization was challenging; today they are reflected by friendship relationships in online social networks. In this paper we analyze several commercial organizations by mining data which their employees have exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among employees of targeted organizations. Our results show that it is possible to identify leadership roles within the organization solely by using centrality analysis and machine learning techniques applied to the informal relationship network structure. Valuable non-trivial insights can also be gained by clustering an organization’s social network and gathering publicly available information on the employees within each cluster. Knowledge of the network of informal relationships may be a major asset or might be a significant threat to the underlying organization.
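As a rough illustration of the centrality analysis mentioned in the abstract above, the sketch below ranks the members of a small informal network by degree and closeness centrality. The abstract does not say which centrality measures or machine learning models were used, so the two measures here are assumptions, the employee names and links are hypothetical, and the learning step is omitted.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other vertices that each vertex is directly linked to."""
    n = len(graph)
    if n < 2:
        return {v: 0.0 for v in graph}
    return {v: len(neighbors) / (n - 1) for v, neighbors in graph.items()}


def closeness_centrality(graph, source):
    """Reachable vertices divided by the sum of shortest-path distances to them."""
    distance = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search over the unweighted graph
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


# Hypothetical informal friendship network among five employees.
network = {
    "dana": {"avi", "noa", "lior"},
    "avi": {"dana", "noa"},
    "noa": {"dana", "avi"},
    "lior": {"dana", "tal"},
    "tal": {"lior"},
}

degree = degree_centrality(network)
closeness = {v: closeness_centrality(network, v) for v in network}
most_central = max(degree, key=degree.get)
print(most_central, degree[most_central], closeness[most_central])
```

In a fuller pipeline, scores like these would be one input among several features passed to a classifier rather than a leadership label on their own.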
ASE Human Journal, 2012
Today’s social networks are plagued by numerous types of malicious profiles, ranging from bots to sexual predators. We present a novel method for the detection of these malicious profiles by only using the social network’s own topological features. The reliance on only these features ensures that the proposed method is generic enough to be applied on many types of social networks. The algorithm has been evaluated on several social networks and was found to be effective in detecting several types of malicious profiles. We believe this method is an important step towards making social networks less vulnerable to spammers, socialbots and sexual predators.
Online social networking sites have become increasingly popular over the last few years. As a result, new interdisciplinary research directions have emerged in which social network analysis methods are applied to networks containing hundreds of millions of users. Unfortunately, links between individuals may be missing either due to an imperfect acquisition process or because they are not yet reflected in the online network (i.e., friends in the real world have not formed a virtual connection). The primary bottleneck in link prediction techniques is extracting the structural features required for classifying links. In this paper, we propose a set of simple, easy-to-compute structural features that can be analyzed to identify missing links. We show that by using simple structural features, a machine learning classifier can successfully identify missing links, even when applied to the hard problem of classifying links between individuals with at least one common friend. We also present a method for calculating the amount of data needed in order to build more accurate classifiers. The new Friends-measure and Same-community features we developed are shown to be good predictors for missing links. An evaluation experiment was performed on ten large social network datasets: Academia.edu, DBLP, Facebook, Flickr, Flixster, Google+, Gowalla, TheMarker, Twitter, and YouTube. Our methods can provide social network site operators with the capability of helping users to find known, offline contacts and to discover new friends online. They may also be used for exposing hidden links in online social networks.
The amount of personal information involuntarily exposed by users on online social networks is staggering, as shown in recent research. Moreover, recent reports indicate that these networks are inundated with tens of millions of fake user profiles, which may jeopardize the user’s security and privacy. To identify fake users in such networks and to improve users’ security and privacy, we developed the Social Privacy Protector (SPP) software for Facebook. This software contains three protection layers that improve user privacy by implementing different methods to identify fake profiles. The first layer identifies a user’s friends who might pose a threat and then restricts the access these “friends” have to the user’s personal information. The second layer is an expansion of Facebook’s basic privacy settings based on different types of social network usage profiles. The third layer alerts users about the number of installed applications on their Facebook profile that have access to their private information. An initial version of the SPP software received positive media coverage, and more than 3,000 users from more than 20 countries have installed the software, of whom 527 have used the software to restrict more than 9,000 friends. In addition, we estimate that more than 100 users have accepted the software’s recommendations and removed nearly 1,800 Facebook applications from their profiles. By analyzing the unique dataset obtained by the software in combination with machine learning techniques, we developed classifiers that are able to predict Facebook profiles that have a high probability of being fake and consequently threaten the user’s security and privacy. Moreover, in this study, we present statistics generated by the SPP software on both user privacy settings and the number of applications installed on Facebook profiles. These statistics alarmingly demonstrate how vulnerable Facebook users’ information is to both fake profile attacks and third-party Facebook applications.
This is a draft version of the article. The full version of the article was published in Social Network Analysis and Mining Journal, and can be downloaded from the following link:
http://link.springer.com/article/10.1007%2Fs13278-014-0194-4
This paper develops a methodology to aggregate signals in a network regarding some hidden state of the world. We argue that focusing on edges around hubs will under certain circumstances amplify the faint signals disseminating in a network, allowing for more efficient detection of that hidden state. We apply this method to detecting emergencies in mobile phone data, demonstrating that under a broad range of cases and a constraint in how many edges can be observed at a time, focusing on the egocentric networks around key hubs will be more effective than sampling random edges. We support this conclusion analytically, through simulations, and with analysis of a dataset containing the call log data from a major mobile carrier in a European nation.
Springer Science and Engineering Ethics Journal (accepted)
Online Social Networks (OSNs) have rapidly become a prominent and widely used service, offering a wealth of personal and sensitive information with significant security and privacy implications. Hence, OSNs are also an important, and popular, subject for research. To perform research based on real-life evidence, however, researchers may need to access OSN data, such as texts and files uploaded by users and connections among users. This raises significant ethical problems. Currently, there are no clear ethical guidelines, and researchers may end up (unintentionally) performing ethically questionable research, sometimes even when more ethical research alternatives exist. For example, several studies have employed "fake identities" to collect data from OSNs, but fake identities may be used for attacks and are considered a security issue. Is it legitimate to use fake identities for studying OSNs or for collecting OSN data for research? We present a taxonomy of the ethical challenges facing researchers of OSNs and compare different approaches. We demonstrate how ethical considerations have been taken into account in previous studies that used fake identities. In addition, several possible approaches are offered to reduce or avoid ethical misconduct. We hope this work will stimulate the development and use of ethical practices and methods in the research of online social networks.
A dimension of the Internet that has gained great popularity in recent years is the platform of online social networks (OSNs). Users all over the world write, share, and publish personal information about themselves, their friends, and their workplaces within this platform of communication. In this study we demonstrate the relative ease of creating malicious socialbots that act as social network "friends," resulting in OSN users unknowingly exposing potentially harmful information about themselves and their places of employment. We present an algorithm for infiltrating specific OSN users who are employees of targeted organizations, using the topologies of organizational social networks and utilizing socialbots to gain access to these networks. We focus on two well-known OSNs, Facebook and Xing, to evaluate our suggested method for infiltrating key-role employees in targeted organizations. The results obtained demonstrate how adversaries can infiltrate social networks to gain access to valuable, private information regarding employees and their organizations.
Journal Articles under Review by Michael Fire
We examine the impact that homophily can have on the diffusion of a phenomenon. We identify three mechanisms from the literature by which homophily can have an effect and model how they can change diffusion that is happening through social influence. By modelling and simulation we vary the size and composition of the initial seed of adopters who start the diffusion process (the "critical mass") and test this on simulated and real data. We then use real data on personal characteristics to model genuine, rather than simulated, homophily. Our main contribution lies in examining the impact that the composition of the critical mass has. When the critical mass group is small, a homophilous group will cause a phenomenon to spread further than a heterophilous group. As the critical mass group grows in size, a chaotic period is entered where small variations in the composition will have a huge effect on whether a group of all one type or another will cause more diffusion. As the group size continues to grow, a new pattern emerges where a heterophilous group will cause more diffusion than a homophilous group. These results are discussed and avenues for future research are identified.
Refereed Conference Proceedings by Michael Fire
2020 IEEE International Smart Cities Conference (ISC2), 2020
In city planning and maintenance, the ability to quickly identify infrastructure violations, such as missing or misplaced fire hydrants, can be crucial for maintaining a safe city; it can even save lives. In this work, we aim to provide an analysis of such violations, and to demonstrate the potential of data-driven approaches for quickly locating and addressing them. We conduct an analytical study based upon data from the city of Beer-Sheva’s public records of fire hydrants, bomb shelters, and other public facilities. The results of our analysis are presented along with an interactive exploration tool, which allows for easy exploration and identification of the different facilities around the city that violate regulations.
International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 2020
With the increase in population densities and environmental awareness, public transport has become an important aspect of urban life. Consequently, large quantities of transportation data are generated, and mining data from smart card use has become a standardized method to understand the travel habits of passengers.
The increase in available data and computational power demands more sophisticated methods to analyze big data. Public transport datasets, however, often lack data integrity. Boarding stop information may be missing either due to imperfect acquisition processes or inadequate reporting. As a result, large quantities of observations and even complete sections of cities might be absent from the smart card database. We have developed a supervised machine learning method to impute missing boarding stops based on ordinal classification. In addition, we present a new metric, Pareto Accuracy, to evaluate algorithms where classes have an ordinal nature. Results are based on a case study in the city of Beer Sheva utilizing one month of data. We show that our proposed method significantly outperforms schedule-based imputation methods and can improve the accuracy and usefulness of large-scale transportation data. The implications for data imputation of smart card information are further discussed.
The 3rd IEEE International Conference on Smart Data (SmartData), 2017
Online advertising is a huge, rapidly growing advertising market in today's world. One common form of online advertising is using image ads. A decision is made (often in real time) every time a user sees an ad, and the advertiser is eager to determine the best ad to display. Consequently, many algorithms have been developed that calculate the optimal ad to show to the current user at the present time. Typically, these algorithms focus on variations of the ad, optimizing among different properties such as background color, image size, or set of images.
However, there is a more fundamental layer. Our study looks at new qualities of ads that can be determined before an ad is shown (rather than online optimization) and defines which ads are most likely to be successful.
We present a set of novel algorithms that utilize deep-learning image processing, machine learning, and graph theory to investigate online advertising and to construct prediction models which can foresee an image ad's success.
We evaluated our algorithms on a dataset with over 260,000 ad images, as well as a smaller dataset specifically related to the automotive industry, and we succeeded in constructing regression models for ad image click rate prediction.
The obtained results emphasize the great potential of using deep-learning algorithms to effectively and efficiently analyze image ads and to create better and more innovative online ads. Moreover, the algorithms presented in this paper can help predict ad success and can be applied to analyze other large-scale image corpora.
SocialCom 2013
In recent years, Online Social Networks (OSNs) have essentially become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and its own particular services and functionalities. To take advantage of the full range of services and functionalities that OSNs offer, users often create several accounts on various OSNs using the same or different personal information. Retrieving all available data about an individual from several OSNs and merging it into one profile can be useful for many purposes. In this paper, we present a method for solving the Entity Resolution (ER) problem of matching user profiles across multiple OSNs. Our algorithm is able to match two user profiles from two different OSNs based on machine learning techniques, which use features extracted from each one of the user profiles. Using supervised learning techniques and extracted features, we constructed different classifiers, which were then trained and used to rank the probability that two user profiles from two different OSNs belong to the same individual. These classifiers utilized 27 features of mainly three types: name-based features (e.g., the Soundex value of two names), general user-info-based features (e.g., the cosine similarity between two user profiles), and social network topology-based features (e.g., the number of mutual friends between two users’ friends lists). This experimental study uses real-life data collected from two popular OSNs, Facebook and Xing. The proposed algorithm was evaluated and its classification performance measured by AUC was 0.982 in identifying user profiles across two OSNs.
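The three feature types named in the abstract above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the profile fields (`name`, `about`, `friends`), the example profiles, and the choice of a bag-of-words cosine similarity are all assumptions, and only 3 of the 27 features are approximated. Soundex follows the standard American Soundex rules.

```python
import math
from collections import Counter

# Standard American Soundex letter-to-digit table.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(name):
    """Return the four-character Soundex code of a name ("" if it has no letters)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # vowels reset the previous code to ""
    return (encoded + "000")[:4]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def pair_features(profile_a, profile_b):
    """One feature of each type for a candidate pair of profiles (hypothetical schema)."""
    return {
        "same_soundex": float(soundex(profile_a["name"]) == soundex(profile_b["name"])),
        "about_cosine": cosine_similarity(profile_a["about"], profile_b["about"]),
        "mutual_friends": float(len(set(profile_a["friends"]) & set(profile_b["friends"]))),
    }


# Hypothetical profiles from two different networks.
profile_one = {"name": "Robert Cohen", "about": "data scientist in Beer Sheva",
               "friends": ["Dana Levi", "Noa Katz", "Tal Mor"]}
profile_two = {"name": "Rupert Cohen", "about": "senior data scientist",
               "friends": ["Dana Levi", "Noa Katz"]}

features = pair_features(profile_one, profile_two)
print(features)
```

Feature dictionaries like this one, computed for labeled pairs of profiles, would form the training rows for a supervised classifier that scores whether two profiles belong to the same person.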
Nature Humanities and Social Sciences Communications, 2020
Data science can offer answers to a wide range of social science questions. Here we turn attentio... more Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women‘s roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular—albeit flawed—measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies
GigaScience, 2020
Background COVID-19 is the most rapidly expanding coronavirus outbreak in the past 2 decades. To ... more Background
COVID-19 is the most rapidly expanding coronavirus outbreak in the past 2 decades. To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential.
Results
Here, we study the volume of research conducted on previous coronavirus outbreaks, specifically SARS and MERS, relative to other infectious diseases by analyzing >35 million articles from the past 20 years. Our results demonstrate that previous coronavirus outbreaks have been understudied compared with other viruses. We also show that the research volume of emerging infectious diseases is very high after an outbreak and decreases drastically upon the containment of the disease. This can yield inadequate research and limited investment in gaining a full understanding of novel coronavirus management and prevention.
Conclusions
Independent of the outcome of the current COVID-19 outbreak, we believe that measures should be taken to encourage sustained research in the field.
GigaScience, 2019
Abstract Background The academic publishing world is changing significantly, with ever-growing nu... more Abstract
Background
The academic publishing world is changing significantly, with ever-growing numbers of publications each year and shifting publishing patterns. However, the metrics used to measure academic success, such as the number of publications, citation number, and impact factor, have not changed for decades. Moreover, recent studies indicate that these metrics have become targets and follow Goodhart’s Law, according to which, “when a measure becomes a target, it ceases to be a good measure.”
Results
In this study, we analyzed >120 million papers to examine how the academic publishing world has evolved over the last century, with a deeper look into the specific field of biology. Our study shows that the validity of citation-based measures is being compromised and their usefulness is lessening. In particular, the number of publications has ceased to be a good metric as a result of longer author lists, shorter papers, and surging publication numbers. Citation-based metrics, such as citation number and h-index, are likewise affected by the flood of papers, self-citations, and lengthy reference lists. Measures such as a journal’s impact factor have also ceased to be good metrics due to the soaring numbers of papers that are published in top journals, particularly from the same pool of authors. Moreover, by analyzing properties of >2,600 research fields, we observed that citation-based metrics are not beneficial for comparing researchers in different fields, or even in the same department.
Conclusions
Academic publishing has changed considerably; now we need to reconsider how we measure success.
Elsevier Information Processing & Management, 2020
Trends change rapidly in today's world and are readily observed in lists of most important people, rankings of global companies, infectious disease patterns, political opinions, and popularities of online social networks. A key question arises: What is the mechanism behind the emergence of new trends? To answer this question, we can model real-world dynamic systems as networks, where a network is represented by a set of vertices and their corresponding links. The features and topology of these networks can then be analyzed, including how they evolve over a long period of time. However, the actual mechanisms behind these dynamic systems remain difficult to understand. Here we show the construction of the largest publicly available network evolution dataset to date, which we utilized to reveal how key entities in a network gain power. We employed state-of-the-art data science tools and extensive cloud computing resources to create this massive corpus, which contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations emerged: first, links are most prevalent among vertices that join a network at a similar time; second, the rate at which new vertices join a network is a central factor in molding a network's topology; and third, the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied our learnings to develop a simple network-generation model, a flexible model based on large-scale, real-world data. Our results are applicable to dynamic systems in nature and society, and deliver a better understanding of how stars within these networks rise and fall.
Springer Journal of Social Network Analysis and Mining (SNAM), 2018
In the past decade, graph-based structures have penetrated nearly every aspect of our lives. The detection of anomalies in these networks has become increasingly important, such as in exposing infected endpoints in computer networks or identifying socialbots. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by utilizing topology-based features. Following the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous, we applied our method on 10 networks of various scales, from a network of several dozen students to online networks with millions of vertices. In every scenario, we succeeded in identifying anomalous vertices with lower false positive rates and higher AUCs compared to other prevalent methods. Moreover, we demonstrated that the presented algorithm is generic, and efficient both in revealing fake users and in disclosing the influential people in social networks.
Online Social Networks (OSNs), such as Facebook and Twitter, have become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus in that each offers particular services and functionalities. Recent studies show that many OSN users create several accounts on multiple OSNs using the same or different personal information. Collecting all the available data of an individual from several OSNs and fusing it into a single profile can be useful for many purposes. In this paper, we introduce novel machine learning based methods for solving Entity Resolution (ER), a problem for matching user profiles across multiple OSNs. The presented methods are able to match between two user profiles from two different OSNs based on supervised learning techniques, which use features extracted from each one of the user profiles. By using the extracted features and supervised learning techniques, we developed classifiers which can perform entity matching between two profiles for the following scenarios: (a) matching entities across two OSNs; (b) searching for a user by similar name; and (c) de-anonymizing a user's identity. The constructed classifiers were tested by using data collected from two popular OSNs, Facebook and Xing. We then evaluated the classifiers' performances using various evaluation measures, such as true and false positive rates, accuracy, and the Area Under the receiver operator Curve (AUC). The constructed classifiers were evaluated and their classification performance measured by AUC was quite remarkable, with an AUC of up to 0.982 and an accuracy of up to 95.9% in identifying user profiles across two OSNs.
IEEE Communications Surveys & Tutorials
Many online social network (OSN) users are unaware of the numerous security risks that exist in these networks, including privacy violations, identity theft, and sexual harassment, just to name a few. According to recent studies, OSN users readily expose personal and private details about themselves, such as relationship status, date of birth, school name, email address, phone number, and even home address. This information, if put into the wrong hands, can be used to harm users both in the virtual world and in the real world. These risks become even more severe when the users are children.
In this paper we present a thorough review of the different security and privacy risks which threaten the well-being of OSN users in general, and children in particular. In addition, we present an overview of existing solutions that can provide better protection, security, and privacy for OSN users.
We also offer simple-to-implement recommendations for OSN users which can improve their security and privacy when using these platforms. Furthermore, we suggest future research directions.
Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can assist in identifying various patterns in human population.
In this study, we present methods and algorithms which can assist in identifying variations in lifespan distributions of human population in the past centuries, in detecting social and genetic features which correlate with human lifespan, and in constructing predictive models of human lifespan based on various features which can easily be extracted from genealogy datasets.
We have evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 8.8 million connections, all of which were collected from the WikiTree website.
Our findings indicate that significant but small positive correlations exist between the parents' lifespan and their children's lifespan. Additionally, we found slightly higher and significant correlations between the lifespans of spouses. We also discovered a very small positive and significant correlation between longevity and reproductive success in males, and a small and significant negative correlation between longevity and reproductive success in females. Moreover, our machine learning algorithms presented better than random classification results in predicting which people who outlive the age of 50 will also outlive the age of 80.
We believe that this study will be the first of many studies which utilize the wealth of data on human populations, existing in online genealogy datasets, to better understand factors which influence human lifespan. Understanding these factors can assist scientists in providing solutions for successful aging.
Springer Networks and Spatial Economics (NETS)
Complementing the formal organizational structure of a business are the informal connections among employees. These relationships help identify knowledge hubs, working groups, and shortcuts through the organizational structure. They carry valuable information on how a company functions de facto. In the past, eliciting the informal social networks within an organization was challenging; today they are reflected by friendship relationships in online social networks. In this paper we analyze several commercial organizations by mining data which their employees have exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among employees of targeted organizations. Our results show that it is possible to identify leadership roles within the organization solely by using centrality analysis and machine learning techniques applied to the informal relationship network structure. Valuable non-trivial insights can also be gained by clustering an organization’s social network and gathering publicly available information on the employees within each cluster. Knowledge of the network of informal relationships may be a major asset or might be a significant threat to the underlying organization.
ASE Human Journal, 2012
Today’s social networks are plagued by numerous types of malicious profiles, ranging from bots to sexual predators. We present a novel method for the detection of these malicious profiles by only using the social network’s own topological features. The reliance on only these features ensures that the proposed method is generic enough to be applied on many types of social networks. The algorithm has been evaluated on several social networks and was found to be effective in detecting several types of malicious profiles. We believe this method is an important step towards making social networks less vulnerable to spammers, socialbots and sexual predators.
Online social networking sites have become increasingly popular over the last few years. As a result, new interdisciplinary research directions have emerged in which social network analysis methods are applied to networks containing hundreds of millions of users. Unfortunately, links between individuals may be missing either due to an imperfect acquisition process or because they are not yet reflected in the online network (i.e., friends in the real world have not formed a virtual connection). The primary bottleneck in link prediction techniques is extracting the structural features required for classifying links. In this paper, we propose a set of simple, easy-to-compute structural features that can be analyzed to identify missing links. We show that by using simple structural features, a machine learning classifier can successfully identify missing links, even when applied to the hard problem of classifying links between individuals with at least one common friend. We also present a method for calculating the amount of data needed in order to build more accurate classifiers. The new Friends-measure and Same-community features we developed are shown to be good predictors for missing links. An evaluation experiment was performed on ten large Social Networks datasets: Academia.edu, DBLP, Facebook, Flickr, Flixster, Google+, Gowalla, TheMarker, Twitter, and YouTube. Our methods can provide social network site operators with the capability of helping users to find known, offline contacts and to discover new friends online. They may also be used for exposing hidden links in online social networks.
The amount of personal information involuntarily exposed by users on online social networks is staggering, as shown in recent research. Moreover, recent reports indicate that these networks are inundated with tens of millions of fake user profiles, which may jeopardize the user’s security and privacy. To identify fake users in such networks and to improve users’ security and privacy, we developed the Social Privacy Protector (SPP) software for Facebook. This software contains three protection layers that improve user privacy by implementing different methods to identify fake profiles. The first layer identifies a user’s friends who might pose a threat and then restricts the access these “friends” have to the user’s personal information. The second layer is an expansion of Facebook’s basic privacy settings based on different types of social network usage profiles. The third layer alerts users about the number of installed applications on their Facebook profile that have access to their private information. An initial version of the SPP software received positive media coverage, and more than 3,000 users from more than 20 countries have installed the software, out of which 527 have used the software to restrict more than 9,000 friends. In addition, we estimate that more than 100 users have accepted the software’s recommendations and removed nearly 1,800 Facebook applications from their profiles. By analyzing the unique dataset obtained by the software in combination with machine learning techniques, we developed classifiers that are able to predict Facebook profiles that have a high probability of being fake and that consequently threaten the user’s security and privacy. Moreover, in this study, we present statistics generated by the SPP software on both user privacy settings and the number of applications installed on Facebook profiles. These statistics alarmingly demonstrate how vulnerable Facebook users’ information is to both fake profile attacks and third-party Facebook applications.
This is a draft version of the article. The full version of the article was published in Social Network Analysis and Mining Journal, and can be downloaded from the following link:
http://link.springer.com/article/10.1007%2Fs13278-014-0194-4
This paper develops a methodology to aggregate signals in a network regarding some hidden state of the world. We argue that focusing on edges around hubs will under certain circumstances amplify the faint signals disseminating in a network, allowing for more efficient detection of that hidden state. We apply this method to detecting emergencies in mobile phone data, demonstrating that under a broad range of cases and a constraint in how many edges can be observed at a time, focusing on the egocentric networks around key hubs will be more effective than sampling random edges. We support this conclusion analytically, through simulations, and with analysis of a dataset containing the call log data from a major mobile carrier in a European nation.
Springer Science and Engineering Ethics Journal (accepted)
Online Social Networks (OSNs) have rapidly become a prominent and widely used service, offering a wealth of personal and sensitive information with significant security and privacy implications. Hence, OSNs are also an important - and popular - subject for research. To perform research based on real-life evidence, however, researchers may need to access OSN data, such as texts and files uploaded by users and connections among users. This raises significant ethical problems. Currently, there are no clear ethical guidelines, and researchers may end up (unintentionally) performing ethically questionable research, sometimes even when more ethical research alternatives exist. For example, several studies have employed "fake identities" to collect data from OSNs, but fake identities may be used for attacks and are considered a security issue. Is it legitimate to use fake identities for studying OSNs or for collecting OSN data for research? We present a taxonomy of the ethical challenges facing researchers of OSNs and compare different approaches. We demonstrate how ethical considerations have been taken into account in previous studies that used fake identities. In addition, several possible approaches are offered to reduce or avoid ethical misconduct. We hope this work will stimulate the development and use of ethical practices and methods in the research of online social networks.
A dimension of the Internet that has gained great popularity in recent years is the platform of online social networks (OSNs). Users all over the world write, share, and publish personal information about themselves, their friends, and their workplaces within this platform of communication. In this study we demonstrate the relative ease of creating malicious socialbots that act as social network "friends," resulting in OSN users unknowingly exposing potentially harmful information about themselves and their places of employment. We present an algorithm for infiltrating specific OSN users who are employees of targeted organizations, using the topologies of organizational social networks and utilizing socialbots to gain access to these networks. We focus on two well-known OSNs - Facebook and Xing - to evaluate our suggested method for infiltrating key-role employees in targeted organizations. The results obtained demonstrate how adversaries can infiltrate social networks to gain access to valuable, private information regarding employees and their organizations.
We examine the impact that homophily can have on the diffusion of a phenomenon. We identify three mechanisms from the literature by which homophily can have an effect and model how they can change diffusion that is happening through social influence. By modelling and simulation we vary the size and composition of the initial seed of adopters who start the diffusion process -- the 'critical mass' -- and test this on simulated and real data. We then use real data on personal characteristics to model genuine -- rather than simulated -- homophily. Our main contribution lies in examining the impact that the composition of the critical mass has. When the critical mass group is small, a homophilous group will cause a phenomenon to spread further than a heterophilous group. As the critical mass group grows in size, a chaotic period is entered where small variations in the composition will have a huge effect on whether a group of all one type or another will cause more diffusion. As the group size continues to grow, a new pattern emerges where a heterophilous group will cause more diffusion than a homophilous group. These results are discussed and avenues for future research are identified.
2020 IEEE International Smart Cities Conference (ISC2), 2020
In city planning and maintenance, the ability to quickly identify infrastructure violations - such as missing or misplaced fire hydrants - can be crucial for maintaining a safe city; it can even save lives. In this work, we aim to provide an analysis of such violations, and to demonstrate the potential of data-driven approaches for quickly locating and addressing them. We conduct an analytical study based upon data from the city of Beer-Sheva’s public records of fire hydrants, bomb shelters, and other public facilities. The results of our analysis are presented along with an interactive exploration tool, which allows for easy exploration and identification of the different facilities around the city that violate regulations.
International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 2020
With the increase in population densities and environmental awareness, public transport has become an important aspect of urban life. Consequently, large quantities of transportation data are generated, and mining data from smart card use has become a standardized method to understand the travel habits of passengers.
The increase in available data and computation power demands more sophisticated methods to analyze big data. Public transport datasets, however, often lack data integrity. Boarding stop information may be missing either due to imperfect acquisition processes or inadequate reporting. As a result, large quantities of observations and even complete sections of cities might be absent from the smart card database. We have developed a supervised machine learning method to impute missing boarding stops based on ordinal classification. In addition, we present a new metric, Pareto Accuracy, to evaluate algorithms where classes have an ordinal nature. Results are based on a case study in the city of Beer Sheva utilizing one month of data. We show that our proposed method significantly outperforms schedule-based imputation methods and can improve the accuracy and usefulness of large-scale transportation data. The implications for data imputation of smart card information are further discussed.
The 3rd IEEE International Conference on Smart Data (SmartData), 2017
Online advertising is a huge, rapidly growing advertising market in today's world. One common form of online advertising is using image ads. A decision is made (often in real time) every time a user sees an ad, and the advertiser is eager to determine the best ad to display. Consequently, many algorithms have been developed that calculate the optimal ad to show to the current user at the present time. Typically, these algorithms focus on variations of the ad, optimizing among different properties such as background color, image size, or set of images.
However, there is a more fundamental layer. Our study looks at new qualities of ads that can be determined before an ad is shown (rather than online optimization) and defines which ads are most likely to be successful.
We present a set of novel algorithms that utilize deep-learning image processing, machine learning, and graph theory to investigate online advertising and to construct prediction models which can foresee an image ad's success.
We evaluated our algorithms on a dataset with over 260,000 ad images, as well as a smaller dataset specifically related to the automotive industry, and we succeeded in constructing regression models for ad image click rate prediction.
The obtained results emphasize the great potential of using deep-learning algorithms to effectively and efficiently analyze image ads and to create better and more innovative online ads. Moreover, the algorithms presented in this paper can help predict ad success and can be applied to analyze other large-scale image corpora.
SocialCom 2013
In recent years, Online Social Networks (OSNs) have essentially become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and its own particular services and functionalities. To take advantage of the full range of services and functionalities that OSNs offer, users often create several accounts on various OSNs using the same or different personal information. Retrieving all available data about an individual from several OSNs and merging it into one profile can be useful for many purposes. In this paper, we present a method for solving the Entity Resolution (ER) problem of matching user profiles across multiple OSNs. Our algorithm is able to match two user profiles from two different OSNs based on machine learning techniques, which use features extracted from each one of the user profiles. Using supervised learning techniques and extracted features, we constructed different classifiers, which were then trained and used to rank the probability that two user profiles from two different OSNs belong to the same individual. These classifiers utilized 27 features of mainly three types: name-based features (e.g., the Soundex value of two names), general user-info-based features (e.g., the cosine similarity between two user profiles), and social network topology-based features (e.g., the number of mutual friends between two users’ friend lists). This experimental study uses real-life data collected from two popular OSNs, Facebook and Xing. The proposed algorithm was evaluated, and its classification performance measured by AUC was 0.982 in identifying user profiles across two OSNs.
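The three feature types named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the profile fields (`name`, `info`, `friends`), the whitespace tokenization, and the assumption that friend identifiers are directly comparable across the two networks are all hypothetical choices made for illustration, and only three of the 27 features are approximated.

```python
import math
from collections import Counter

# Standard American Soundex digit groups.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (result + "000")[:4]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def pair_features(p, q):
    """Features for one candidate pair of profiles (hypothetical schema)."""
    return {
        "same_soundex": soundex(p["name"]) == soundex(q["name"]),
        "info_cosine": cosine_similarity(p["info"], q["info"]),
        "mutual_friends": len(set(p["friends"]) & set(q["friends"])),
    }
```

A feature dictionary like this would be computed for every candidate pair and passed to a supervised classifier, which then ranks how likely the two profiles are to belong to the same person.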
Active Media Technology (AMT), Springer LNCS, pp. 584--595, Macau, 2012.
In this paper we propose a novel method for the prediction of a person's success in an academic course. By extracting log data from the course's website and using network analysis, we were able to model and visualize the social interactions among the students in a course. For our analysis we extracted a variety of features by using both graph theory and social networks analysis. Finally, we successfully used several regression and machine learning techniques in order to predict the success of a student in a course. An interesting fact uncovered by this research is that the proposed model has shown a high correlation between the grade of a student and that of his "best" friend.
Traffic measurements, road safety studies, and surveys are required for efficient road planning and ensuring the safety of transportation. Unfortunately, these methods can be cumbersome and very expensive. In this paper we point out a source of transportation information that is based on collaborative community-based navigation applications, such as Waze. Partial and anonymized information publicly exposed by Waze through their application provides valuable information that can significantly ease the future of transportation studies. Moreover, we show that Waze user reports may expose locations plagued with accidents but lacking police coverage. This knowledge may help police departments to improve road safety by relocating the police units to these locations. Lastly, the data discussed in this paper connects transportation and road safety research to location based services and social network platforms.
The Third International Conference on Social Eco-Informatics (Sotics 2013), 2013
Facebook applications are one of the reasons for Facebook's attractiveness. Unfortunately, numerous users are not aware of the fact that many malicious Facebook applications exist. To educate users, to raise users' awareness and to improve Facebook users' security and privacy, we developed a Firefox add-on that alerts users to the number of installed applications on their Facebook profiles. In this study, we present the temporal analysis of the Facebook applications' installation and removal dataset collected by our add-on. This dataset consists of information from 2,945 users, collected during a period of over a year. We used linear regression to analyze our dataset and discovered the linear connection between the average percentage change of newly installed Facebook applications and the number of days passed since the user initially installed our add-on. Additionally, we found out that users who used our Firefox add-on became more aware of their security and privacy, installing on average fewer new applications. Finally, we discovered that on average 86.4% of Facebook users install an additional application every 4.2 days.
In recent years, online social networks have grown exponentially, as these networks are fantastic places to meet and network with people who share similar personal interests. Facebook, currently the largest social network, has more than 901 million active users. The amount of personal information each user exposes on social networks such as Facebook is staggering. Recent research in the area of social networking found that many Facebook users exposed personal information. Due to the many security concerns regarding online personal exposure, we developed the Social Privacy Protector, software which aims to improve the security and privacy of Facebook users. The software contains three protection layers which improve user privacy by implementing different methods. The software first identifies a user's friends who might pose a threat, and then restricts the "friend's" exposure to the user's personal information. The second layer is an expansion of Facebook's basic privacy settings based on different types of social network usage profiles. The third layer alerts the user about the number of installed applications on their Facebook profile which have access to their private information. An initial version of the Social Privacy Protection software was evaluated on 74 Facebook users and successfully assisted them in restricting the access of 392 friends.
The Third IEEE International Conference on Social Computing (SocialCom2011), MIT, Boston, USA
Online social networking sites have become increasingly popular over the last few years. As a result, new interdisciplinary research directions have emerged in which social network analysis methods are applied to networks containing hundreds of millions of users. Unfortunately, links between individuals may be missing either due to imperfect acquisition processes or because they are not yet reflected in the online network (i.e., friends in the real world have not formed a virtual connection). Existing link prediction techniques lack the scalability required for full application on a continuously growing social network. The primary bottleneck in link prediction techniques is extracting structural features required for classifying links. In this paper we propose a set of simple, easy-to-compute structural features that can be analyzed to identify missing links. We show that by using simple structural features, a machine learning classifier can successfully identify missing links, even when applied to a hard problem of classifying links between individuals with at least one common friend. A new friends measure that we developed is shown to be a good predictor for missing links. An evaluation experiment was performed on five large Social Networks datasets: Facebook, Flickr, YouTube, Academia and TheMarker. Our methods can provide social network site operators with the capability of helping users to find known, offline contacts and to discover new friends online. They may also be used for exposing hidden links in an online social network.
Social Computing, Behavioral Modeling and Prediction (SBP), 2012
As truly ubiquitous wearable computers, mobile phones are quickly becoming the primary source for social, behavioral and environmental sensing and data collection. Today’s smartphones are equipped with increasingly more sensors and accessible data types that enable the collection of literally dozens of signals related to the phone, its user, and its environment. A great deal of research effort in academia and industry is put into mining this raw data for higher level sense-making, such as understanding user context, inferring social networks, learning individual features, and so on. In many cases, this analysis work is the result of exploratory forays and trial-and-error. In this work we investigate the properties of learning and inferences of real world data collected via mobile phones for different sizes of analyzed networks. In particular, we examine how the ability to predict individual features and social links is incrementally enhanced with the accumulation of additional data. To accomplish this, we use the Friends and Family dataset, which contains rich data signals gathered from the smartphones of 130 adult members of a young-family residential community over the course of a year and consequently has become one of the most comprehensive mobile phone datasets gathered in academia to date. Our results show that features such as ethnicity, age and marital status can be detected by analyzing social and behavioral signals. We then investigate how the prediction accuracy increases when the user sample set grows. Finally, we propose a method for advanced prediction of the maximal learning accuracy possible for the learning task at hand, based on an initial set of measurements. These predictions have practical implications, such as influencing the design of mobile data collection campaigns or evaluating analysis strategies.
ASE International Conference on Cyber Security, Washington D.C., Dec 2012
In recent years we have witnessed a significant growth in the usage of online social networks. Common networks like Facebook, Twitter, Pinterest, and LinkedIn have become popular all over the world. In these networks users write, share, and publish personal information about themselves, their friends, and their workplace. In this study we present a method for mining information about an organization through the use of social networks and socialbots. Our socialbots send friend requests to Facebook users who work in a targeted organization. By accepting friend requests from socialbots, users expose information about themselves and about their workplace. We tested the proposed method on two real organizations and successfully infiltrated both. Compared to our previous study, our method was able to discover up to 13.55% more employees and up to 18.29% more informal organizational links. Our results demonstrate once again that organizations which are interested in protecting themselves should instruct their employees not to disclose information in social networks and to be cautious of accepting friendship requests from unknown persons.
the International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction (SBP 2013)
In this paper we discuss the analysis of mobile network communication patterns in the presence of some anomalous “real world event”. We argue that given limited analysis resources (namely, a limited number of network edges we can analyze), it is best to select edges that are located around ‘hubs’ in the network. We demonstrate this method using a dataset containing the call log data from a major mobile carrier in a European nation.
the 2013 International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction (SBP 2013)
Mobile phones are quickly becoming the primary source for social, behavioral, and environmental sensing and data collection. Today's smartphones are equipped with increasingly more sensors and accessible data types that enable the collection of literally dozens of signals related to the phone, its user, and its environment. A great deal of research effort in academia and industry is put into mining this raw data for higher level sense-making, such as understanding user context, inferring social networks, learning individual features, and behavior prediction. In this work we investigate the properties of learning and inferences of real world data collected via mobile phones. In particular, we look at the dynamic learning process over time with various sizes of sampling groups and examine the interplay between these two parameters. We validate our model using extensive simulations carried out using the "Friends and Family" dataset, which contains rich data signals gathered from the smartphones of 140 adult members of a young-family residential community for over a year and is one of the most comprehensive mobile phone datasets gathered in academia to date.
Formal Power Series and Algebraic Combinatorics, …, Jan 1, 2006
A classical result of MacMahon shows that the length function and the major index are equidistributed over the symmetric groups. Through the years this result was generalized in various ways to signed permutation groups. In this paper we present several new generalizations; in particular, we study the effect of different linear orders on the letters [−n, n] and generalize a classical result of Foata and Zeilberger.
Sixth Symposium on Human-Computer Interaction and Information Retrieval (HCIR) 2012, Boston, USA.
Our system illustrates how information retrieved from social networks can be used for suggesting experts for specific tasks. The system is designed to facilitate the task of finding the appropriate person(s) for a job, as a conference committee member, an advisor, etc. This short description will demonstrate how the system works in the context of the HCIR2012 published tasks.
Third Workshop on Social Network Analysis in Applications (SNAA 2013)
One dimension of the Internet that has gained great popularity in recent years is online social networks (OSNs). Users all over the globe write, share, and publish personal information about themselves, their friends, and their workplace. In this study we present a method for infiltrating specific users in targeted organizations by using organizational social network topologies and Socialbots. The targeted organizations, which we chose, were technology-oriented organizations. Employees of this kind of organization should be more aware of the dangers of exposing private information. An infiltration is defined as a user accepting a Socialbot's friend request. Upon accepting a Socialbot's friend request, users unknowingly expose information about themselves and their workplace. To achieve this, we had to use our Socialbots in a sophisticated manner. First, we gathered information and identified Facebook users who work in the targeted organizations. Afterwards, we randomly chose ten Facebook users from every targeted organization. These ten users were the specific users we sought to infiltrate. The Socialbots sent friend requests to all of the specific users' mutual friends who worked or work in the same targeted organization. The rationale behind this idea was to gain as many mutual friends as possible and thereby increase the probability that our friend requests would be accepted by the targeted users. We tested the proposed method on targeted users from two different organizations. Our method achieved success rates of 50% and 70%, respectively. The results demonstrate how easily adversaries can infiltrate users they do not know and get full access to personal and valuable information.
These results are more surprising when we emphasize the fact that, for this study, we purposely chose users from technology-oriented organizations, who should be more aware of the dangers of information leakage. Moreover, the results indicate once again that users who are interested in protecting themselves should not disclose information in OSNs and should be cautious of accepting friendship requests from unknown persons.
First International Workshop on Wide Spectrum Social Signal Processing (WS³P 2012), Sep 1, 2012
As truly ubiquitous wearable computers, mobile phones are quickly becoming the primary source for social, behavioral, and environmental sensing and data collection. Today’s smartphones are equipped with increasingly more sensors and accessible data types that enable the collection of literally dozens of signals related to the phone, its user, and its environment. A great deal of research effort in academia and industry is put into mining this raw data for higher level sense-making, such as understanding user context, inferring social networks, learning individual features, predicting outcomes, and so on. In many cases, this analysis work is the result of exploratory forays and trial-and-error. Adding to the challenge, the devices themselves are a limited platform, and any data collection campaign must be carefully designed in order to collect the right signals, at the appropriate frequency, while not exhausting the device’s limited battery and processing power. There is a need for a more structured methodology and tools to help with designing mobile data collection and analysis initiatives.
In this work we investigate the properties of learning and inference of real world data collected via mobile phones over time. In particular, we look at the dynamic learning process over time, and how the ability to predict individual parameters and social links is incrementally enhanced with the accumulation of additional data. To do this, we use the Friends and Family dataset, which contains rich data signals gathered from the smartphones of 140 adult members of a young-family residential community for over a year, and is one of the most comprehensive mobile phone datasets gathered in academia to date.
We develop several models that predict social and individual properties from sensed mobile phone data, including detection of life-partners, ethnicity, and whether a person is a student or not. Then, for this set of diverse learning tasks, we investigate how the prediction accuracy evolves over time, as new data is collected. Finally, based on gained insights, we propose a method for advance prediction of the maximal learning accuracy possible for the learning task at hand, based on an initial set of measurements. This has practical implications, like informing the design of mobile data collection campaigns, or evaluating analysis strategies.
Security and Privacy in Social Networks, 2012, Springer
The explosion in the use of social networks has led to the creation of new kinds of security and privacy threats. Many users are unaware of the risks involved with exposing personal information, a fact that makes social networks a “bonanza” for spammers and identity thieves. In addition, it has already been proven that even the concealment of all personal data might not be enough to provide protection, as one’s personal information can be inferred by analyzing one’s connections to other users. In this paper we present the “link reconstruction attack”, a method capable of inferring one’s connections to others with high accuracy. We show that the concealment of one’s links is ineffective if not done by others in the network and present an analysis of the performance of various machine learning algorithms for link prediction inside communities.
Handbook of Computational Approaches to Counterterrorism, Springer
Extremist organizations all over the world increasingly use online social networks as a communication medium for recruitment and planning. As such, online social networks are also a source of information utilized by intelligence and counter-terror organizations investigating the relationships between suspected individuals. Unfortunately, the data mined from open sources is usually far from being complete due to the efforts of suspected and known terrorists to hide their relationships. One of the methods used to uncover missing information in social networks is referred to as link prediction. We use link prediction methods solely based on network structure analysis to infer hidden relationships among individuals and investigate their effectiveness in fractional datasets. Experiments performed on a number of closed communities extracted from organizational and public social networks show that structural link prediction retains its effectiveness even when large parts of the original social network are hidden.
Arxiv preprint math/0409421, Jan 1, 2004
Driven by the popularity of television shows such as Who Do You Think You Are?, many millions of users have uploaded their family tree to web projects such as WikiTree. Analysis of this corpus enables us to investigate genealogy computationally. The study of heritage in the social sciences has led to an increased understanding of ancestry and descent, but such efforts are hampered by difficult-to-access data. Genealogical research is typically a tedious process involving trawling through sources such as birth and death certificates, wills, letters and land deeds. Decades of research have developed and examined hypotheses on population sex ratios, marriage trends, fertility, lifespan, and the frequency of twins and triplets. These can now be tested on vast datasets containing many billions of entries using machine learning tools. Here we survey the use of genealogy data mining using family trees dating back centuries and featuring profiles on nearly 7 million individuals based in over 160 countries. These data are not typically created by trained genealogists and so we verify them with reference to third party censuses. We present results on a range of aspects of population dynamics. Our approach extends the boundaries of genealogy inquiry to precise measurement of underlying human phenomena.
(Draft Version)
Trends change rapidly in today's world, prompting this key question: What is the mechanism behind the emergence of new trends? By representing real-world dynamic systems as complex networks, the emergence of new trends can be symbolized by vertices that "shine." That is, at a specific time interval in a network's life, certain vertices become increasingly connected to other vertices. This process creates new high-degree vertices, i.e., network stars. Thus, to study trends, we must look at how networks evolve over time and determine how the stars behave. In our research, we constructed the largest publicly available network evolution dataset to date, which contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations resulted: (a) links are most prevalent among vertices that join a network at a similar time; (b) the rate that new vertices join a network is a central factor in molding a network's topology; and (c) the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied our learnings to develop a flexible network-generation model based on large-scale, real-world data. This model gives a better understanding of how stars rise and fall within networks, and is applicable to dynamic systems both in nature and society.
Complex networks have non-trivial characteristics and appear in many real-world systems. Due to their vital importance in a large number of research fields, various studies have offered explanations on how complex networks evolve, but the full underlying dynamics of complex networks are not completely understood. Many of the barriers to better understanding the evolution process of these networks can be removed with the emergence of new data sources. This study utilizes the recently published Reddit dataset, containing over 1.65 billion comments, to construct the largest publicly available social network corpus, which contains detailed information on the evolution process of 11,965 social networks. We used this dataset to study the effect of the patterns in which new users join a network (referred to as user arrival curves, or UACs) on the network topology. Our results present evidence that UACs are a central factor in molding a network's topology; that is, different arrival patterns create different topological properties. Additionally, we show that it is possible to uncover the types of user arrival patterns by analyzing a social network's topology. These results imply that existing complex network evolution models need to be revisited and modified to include user arrival patterns as input to the models, in order to create models that more accurately reflect real-world complex networks.
Neural Processing Letters
Nowadays, detecting anomalous communities in networks is an essential task in research, as it helps discover insights into community-structured networks. Most of the existing methods leverage either information regarding attributes of vertices or the topological structure of communities. In this study, we introduce the Co-Membership-based Generic Anomalous Communities Detection Algorithm (referred to as CMMAC), a novel and generic method that utilizes the information of vertices' co-membership in multiple communities. CMMAC is domain-free and almost unaffected by communities' sizes and densities. Specifically, we train a classifier to predict the probability of each vertex in a community being a member of the community. We then rank the communities by the aggregated membership probabilities of each community's vertices. The lowest-ranked communities are considered to be anomalous. Furthermore, we present an algorithm for generating a community-structured random network enabling the infusion of anomalous communities to facilitate research in the field. We utilized it to generate two datasets, composed of thousands of labeled anomaly-infused networks, and published them. We experimented extensively on thousands of simulated and real-world networks infused with artificial anomalies. CMMAC outperformed other existing methods in a range of settings. Additionally, we demonstrated that CMMAC can identify abnormal communities in real-world unlabeled networks in different domains, such as Reddit and Wikipedia.
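The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes the per-vertex membership probabilities have already been produced by some trained classifier (not shown), and it aggregates them with a plain mean, which is an assumption made here for illustration; the abstract does not say which aggregation CMMAC uses. The function name `rank_communities` and the example data are hypothetical.

```python
from statistics import mean


def rank_communities(membership_probs):
    """Return community ids ordered from most to least anomalous.

    membership_probs maps a community id to the predicted membership
    probabilities of its vertices (assumed output of a trained classifier).
    """
    # Aggregate each community's probabilities; the mean is an assumption.
    scores = {cid: mean(probs) for cid, probs in membership_probs.items()}
    # Lowest aggregated probability first: these are the anomaly candidates.
    return sorted(scores, key=scores.get)


# Hypothetical example data: three communities with made-up probabilities.
example = {
    "c1": [0.9, 0.8, 0.95],
    "c2": [0.2, 0.4, 0.3],
    "c3": [0.6, 0.7, 0.5],
}
ranking = rank_communities(example)
print(ranking)
```

In this toy input, the community whose vertices have the lowest average membership probability is listed first, matching the abstract's statement that the lowest-ranked communities are treated as anomalous.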
Cornell University - arXiv, May 24, 2020
Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users submit queries containing a name to Web search engines to find what they are looking for. Typically, Web search engines provide just a few accurate results associated with a name-containing query. Most existing solutions for suggesting synonyms in online search are based on pattern matching and phonetic encoding; however, very often the performance of such solutions is less than optimal. In this paper, we propose SpokenName2Vec, a novel and generic algorithm which addresses the similar name suggestion problem by utilizing automated speech generation and deep learning to produce spoken name embeddings. These sophisticated and innovative embeddings capture the way people pronounce names in any language and accent. Utilizing a name's pronunciation can be helpful for both differentiating and detecting names that sound alike but are written differently. The proposed approach was demonstrated on a large-scale dataset consisting of 250,000 forenames and evaluated using a machine learning classifier and 7,399 names with their verified synonyms. The performance of the proposed approach was found to be superior to 10 other algorithms evaluated in this study, including well used phonetic encoding and string similarity algorithms, and two recently proposed algorithms (Name2Vec and GRAFT). The results obtained suggest that the proposed algorithm could serve as a useful and valuable tool for solving the problem of synonym suggestion.
Cornell University - arXiv, Sep 16, 2013
Facebook applications are one of the reasons for Facebook's attractiveness. Unfortunately, numerous users are not aware of the fact that many malicious Facebook applications exist. To educate users, to raise users' awareness, and to improve Facebook users' security and privacy, we developed a Firefox add-on that alerts users to the number of installed applications on their Facebook profiles. In this study, we present the temporal analysis of the Facebook applications installation and removal dataset collected by our add-on. This dataset consists of information from 2,945 users, collected during a period of over a year. We used linear regression to analyze our dataset and discovered a linear connection between the average percentage change of newly installed Facebook applications and the number of days passed since the user initially installed our add-on. Additionally, we found that users who used our Firefox add-on became more aware of their security and privacy, installing on average fewer new applications. Finally, we discovered that on average 86.4% of Facebook users install an additional application every 4.2 days.
Cornell University - arXiv, Jul 2, 2020
The COVID-19 pandemic outbreak, with its related social distancing and shelter-in-place measures, has dramatically affected ways in which people communicate with each other, forcing people to find new ways to collaborate, study, celebrate special occasions, and meet with family and friends. One of the most popular solutions that have emerged is the use of video conferencing applications to replace face-to-face meetings with virtual meetings. This resulted in unprecedented growth in the number of video conferencing users. In this study, we explored privacy risks that may arise from attending virtual meetings. We extracted private information from collage images of meeting participants that are publicly posted on the Web. We used image processing, text recognition tools, as well as social network analysis to explore our dataset, curated by web crawling, of over 15,700 collage images and over 142,000 face images of meeting participants. We demonstrate that video conference users are facing prevalent security and privacy threats. Our results indicate that it is relatively easy to collect thousands of publicly available images of video conference meetings and extract personal information about the participants, including their face images, age, gender, usernames, and sometimes even full names. This type of extracted data can vastly and easily jeopardize people's security and privacy both online and in the real world, affecting not only adults but also more vulnerable segments of society, such as young children and older adults.
Finally, we show that cross-referencing facial image data with social network data may put participants at additional privacy risks they may not be aware of and that it is possible to identify users that appear in several video conference meetings, thus providing a potential to maliciously aggregate different sources of information about a target individual.
2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2017
Online advertising is a huge, rapidly growing advertising market in today's world. One common form of online advertising is using image ads. A decision is made (often in real time) every time a user sees an ad, and the advertiser is eager to determine the best ad to display. Consequently, many algorithms have been developed that calculate the optimal ad to show to the current user at the present time. Typically, these algorithms focus on variations of the ad, optimizing among different properties such as background color, image size, or set of images. However, there is a more fundamental layer. Our study looks at new qualities of ads that can be determined before an ad is shown (rather than online optimization) and defines which ads are most likely to be successful. We present a set of novel algorithms that utilize deep-learning image processing, machine learning, and graph theory to investigate online advertising and to construct prediction models which can foresee an image ad's success. We evaluated our algorithms on a dataset with over 260,000 ad images, as well as a smaller dataset specifically related to the automotive industry, and we succeeded in constructing regression models for ad image click rate prediction. The obtained results emphasize the great potential of using deep-learning algorithms to effectively and efficiently analyze image ads and to create better and more innovative online ads. Moreover, the algorithms presented in this paper can help predict ad success and can be applied to analyze other large-scale image corpora.
ArXiv, 2016
In the past decade, complex network structures have penetrated nearly every aspect of our lives. The detection of anomalous vertices in these networks can uncover important insights, such as exposing intruders in a computer network. In this study, we present a novel unsupervised two-layered meta-classifier that can be employed to detect irregular vertices in complex networks using solely features extracted from the network topology. Our method is based on the hypothesis that a vertex having many links with low probabilities of existing has a higher likelihood of being anomalous. We evaluated our method on ten networks, using three fully simulated, five semi-simulated, and two real world datasets. In all the scenarios, our method was able to identify anomalous and irregular vertices with low false positive rates and high AUCs. Moreover, we demonstrated that our method can be applied to security-related use cases and is able to detect malicious profiles in online social networks. Proje...
ISPRS Journal of Photogrammetry and Remote Sensing, 2021
The spread of the Red Palm Weevil has dramatically affected date growers, homeowners and governments, forcing them to deal with a constant threat to their palm trees. Early detection of palm tree infestation has been proven to be critical in order to allow treatment that may save trees from irreversible damage, and is most commonly performed by local physical access for individual tree monitoring. Here, we present a novel method for surveillance of Red Palm Weevil infested palm trees utilizing state-of-the-art deep learning algorithms, with aerial and street-level imagery data. To detect infested palm trees we analyzed over 100,000 aerial and street-level images, mapping the location of palm trees in urban areas. Using this procedure, we discovered and verified infested palm trees at various locations.
Palgrave Communications, 2020
Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women's roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular—albeit flawed—measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and n...
COVID-19 is the most rapidly expanding coronavirus outbreak in the past two decades. To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential. Here, we study the volume of research conducted on previous coronavirus outbreaks, specifically SARS and MERS, relative to other infectious diseases by analyzing over 35 million papers from the last 20 years. Our results demonstrate that previous coronavirus outbreaks have been understudied compared to other viruses. We also show that the research volume of emerging infectious diseases is very high after an outbreak and drops drastically upon the containment of the disease. This can yield inadequate research and limited investment in gaining a full understanding of novel coronavirus management and prevention. Independent of the outcome of the current COVID-19 outbreak, we believe that measures should be taken to encourage sustained research in the field.
Social Network Analysis and Mining, 2018
In the past decade, network structures have penetrated nearly every aspect of our lives. The detection of anomalous vertices in these networks has become increasingly important, such as in exposing computer network intruders or identifying fake online reviews. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by using features extracted from the network topology. Following the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous, we employed our method on 10 networks of various scales, from a network of several dozen students to online social networks with millions of users. In every scenario, we were able to identify anomalous vertices with lower false positive rates and higher AUCs compared to other prevalent methods. Moreover, we demonstrated that the presented algorithm is efficient both in revealing fake users and in disclosing the most influential people in social networks.
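The reasoning that a vertex with many improbable links is more likely to be anomalous can be illustrated with a small sketch. It is not the two-layered meta-classifier from the paper: it assumes the link probabilities were already estimated by some link prediction model (not shown), and it scores each vertex as one minus the mean probability of its links, a scoring rule chosen here only for illustration. The function name `vertex_anomaly_scores` and the toy graph are hypothetical.

```python
from collections import defaultdict


def vertex_anomaly_scores(edge_probs):
    """Score each vertex by how improbable its links are on average.

    edge_probs maps an undirected edge (u, v) to the estimated probability
    that the edge exists (assumed output of a link prediction model).
    """
    per_vertex = defaultdict(list)
    for (u, v), p in edge_probs.items():
        # Each edge contributes its probability to both endpoints.
        per_vertex[u].append(p)
        per_vertex[v].append(p)
    # Higher score = links are less probable on average (assumed scoring rule).
    return {v: 1 - sum(ps) / len(ps) for v, ps in per_vertex.items()}


# Hypothetical toy graph: vertex "x" attaches through two improbable links.
edges = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.8,
    ("b", "c"): 0.9,
    ("x", "a"): 0.1,
    ("x", "b"): 0.2,
}
scores = vertex_anomaly_scores(edges)
most_anomalous = max(scores, key=scores.get)
print(most_anomalous)
```

Averaging rather than counting low-probability links keeps the score comparable between low-degree and high-degree vertices; a count-based or thresholded variant would be an equally plausible reading of the abstract.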
Networks and Spatial Economics, 2015
Mature social networking services are one of the greatest assets of today's organizations. This valuable asset, however, can also be a threat to an organization's confidentiality. Members of social networking websites expose not only their personal information, but also details about the organizations for which they work. In this paper we analyze several commercial organizations by mining data which their employees have exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among employees of a given target organization. Our results, obtained using centrality analysis and machine learning techniques applied to the structure of the informal relationships network, show that it is possible to identify leadership roles within the organization solely by this means. It is also possible to gain valuable non-trivial insights into an organization's structure by clustering its social network and gathering publicly available information on the employees within each cluster. Organizations wanting to conceal their internal structure, the identity of their leaders, the location and specialization of branch offices, etc., must enforce strict policies to control the use of social media by their employees.
Neurocomputing, 2016
Online Social Networks (OSNs), such as Facebook and Twitter, have become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus in that each offers particular services and functionalities. Recent studies show that many OSN users create several accounts on multiple OSNs using the same or different personal information. Collecting all the available data of an individual from several OSNs and fusing it into a single profile can be useful for many purposes. In this paper, we introduce novel machine learning based methods for solving Entity Resolution (ER), the problem of matching user profiles across multiple OSNs. The presented methods match two user profiles from two different OSNs using supervised learning techniques applied to features extracted from each of the user profiles. With these features, we developed classifiers that can perform entity matching between two profiles in the following scenarios: (a) matching entities across two OSNs; (b) searching for a user by similar name; and (c) de-anonymizing a user's identity. The constructed classifiers were tested using data collected from two popular OSNs, Facebook and Xing. We then evaluated the classifiers' performance using various evaluation measures, such as true and false positive rates, accuracy, and the area under the receiver operating characteristic curve (AUC). The classifiers achieved an AUC of up to 0.982 and an accuracy of up to 95.9% in identifying user profiles across two OSNs.
Lecture Notes in Computer Science, 2012
As truly ubiquitous wearable computers, mobile phones are quickly becoming the primary source for social, behavioral and environmental sensing and data collection. Today's smartphones are equipped with an increasing number of sensors and accessible data types that enable the collection of literally dozens of signals related to the phone, its user, and its environment. A great deal of research effort in academia and industry is put into mining this raw data for higher level sense-making, such as understanding user context, inferring social networks, learning individual features, and so on. In many cases, this analysis work is the result of exploratory forays and trial-and-error. In this work we investigate the properties of learning and inferences of real world data collected via mobile phones for different sizes of analyzed networks. In particular, we examine how the ability to predict individual features and social links is incrementally enhanced with the accumulation of additional data. To accomplish this, we use the Friends and Family dataset, which contains rich data signals gathered from the smartphones of 130 adult members of a young-family residential community over the course of a year and consequently has become one of the most comprehensive mobile phone datasets gathered in academia to date. Our results show that features such as ethnicity, age and marital status can be detected by analyzing social and behavioral signals. We then investigate how the prediction accuracy increases as the user sample set grows. Finally, we propose a method for advance prediction of the maximal learning accuracy possible for the learning task at hand, based on an initial set of measurements. These predictions have practical implications, such as influencing the design of mobile data collection campaigns or evaluating analysis strategies.
ACM Transactions on Intelligent Systems and Technology, 2015
Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can help identify various patterns in the human population. In this study, we present methods and algorithms that can assist in identifying variations in lifespan distributions of the human population in the past centuries, in detecting social and genetic features that correlate with the human lifespan, and in constructing predictive models of human lifespan based on various features that can easily be extracted from genealogy datasets. We have evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 9 million connections, all of which were collected from the WikiTree website. Our findings indicate that significant but small positive correlations exist between the parents’ lifespan and their children’s lifespan. Additionally, we found slightly higher and significant cor...
Handbook of Computational Approaches to Counterterrorism, 2012
In recent years, online social networks have grown in scale and variability and offer individuals with similar interests the possibility of exchanging ideas and networking. On the one hand, social networks create new opportunities to develop friendships, share ideas, and conduct business. On the other hand, they are also an effective media tool for plotting crime and organizing extremist groups around the world. Online social networks, such as Facebook, Google+, and Twitter, are hard to track due to their massive scale and increased awareness of privacy. Criminals and terrorists strive to hide their relationships, especially those that can associate them with an executed terror act.
Lecture Notes in Computer Science, 2013
Mobile phones are quickly becoming the primary source for social, behavioral, and environmental sensing and data collection. Today's smartphones are equipped with increasingly more sensors and accessible data types that enable the collection of literally dozens of signals related to the phone, its user, and its environment. A great deal of research effort in academia and industry is put into mining this raw data for higher level sense-making, such as understanding user context, inferring social networks, learning individual features, and behavior prediction. In this work we investigate the properties of learning and inferences of real world data collected via mobile phones. In particular, we look at the dynamic learning process over time with various sizes of sampling groups and examine the interplay between these two parameters. We validate our model using extensive simulations carried out using the "Friends and Family" dataset which contains rich data signals gathered from the smartphones of 140 adult members of a young-family residential community for over a year and is one of the most comprehensive mobile phone datasets gathered in academia to date.
Social Network Analysis and Mining, 2014
The amount of personal information unwillingly exposed by users on online social networks is staggering, as shown in recent research. Moreover, recent reports indicate that these networks are infested with tens of millions of fake user profiles, which may jeopardize the users' security and privacy. To identify fake users in such networks and to improve users' security and privacy, we developed the Social Privacy Protector software for Facebook. This software contains three protection layers, which improve user privacy by implementing different methods. The software first identifies a user's friends who might pose a threat and then restricts this "friend's" exposure to the user's personal information. The second layer is an expansion of Facebook's basic privacy settings based on different types of social network usage profiles. The third layer alerts users about the number of applications installed on their Facebook profile that have access to their private information. An initial version of the Social Privacy Protector software received high media coverage, and more than 3,000 users from more than twenty countries have installed the software, out of which 527 used the software to restrict more than nine thousand friends. In addition, we estimate that more than a hundred users accepted the software's recommendations and removed at least 1,792 Facebook applications from their profiles. By analyzing the unique dataset obtained by the software in combination with machine learning techniques, we developed classifiers that can predict which Facebook profiles have a high probability of being fake and therefore threaten the user's well-being. Moreover, in this study, we present statistics on users' privacy settings and on the number of applications installed on Facebook profiles. Both statistics are obtained by the Social Privacy Protector software. These statistics alarmingly demonstrate how exposed Facebook users' information is to both fake profile attacks and third-party Facebook applications.
2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, 2012
Traffic measurements, road safety studies, and surveys are required for efficient road planning and ensuring the safety of transportation. Unfortunately, these methods can be cumbersome and very expensive. In this paper we point out a source of transportation information that is based on collaborative community-based navigation applications, such as Waze. Partial and anonymized information publicly exposed by Waze through their application provides valuable information that can significantly ease the future of transportation studies. Moreover, we show that Waze user reports may expose locations plagued with accidents but lacking police coverage. This knowledge may help police departments to improve road safety by relocating police units to these locations. Lastly, the data discussed in this paper connects transportation and road safety research to location-based services and social network platforms.
2012 International Conference on Social Informatics, 2012
In recent years we have seen significant growth in the usage of online social networks. Common networks like Facebook, Twitter, Pinterest, and LinkedIn have become popular all over the world. In these networks users write, share, and publish personal information about themselves, their friends, and their workplace. In this study we present a method for mining information about an organization through the use of social networks and socialbots. Our socialbots sent friend requests to Facebook users who work in a targeted organization. Upon accepting a socialbot's friend request, users unknowingly expose information about themselves and about their workplace. We tested the proposed method on two real organizations and successfully infiltrated both. Compared to our previous study, our method was able to discover up to 13.55% more employees and up to 18.29% more informal organizational links. Our results demonstrate once again that organizations which are interested in protecting themselves should instruct their employees not to disclose information in social networks and to be cautious of accepting friendship requests from unknown persons.
Driven by the popularity of television shows such as Who Do You Think You Are?, many millions of users have uploaded their family tree to web projects such as WikiTree [1]. Analysis of this corpus enables us to investigate genealogy computationally. The study of heritage in the social sciences has led to an increased understanding of ancestry and descent [2], but such efforts are hampered by data that are difficult to access [3]. Genealogical research is typically a tedious process involving trawling through sources such as birth and death certificates, wills, letters and land deeds [4]. Decades of research have developed and examined hypotheses on population sex ratios, marriage trends, fertility, lifespan, and the frequency of twins and triplets. These can now be tested on vast datasets containing many billions of entries using machine learning tools. Here we survey the use of genealogy data mining using family trees dating back centuries and featuring profiles on nearly 7 million individuals based in over 160 countries. These data are not typically created by trained genealogists and so we verify them with reference to third-party censuses. We present results on a range of aspects of population dynamics. Our approach extends the boundaries of genealogy inquiry to precise measurement of underlying human phenomena.