Daniel Thilo Schroeder | SINTEF
Papers by Daniel Thilo Schroeder
Journal article, 2025
The study combines domain expertise and computational community detection to uncover what role citizen journalists and social media platforms play in mediating the dynamics of conflict in Mali. Amid the growing conflict in Mali, citizen journalists are opening Twitter (rebranded as X) accounts to stay updated and to tweet about the ongoing socio-political tensions, chronicling life in a conflict-ravaged context. This article conceptualizes the rapid reliance on Twitter among citizen journalists, consisting of bloggers, activists, government officials and NGOs, as a form of networked conflict and networked journalism. Networked journalism emerges as professional journalists adopt tools and techniques used by nonprofessionals (and vice versa) to gather and disseminate information, while networked conflict involves the consequential and intricate relationship between social media and conflict in the Sahel region of Africa. Our findings show that Twitter is a source of action that promotes and mediates conflict and exposes users to conflict-related content. The findings also show that what counts as citizen journalism in a conflict setting is vague, as those with access to Twitter, and thus the presumed ability to influence the narrative, unequivocally consider themselves citizen journalists.
arXiv (Cornell University), Oct 8, 2023
Shortly after the first COVID-19 cases became apparent in December 2019, rumors spread on social media suggesting a connection between the virus and the 5G radiation emanating from the recently deployed telecommunications network. In the course of the following weeks, this idea gained increasing popularity, and various alleged explanations for how such a connection manifests emerged. Ultimately, after being amplified by prominent conspiracy theorists, a series of arson attacks on telecommunication equipment followed, culminating in the kidnapping of telecommunication technicians in Peru. In this paper, we study the spread of content related to a conspiracy theory with harmful consequences, a so-called digital wildfire. In particular, we investigate the 5G and COVID-19 misinformation event on Twitter before, during, and after its peak in April and May 2020. For this purpose, we examine the community dynamics in complex temporal interaction networks underlying Twitter user activity. We assess the evolution of such digital wildfires by appropriately defining the temporal dynamics of communication in communities within social networks. We show that, for this specific misinformation event, the number of interactions of the users participating in a digital wildfire, as well as the size of the engaged communities, both follow a power-law distribution. Moreover, our research elucidates the possibility of quantifying the phases of a digital wildfire, as described in the established literature. We identify one such phase as a critical transition, marked by a shift from sporadic tweets to a global spread event, highlighting the dramatic scaling of misinformation propagation. Additionally, we argue that the driving forces behind this observed transition can be attributed to influential users, who act as catalysts, accelerating the spread of misinformation. Lastly, our data suggest that the characteristics of such events may be predictable, at least in some instances. From this data, we hypothesize that monitoring minor peaks in user interactions, which precede the critical phase culminating in real-world consequences, could serve as an early warning system, aiding in the prediction and potentially the mitigation of digital wildfires.
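As an aside for readers who want to probe the power-law claim themselves, the following is a minimal sketch (not the authors' code) of how per-user interaction counts could be checked for heavy-tailed, power-law-like behaviour via a log-log fit of the empirical complementary CDF; the variable `interaction_counts` is a hypothetical input.

```python
# Minimal sketch (not the paper's code): checking whether per-user interaction
# counts are consistent with a power law via a log-log fit of the empirical
# complementary CDF. `interaction_counts` is a hypothetical input.
import numpy as np

def loglog_ccdf_slope(interaction_counts):
    """Return (slope, intercept) of a least-squares line fitted to the
    empirical CCDF in log-log space; a roughly straight line with negative
    slope is consistent with (but does not prove) power-law behaviour."""
    counts = np.sort(np.asarray(interaction_counts, dtype=float))
    counts = counts[counts > 0]                   # log scale needs positive values
    n = counts.size
    ccdf = 1.0 - np.arange(n) / n                 # P(X >= x) for each sorted x
    slope, intercept = np.polyfit(np.log(counts), np.log(ccdf), 1)
    return slope, intercept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic = rng.pareto(1.5, size=10_000) + 1  # heavy-tailed toy data
    print(loglog_ccdf_slope(synthetic))
```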
arXiv (Cornell University), Apr 25, 2023
With the expansion of mobile communications infrastructure, social media usage in the Global South is surging. Compared to the Global North, populations of the Global South have had less prior experience with social media from stationary computers and wired Internet. Many countries are experiencing violent conflicts that have a profound effect on their societies. As a result, social networks develop under different conditions than elsewhere, and our goal is to provide data for studying this phenomenon. In this dataset paper, we present a data collection of a national Twittersphere in a West African country in conflict. While not the largest social network in terms of users, Twitter is an important platform where people engage in public discussion. The focus is on Mali, a country beset by conflict since 2012 that has recently had a relatively precarious media ecology. The dataset consists of tweets and Twitter users in Mali and was collected in June 2022, when the Malian conflict became more violent, both internally and towards external and international actors. In a preliminary analysis, we assume that the conflictual context influences how people access social media and, therefore, the shape of the Twittersphere and its characteristics. The aim of this paper is primarily to invite researchers from various disciplines, including complex networks and social science scholars, to explore the data at hand further. We collected the dataset by scraping the follower network and identifying characteristics of Malian Twitter users. The given snapshot of the Malian Twitter follower network contains around seven million accounts, of which 56,000 are clearly identifiable as Malian. In addition, we present the tweets. The dataset is available at: https://osf.io/mj2qt/?view_only=460f5daef1024f05a0d45e082d26059f (peer review version).
Journal of Computational Social Science, 2023
The COVID-19 pandemic has been accompanied by a surge of misinformation on social media which covered a wide range of topics and contained many competing narratives, including conspiracy theories. To study such conspiracy theories, we created a dataset of 3495 tweets with manual labeling of the stance of each tweet with respect to 12 different conspiracy topics. The dataset thus contains almost 42,000 labels, each of which was determined by majority vote among three expert annotators. The dataset was selected from COVID-19-related Twitter data spanning January 2020 to June 2021 using a list of 54 keywords. The dataset can be used to train machine-learning-based classifiers for both stance and topic detection, either individually or simultaneously. BERT was used successfully for the combined task. The dataset can also be used to further study the prevalence of different conspiracy narratives. To this end, we qualitatively analyze the tweets, discussing the structure of conspiracy narratives that are frequently found in the dataset. Furthermore, we illustrate the interconnection between the conspiracy categories as well as the keywords.
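Since the abstract notes that BERT was used successfully for the combined stance and topic task, a hedged sketch of how such a classifier is typically set up with Hugging Face Transformers may help readers; the model name, label count, and prediction helper below are assumptions, not the authors' configuration.

```python
# Illustrative sketch only: a BERT classifier for tweet stance labels.
# The model name, label scheme, and helper are assumptions, not the
# configuration used in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"   # assumed; the paper only says "BERT"
NUM_LABELS = 3                     # e.g., support / deny / unrelated (assumed)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS)

def predict_stance(texts):
    """Return predicted label indices for a list of tweet texts."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()

print(predict_stance(["5G towers have nothing to do with the virus."]))
```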
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2019
In recent years, online social networks have become an important source of news and the primary place for political debates for a growing part of the population. At the same time, the spread of fake news and digital wildfires (fast-spreading and harmful misinformation) has become a growing concern worldwide, and in online social networks the problem is most prevalent. Thus, the study of social networks is an essential component in the understanding of the fake news phenomenon. Of particular interest is the network connectivity between participants, since it makes communication patterns visible. These patterns are hidden in the offline world, but they have a profound impact on the spread of ideas, opinions and news. Among the major social networks, Twitter is of special interest. Because of its public nature, Twitter makes it possible to perform research without the risk of breaching the expectation of privacy. However, obtaining sufficient amounts of data from Twitter is a fundamental challenge for many researchers. Thus, in this paper, we present a scalable framework for gathering the graph structure of follower networks, posts and profiles. We also show how to use the collected data for high-performance social network analysis.
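To illustrate the kind of follower-network gathering such a framework automates, here is a minimal breadth-first crawl sketch; `fetch_followers` is a hypothetical callback standing in for the (rate-limited) platform API and is not part of the framework described in the paper.

```python
# Sketch of a breadth-first follower-graph crawl. `fetch_followers` is a
# hypothetical callback wrapping whatever (rate-limited) platform API the
# collector uses; it is not part of the framework described in the paper.
from collections import deque

def crawl_followers(seed_ids, fetch_followers, max_users=10_000):
    """Return a set of directed (follower -> followee) edges discovered by a
    breadth-first traversal starting from `seed_ids`."""
    visited, edges = set(seed_ids), set()
    queue = deque(seed_ids)
    while queue and len(visited) < max_users:
        user = queue.popleft()
        for follower in fetch_followers(user):    # one API call/page per user
            edges.add((follower, user))
            if follower not in visited:
                visited.add(follower)
                queue.append(follower)
    return edges
```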
Lecture Notes in Computer Science, 2021
The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on traditional caching hierarchies. Developed to meet the needs of increasingly data-centric applications, such as machine learning, IPUs combine a dedicated portion of SRAM with each of their numerous cores, resulting in high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising field of experimentation for graph algorithms, since it is the unpredictable, irregular memory accesses that lead to performance losses in traditional processors with pre-caching. This paper aims to test the IPU's suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark that is not only a subroutine for a variety of more complex graph algorithms, but also allows comparability across a wide range of architectures. We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to 4× over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about 1.5× on most test instances.
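For readers unfamiliar with the benchmark, the following is a plain CPU reference sketch of the level-synchronous BFS that Graph500 exercises, operating on a CSR graph; it is emphatically not the IPU/Poplar implementation evaluated in the paper.

```python
# Reference sketch of the level-synchronous BFS that Graph500 benchmarks,
# written for a CSR graph on the CPU. This is not the IPU/Poplar
# implementation evaluated in the paper.
import numpy as np

def bfs_csr(row_ptr, col_idx, source, num_vertices):
    """Return a parent array (-1 = unreached) for a BFS from `source` over a
    graph stored as CSR arrays `row_ptr` / `col_idx`."""
    parent = np.full(num_vertices, -1, dtype=np.int64)
    parent[source] = source
    frontier = [source]
    while frontier:                                   # one iteration per BFS level
        next_frontier = []
        for u in frontier:
            for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent
```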
2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), 2020
Online social networks such as Facebook and Twitter are part of the everyday life of millions of people. They are not only used for interaction but play an essential role when it comes to information acquisition and knowledge gain. The abundance and detail of the accumulated data in these online social networks open up new possibilities for social researchers and psychologists, allowing them to study behavior in a large test population. However, complex application programming interfaces (APIs) and data scraping restrictions are, in many cases, a limiting factor when accessing this data. Furthermore, research projects are typically granted restricted access based on quotas. Thus, research tools such as scrapers that access social network data through an API must manage these quotas. While this is generally feasible, it becomes a problem when more than one tool, or multiple instances of the same tool, is being used in the same research group. Since different tools typically cannot balance access to a shared quota on their own, additional software is needed to prevent the individual tools from overusing the shared quota. In this paper, we present a proxy server that manages several researchers' data contingents in a cooperative research environment and thus provides a transparent view of a subset of Twitter's API. Our proxy scales linearly with the number of clients in use and incurs almost no performance penalties or implementation overhead for further layers or applications that need to work with the Twitter API. Thus, it allows seamless integration of multiple API-accessing programs within the same research group.
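A common way to share a single rate-limit window among several clients is a token bucket refilled at the provider's quota rate; the sketch below illustrates that idea only, with made-up numbers, and is not the proxy described in the paper.

```python
# Minimal sketch of how a proxy might share one rate-limit window among many
# clients: a token bucket refilled at the provider's quota rate. The numbers
# and names are illustrative, not the proxy described in the paper.
import threading, time

class SharedQuota:
    def __init__(self, requests_per_window=900, window_seconds=900):
        self.capacity = requests_per_window
        self.tokens = float(requests_per_window)
        self.refill_rate = requests_per_window / window_seconds  # tokens per second
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until one request token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.refill_rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(0.1)                 # wait for the bucket to refill
```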
Scientific Reports
Online social networks are ubiquitous, have billions of users, and produce large amounts of data. While platforms like Reddit are based on a forum-like organization where users gather around topics, Facebook and Twitter implement a concept in which individuals represent the primary entity of interest. This makes them natural testbeds for exploring individual behavior in large social networks. Underlying these individual-based platforms is a network whose “friend” or “follower” edges are of binary nature only and therefore do not necessarily reflect the level of acquaintance between pairs of users. In this paper, we present the network of acquaintance “strengths” underlying the German Twittersphere. To that end, we make use of the full non-verbal information contained in tweet–retweet actions to uncover the graph of social acquaintances among users, beyond pure binary edges. The social connectivity between pairs of users is weighted by keeping track of the frequency of shared content ...
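The core idea of weighting edges by interaction frequency can be illustrated with a short sketch that aggregates tweet–retweet pairs into weighted directed edges; the record format is a hypothetical placeholder, not the schema used in the paper.

```python
# Sketch: turning tweet-retweet pairs into a weighted "acquaintance" graph by
# counting how often one user retweets another. The record format is a
# hypothetical placeholder, not the schema used in the paper.
from collections import Counter

def weighted_retweet_edges(retweets):
    """`retweets` yields (retweeting_user, original_author) pairs; returns a
    dict mapping each directed edge to the number of observed retweets."""
    weights = Counter()
    for retweeter, author in retweets:
        if retweeter != author:             # ignore self-retweets
            weights[(retweeter, author)] += 1
    return dict(weights)

print(weighted_retweet_edges([("a", "b"), ("a", "b"), ("c", "b")]))
```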
International Journal of Data Science and Analytics, May 27, 2022
The COVID-19 pandemic has severely affected the lives of people worldwide, and consequently, it has dominated world news since March 2020. Thus, it is no surprise that it has also been the topic of a massive amount of misinformation, which was most likely amplified by the fact that many details about the virus were not known at the start of the pandemic. While a large amount of this misinformation was harmless, some narratives spread quickly and had a dramatic real-world effect. Such events are called digital wildfires. In this paper, we study a specific digital wildfire: the idea that the COVID-19 outbreak is somehow connected to the introduction of 5G wireless technology, which caused real-world harm in April 2020 and beyond. By analyzing early social media content, we investigate the origin of this digital wildfire and the developments that led to its wide spread. We show how the initial idea was derived from existing opposition to wireless networks, how videos rather than tweets played a crucial role in its propagation, and how commercial interests can partially explain the wide distribution of this particular piece of misinformation. We then illustrate how the initial events in the UK were echoed several months later in different countries around the world.
2016 IEEE International Conference on Big Data (Big Data), 2016
Distributed dataflow systems like Spark and Flink make it possible to analyze large datasets using clusters of computers. These frameworks provide automatic program parallelization and manage distributed workers, including worker failures. Moreover, they provide high-level programming abstractions and execute programs efficiently. Yet, the programming abstractions remain textual while the dataflow model is essentially a graph of transformations. Thus, there is a mismatch between the presented abstraction and the underlying model. One can also argue that developing dataflow programs with these textual abstractions requires needless amounts of coding and coding skills. A dedicated programming environment could instead allow constructing dataflow programs more interactively and visually. In this paper, we therefore investigate how visual programming can make the development of parallel dataflow programs more accessible. In particular, we built a prototypical visual programming environment for Flink, which we call Flision. Flision provides a graphical user interface for creating dataflow programs, a code generation engine that generates code for Flink, and seamless deployment to a connected cluster. Users of this environment can effectively create jobs by dragging, dropping, and visually connecting operator components. To evaluate the applicability of this approach, we interviewed ten potential users. Our impressions from this qualitative user testing strengthened our belief that visual programming can be a valuable tool for users of scalable data analysis tools.
Frontiers in Communication, 2021
We review the phenomenon of deepfakes, a novel technology enabling inexpensive manipulation of video material through the use of artificial intelligence, in the context of today’s wider discussion on fake news. We discuss the foundations and recent developments of the technology, the differences from earlier manipulation techniques, and technical countermeasures. While the threat of deepfake videos with substantial political impact has been widely discussed in recent years, so far, the political impact of the technology has been limited. We investigate reasons for this and extrapolate the types of deepfake videos we are likely to see in the future.
Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020, 2020
This paper summarises the results created through participation in the task FakeNews: Corona Virus and 5G Conspiracy of the MediaEval Multimedia Evaluation Challenge 2020. The task consists of two parts, both intended to detect tweets and retweet cascades that emerged during the COVID-19 pandemic and causally connect the radiation of 5G networks with the virus. We applied several well-established neural networks and machine learning techniques for the first subtask, namely textual information classification. For the second task, the retweet cascade analysis, we rely on classifiers that work on established graph features, such as the clustering coefficient or graph diameter. Our results show an MCC score of 0.148 or 0.162 for the NLP task and 0.02 for the structure task.
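To make the structure task concrete, the sketch below shows one plausible way to compute per-cascade graph features with networkx and to score predictions with the Matthews correlation coefficient from scikit-learn; it is illustrative only and not the submitted system.

```python
# Sketch of the kind of pipeline the structure task implies: per-cascade graph
# features (clustering coefficient, diameter) and MCC-based evaluation. This
# is illustrative, not the submitted system.
import networkx as nx
from sklearn.metrics import matthews_corrcoef

def cascade_features(edge_list):
    """Return simple structural features for one retweet cascade."""
    g = nx.Graph(edge_list)
    return {
        "nodes": g.number_of_nodes(),
        "avg_clustering": nx.average_clustering(g),
        "diameter": nx.diameter(g) if nx.is_connected(g) else -1,
    }

# Evaluation with the Matthews correlation coefficient used in the task
# (labels below are made-up examples).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(cascade_features([(0, 1), (1, 2), (2, 0), (2, 3)]))
print("MCC:", matthews_corrcoef(y_true, y_pred))
```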
We design a system for efficient in-memory analysis of data from the GDELT database of news events. The specialization of the system allows us to avoid the inefficiencies of existing alternatives and make full use of modern parallel high-performance computing hardware. We then present a series of experiments showcasing the system’s ability to analyze correlations in the entire GDELT 2.0 database containing more than a billion news items. The results reveal large-scale trends in the world of today’s online news.
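As a toy illustration of correlation analysis over event data (not the in-memory system built in the paper), one could compute the Pearson correlation between two daily event-count series; the numbers below are made up.

```python
# Toy sketch of the kind of correlation analysis described: Pearson correlation
# between two daily event-count series. The series are made-up numbers and this
# is not the specialized in-memory system built in the paper.
import numpy as np

def daily_count_correlation(counts_a, counts_b):
    """Pearson correlation between two equally long daily count series."""
    a, b = np.asarray(counts_a, float), np.asarray(counts_b, float)
    return float(np.corrcoef(a, b)[0, 1])

print(daily_count_correlation([3, 5, 8, 2, 7], [2, 6, 9, 1, 8]))
```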
Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks, 2021
The COVID-19 pandemic has been accompanied by a flood of misinformation on social media, which has been labeled an "infodemic". While a large part of such fake news is ultimately inconsequential, some of it has the potential to cause real-world harm, but due to the massive amount of social media content, it is impossible to find this misinformation manually. Thus, conventional fact-checking can typically only counteract misinformation narratives after they have gained significant traction. Only automated systems can provide warnings in advance. However, the automatic detection of misinformation narratives is very challenging, since the texts that spread misinformation may be short messages on Twitter. They may also transmit misinformation by implication rather than by stating counterfactual information outright, and satirical messages complicate the issue further. Thus, there is a need for highly sophisticated detection systems. In order to support their development, we created substantial ground truth data by human annotation. In this paper, we present a dataset that deals with a specific piece of misinformation: the idea that the COVID-19 pandemic is causally connected to the 5G wireless network. We selected more than 10,000 tweets that deal with COVID-19 and 5G and labeled them manually, distinguishing between tweets that propagate the specific 5G misinformation, those that spread other conspiracy theories, and tweets that do neither. We provide the human-annotated dataset along with an additional large-scale, automatically labelled dataset (using the human-annotated dataset as the training set) consisting of more than 100,000 tweets.
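The general idea of propagating human labels to a much larger corpus can be sketched with a simple text-classification pipeline; TF-IDF with logistic regression is used here only as a stand-in and is not necessarily the classifier the authors employed.

```python
# Sketch of propagating human labels to a larger corpus: train on the
# hand-labeled tweets, then predict labels for unlabeled ones. TF-IDF plus
# logistic regression is a stand-in, not necessarily the authors' classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def propagate_labels(labeled_texts, labels, unlabeled_texts):
    """Fit on the human-annotated tweets and return predicted labels
    (e.g., 5G-conspiracy / other-conspiracy / neither) for the rest."""
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(labeled_texts, labels)
    return clf.predict(unlabeled_texts)
```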
In the wake of the COVID-19 pandemic, a surge of misinformation has flooded social media and other internet channels, and some of it has the potential to cause real-world harm. To counteract this misinformation, reliably identifying it is a principal problem to be solved. However, the identification of misinformation poses a formidable challenge for language processing systems, since the texts containing misinformation are short, work with insinuation rather than explicitly stating a false claim, or resemble other postings that deal with the same topic ironically. Accordingly, for the development of better detection systems, it is essential not only to use hand-labeled ground truth data but also to extend the analysis with methods beyond Natural Language Processing that consider the characteristics of the participants' relationships and the diffusion of misinformation. This paper presents a novel dataset that deals with a specific piece of misinformation: the idea that the 5G wireless netw...