Catching the Long-Tail: Extracting Local News Events from Twitter
Related papers
TwitterNews: Real time event detection from the Twitter data stream
Research in event detection from Twitter streaming data has been gaining momentum in the last couple of years. Although such data is noisy and often contains misleading information, Twitter can be a rich source of information if harnessed properly. In this paper, we propose a scalable event detection system, TwitterNews, to detect and track newsworthy events in real time from Twitter. TwitterNews provides a novel approach that combines a random-indexing-based term vector model with locality-sensitive hashing to perform incremental clustering of tweets related to various events within a fixed time. TwitterNews also incorporates an effective strategy to deal with the cluster fragmentation issue prevalent in incremental clustering. The set of candidate events generated by TwitterNews is then filtered to report the newsworthy events, along with an automatically selected representative tweet from each event cluster. Finally, we evaluate the effectiveness of TwitterNews, ...
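The combination of random indexing and locality-sensitive hashing named in this abstract can be illustrated with a minimal Python sketch. It is not the TwitterNews implementation: the dimensions, the CRC32-based seeding, the whitespace tokenizer, and the names `index_vector`, `tweet_vector`, and `lsh_signature` are all assumptions made for illustration.

```python
import random
import zlib

DIM = 64      # size of the reduced vector space (assumed value)
NONZERO = 6   # non-zero entries per term index vector (assumed value)
BITS = 8      # number of random hyperplanes per signature (assumed value)


def index_vector(term):
    """Sparse ternary index vector, derived deterministically from the term."""
    rng = random.Random(zlib.crc32(term.encode("utf-8")))
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1, 1))
    return vec


def tweet_vector(text):
    """Random indexing: a tweet is the sum of its terms' index vectors."""
    total = [0] * DIM
    for term in text.lower().split():
        for i, value in enumerate(index_vector(term)):
            total[i] += value
    return total


# Fixed random hyperplanes shared by all tweets (seed chosen arbitrarily).
_plane_rng = random.Random(42)
HYPERPLANES = [[_plane_rng.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]


def lsh_signature(vec):
    """Sign of the projection onto each hyperplane; similar vectors tend to collide."""
    return tuple(
        int(sum(h * v for h, v in zip(plane, vec)) >= 0) for plane in HYPERPLANES
    )
```

In an incremental setting, each incoming tweet would only be compared with tweets that share its signature bucket, which keeps the per-tweet cost roughly constant. How buckets are mapped to event clusters is left open here.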
EvenTweet: online localized event detection from twitter
2013
Microblogging services such as Twitter, Facebook, and Foursquare have become major sources for information about real-world events. Most approaches that aim at extracting event information from such sources typically use the temporal context of messages. However, exploiting the location information of georeferenced messages, too, is important to detect localized events, such as public events or emergency situations. Users posting messages that are close to the location of an event serve as human sensors to describe an event. In this demonstration, we present a novel framework to detect localized events in real-time from a Twitter stream and to track the evolution of such events over time. For this, spatio-temporal characteristics of keywords are continuously extracted to identify meaningful candidates for event descriptions. Then, localized event information is extracted by clustering keywords according to their spatial similarity. To determine the most important events in a (r...
Exploring a Scalable Solution to Identifying Events in Noisy Twitter Streams
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 2015
The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. However, current solutions have focused on static Twitter data. The necessity to detect events in a streaming environment during fast paced events such as a crisis presents new opportunities and challenges. In this paper, we investigate event detection in the context of real-time Twitter streams as observed in real-world crises. We highlight the key challenges in this problem: the informal nature of text, and the high-volume and high-velocity characteristics of Twitter streams. We present a novel approach to address these challenges using single-pass clustering and the compression distance to efficiently detect events in Twitter streams. Through experiments on large Twitter datasets, we demonstrate that the proposed framework is able to detect events in near real-time and can scale to large and noisy Twitter streams.
From Tweets to Events: Exploring a Scalable Solution for Twitter Streams
The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. However, current solutions have focused on static Twitter data. The necessity to detect events in a streaming environment during fast paced events such as a crisis presents new opportunities and challenges. In this paper, we investigate event detection in the context of real-time Twitter streams as observed in real-world crises. We highlight the key challenges in this problem: the informal nature of text, and the high volume and high velocity characteristics of Twitter streams. We present a novel approach to address these challenges using single-pass clustering and the compression distance to efficiently detect events in Twitter streams. Through experiments on large Twitter datasets, we demonstrate that the proposed framework is able to detect events in near real-time and can scale to large and noisy Twitter streams.
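Single-pass clustering with a compression distance, as mentioned in this abstract, can be sketched as follows. This is an illustrative sketch and not the authors' system: it assumes the normalized compression distance (NCD) computed with `zlib`, uses the first tweet of each cluster as its representative, and the `threshold` value of 0.5 is an arbitrary choice.

```python
import zlib


def compressed_size(text):
    """Number of bytes in the zlib-compressed UTF-8 encoding of the text."""
    return len(zlib.compress(text.encode("utf-8")))


def ncd(a, b):
    """Normalized compression distance: small when a and b share content."""
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + " " + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def single_pass_cluster(tweets, threshold=0.5):
    """Assign each tweet, in arrival order, to the nearest cluster or start a new one.

    The first tweet of a cluster acts as its representative (an assumption).
    """
    clusters = []
    for tweet in tweets:
        best, best_distance = None, threshold
        for cluster in clusters:
            distance = ncd(tweet, cluster[0])
            if distance < best_distance:
                best, best_distance = cluster, distance
        if best is None:
            clusters.append([tweet])
        else:
            best.append(tweet)
    return clusters
```

Each tweet is read exactly once and no term vectors are built, which is what makes the approach attractive for noisy, informal text. The cost per tweet still grows with the number of open clusters, so a streaming system would need some policy for retiring old ones.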
Event identification for local areas using social media streaming data
Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks - DBSocial '13, 2013
Unprecedented success and active usage of social media services result in massive amounts of user-generated data. An increasing interest in the information contained in social media data leads to more and more sophisticated analysis and visualization applications. Because news spreads quickly and widely through social media, such data is an appropriate source for identifying events and directly displaying their occurrence to analysts or other users. This paper presents a method for event identification in local areas using the Twitter data stream. We implement and use a combined log-likelihood ratio approach for the geographic and time dimension of real-life Twitter data in predefined areas of the world to detect events occurring in the message contents. We present a case study with two interesting scenarios to show the usefulness of our approach.
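A log-likelihood ratio test of the kind this abstract refers to can be sketched in Python. The sketch below uses the standard G² statistic on a 2x2 contingency table to compare a term's frequency in a foreground sample (for example, one area or one time window) with a background sample. The function name, its parameters, and the assumption that both samples are non-empty are illustrative; how the paper combines the geographic and temporal scores is not stated here and is not modeled.

```python
import math


def log_likelihood_ratio(k_fg, n_fg, k_bg, n_bg):
    """G^2 statistic for a term seen k_fg times among n_fg foreground tokens
    and k_bg times among n_bg background tokens (n_fg + n_bg must be > 0)."""
    observed = [k_fg, n_fg - k_fg, k_bg, n_bg - k_bg]
    pooled = (k_fg + k_bg) / (n_fg + n_bg)  # term rate if both samples were alike
    expected = [
        n_fg * pooled,
        n_fg * (1 - pooled),
        n_bg * pooled,
        n_bg * (1 - pooled),
    ]
    # Cells with zero observations contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

A term whose foreground rate matches the background scores close to zero, and the score grows as the rates diverge. One possible use, stated as an assumption, is to compute the score once with the target area as foreground and once with the current time window as foreground, and to flag a term only when both scores are high.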
Extraction and Compilation of Events and Sub-events from Twitter
2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 2012
Twitter has emerged as a great source of insights about upcoming planned and unplanned events of social, economic and political relevance. Big events are publicized and known in advance, but smaller, unplanned sub-events around them are not always advertised. These unplanned events may have a large localized impact. If known in advance, knowledge about events like threats, protests, demonstrations etc. or even about large flash mobs can be utilized by planners and event managers. Given the large volumes of tweets floating around at any given time, identifying relevant sub-events is a non-trivial task. In this paper, we explore machine learning techniques to identify, extract and build a map of small sub-events around a big, popular event. We use CRFs to extract event components from tweets. Events are resolved for uniqueness and compiled into a complete calendar. The model is evaluated on tweets around the Olympic Games. The framework is generic enough to be adapted to other domains.
Localized Events in Social Media Streams: Detection, Tracking, and Recommendation
2016
From the recent proliferation of social media channels to the immense amount of user-generated content, an increasing interest in social media mining is currently being witnessed. Messages continuously posted via these channels report a broad range of topics from daily life to global and local events. As a consequence, this has opened new opportunities for mining event information crucial in many application domains, especially in increasing the situational awareness in critical scenarios. Interestingly, many of these messages are enriched with location information, due to the widespread use of mobile devices and the recent advancements of today's location acquisition techniques. This enables location-aware event mining, i.e., the detection and tracking of localized events. In this thesis, we propose novel frameworks and models that digest social media content for localized event detection, tracking, and recommendation. We first develop KeyPicker, a framework to extract and score event-related keywords in an online fashion, accounting for high levels of noise, temporal heterogeneity and outliers in the data. Then, LocEvent is proposed to incrementally detect and track events using a 4-stage procedure. That is, LocEvent receives the keywords extracted by KeyPicker, identifies local keywords, spatially clusters them, and finally scores the generated clusters. For each detected event, a set of descriptive keywords, a location, and a time interval are estimated at a fine-grained resolution. In addition to the sparsity of geo-tagged messages, people sometimes post about events far away from an event's location. Such spatial problems are handled by novel spatial regularization techniques, namely, graph- and gazetteer-based regularization. To ensure scalability, we utilize a hierarchical spatial index in addition to a multi-stage filtering procedure that gradually suppresses noisy words and considers only event-related ones for complex spatial computations.
Undertaking this PhD has been a truly life-changing experience for me. This work would not have been possible without the support that I received from many people. First of all, I would like to express my deepest gratitude to my advisor, Prof. Dr. Michael Gertz, for giving me the opportunity to be one of his PhD students, it is truly an honor. I am really thankful for his guidance, endless support, immense knowledge, and deep insights that helped me at various stages of my research. I remain amazed that despite his busy schedule, he was able to go through the final draft of my thesis and meet me regularly with comments and suggestions on almost every page. I would also like to thank my dissertation committee for the time, efforts, and precious feedback. During my PhD study and in spite of my busy days, I had a memorable time at the Database Systems Research Group, Heidelberg University. I was lucky to have an impressive research environment with great colleagues. Thank you, Dr. Ayser Armiti, you supported my first step to join this wonderful group. Many thanks to
What’s Happening Around the World? A Survey and Framework on Event Detection Techniques on Twitter
Journal of Grid Computing
In the last few years, Twitter has become a popular platform for sharing opinions, experiences, news, and views in real-time. Twitter presents an interesting opportunity for detecting events happening around the world. The content (tweets) published on Twitter is short and poses diverse challenges for detecting and interpreting event-related information. This article provides insights into ongoing research and helps in understanding recent research trends and techniques used for event detection using Twitter data. We classify techniques and methodologies according to event types, orientation of content, event detection tasks, their evaluation, and common practices. We highlight the limitations of existing techniques and accordingly propose solutions to address the shortcomings. We propose a framework called EDoT based on the research trends, common practices, and techniques used for detecting events on Twitter. EDoT can serve as a guideline for developing event detection methods, especially for researchers who are new to this area. We also describe and compare data collection techniques, the effectiveness and shortcomings of various Twitter and non-Twitter-based features, and discuss various evaluation measures and benchmarking methodologies. Finally, we discuss the trends, limitations, and future directions for detecting events on Twitter.
Event Detection in Twitter by Weighting Tweet's Features
2020
In recent years, people have spent a lot of time on social networks. They use social networks as a place to comment on personal or public events. Thus, a large amount of information is generated and shared daily in these networks. Using such a massive amount of information can help authorities react to events accurately and in a timely manner. In this study, the social network investigated is Twitter. The main idea of this research is to differentiate among tweets based on some of their features. This study investigates the performance of event detection when weighting three attributes of tweets: the follower count, the retweet count, and the user location. The results show that the average execution time and the precision of event detection in the presented method improved by 27% and 31%, respectively, compared with the base method. Another result of this research is the ability of the presented method to detect all events, including both hot events and less important ones.
Tweets analysis for event detection
Ingénierie des systèmes d'information, 2016
Social media systems have proven to be valuable platforms for information and communication, particularly during events such as natural disasters, for example the earthquakes, tsunami, and states of nuclear emergency in Japan in 2011. This behavior leads to the accumulation of an enormous amount of information. However, finding relevant posts can be a challenging task, since the relevance of a post depends on its content, its author, and the tweet's characteristics. In addition, identifying tweets that describe a specific type of event is challenging due to the high complexity and variety of event descriptions. These challenges present a big opportunity for Natural Language Processing (NLP) and Information Extraction (IE) technology to enable new large-scale data-analysis applications. Taking all these difficulties into account, this paper proposes a new metric to improve search results in microblogs. It combines content relevance, tweet relevance, and author relevance, and develops a Natural Language Processing method for extracting temporal information about events from posts, more specifically tweets. Our approach is based on a methodology of temporal marker classes and on a contextual exploration method. To evaluate our model, we built a knowledge management system, using a collection of 10 thousand tweets about current events in 2014 and 2015.