Supporting Open-Domain Event Prediction by Using Cross-Domain Twitter Messages

Timely identification of event start dates from Twitter

We present a method for the identification of future event start dates from Twitter streams. Taking hashtags or event name expressions as query terms, the method gathers a certain number of tweets about an event and uses clues in these tweets to estimate the date on which the event will start. Clues include temporal expressions with knowledge-based and automatically generated estimations, and other predictive words. The estimation is performed either with a machine-learning classifier or by taking a majority vote over the temporal expressions found in the set of tweets. Results show that temporal expressions are indeed strong predictors. The majority-based and machine-learning approaches perform equally well when trained and tested on a single event type, soccer matches, with an average estimation error of 0.05 days; but when tested on a range of different events, the majority-voting approach proves more robust than machine learning for this task, yielding high performance on all events. Still, per-event differences hint at a context in which machine learning might be beneficial.
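The majority-vote idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `OFFSETS` lookup table, the function name `estimate_start_date`, and the simple substring matching are all assumptions made for illustration. Each tweet that contains a known temporal expression votes for the date that expression implies, and the most frequent date wins.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical lookup: temporal expression -> offset in days from the posting date.
OFFSETS = {"today": 0, "tonight": 0, "tomorrow": 1, "day after tomorrow": 2}


def estimate_start_date(tweets):
    """tweets: iterable of (posting_date, text) pairs.

    Returns the date that receives the most votes, or None if no tweet
    contains a known temporal expression.
    """
    votes = Counter()
    for posted, text in tweets:
        lowered = text.lower()
        # Try longer expressions first so that "day after tomorrow"
        # is not counted as a plain "tomorrow".
        for expr in sorted(OFFSETS, key=len, reverse=True):
            if expr in lowered:
                votes[posted + timedelta(days=OFFSETS[expr])] += 1
                break  # one vote per tweet
    if not votes:
        return None
    return votes.most_common(1)[0][0]


example_tweets = [
    (date(2013, 5, 1), "The match is the day after tomorrow!"),
    (date(2013, 5, 2), "Tomorrow is the big game"),
    (date(2013, 5, 2), "So excited for tonight"),
]
estimated = estimate_start_date(example_tweets)
```

In this toy input, two tweets point to 3 May 2013 and one to 2 May 2013, so the vote settles on 3 May.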

Estimating the Time between Twitter Messages and Future Events

We describe and test three methods to estimate the remaining time between a series of microtexts (tweets) and the future event they refer to via a hashtag. Our system generates hourly forecasts. A linear and a local regression-based approach are applied to map hourly clusters of tweets directly onto time-to-event. To take changes over time into account, we develop a novel time series analysis approach that first derives word frequency time series from sets of tweets and then performs local regression to predict time-to-event from nearest-neighbor time series. We train and test on a single type of event, Dutch premier league football matches. Our results indicate that in an 'early' stage, four days or more before the event, the time series analysis produces time-to-event predictions that are about one day off; closer to the event, local regression attains a similar accuracy. Local regression also outperforms both mean and median-based baselines, but on average none of the tested systems has a consistently strong performance through time.

Open-domain extraction of future events from Twitter

Explicit references on Twitter to future events can be leveraged to feed a fully automatic monitoring system of real-world events. We describe a system that extracts open-domain future events from the Twitter stream. It detects future time expressions and entity mentions in tweets, clusters tweets together that overlap in these mentions above certain thresholds, and summarizes these clusters into event descriptions that can be presented to users of the system. Terms for the event description are selected in an unsupervised fashion. We evaluated the system on a month of Dutch tweets, by showing the top-250 ranked events found in this month to human annotators. Eighty per cent of the candidate events were indeed assessed as being an event by at least three out of four human annotators, while all four annotators regarded sixty-three per cent as a real event. An added component to complement event descriptions with additional terms was not assessed better than the original system, due to the occasional addition of redundant terms. Comparing the found events to gold-standard events from maintained calendars on the Web mentioned in at least five tweets, the system yields a recall-at-250 of 0.20 and a recall based on all retrieved events of 0.40.

Supervised Learning Approach for Twitter Credibility Detection

2018 13th International Conference on Computer Engineering and Systems (ICCES), 2018

Twitter is the most popular micro-blogging medium; it allows users to exchange short messages and provides a platform for public figures to share news. Twitter currently has an average of 328 million monthly active users and is growing rapidly. Detecting the credibility of information shared on Twitter has become a necessity, especially during high-impact events. In this paper, a classification model based on supervised machine learning techniques is proposed to detect credibility. The proposed model uses an extensive set of features, including both content-based and source-based features. The research compares the performance of five different machine learning classifiers using three feature sets: content-based, source-based and a combination of both. The best performance is achieved when using the combined feature set and applying Random Forests as the classifier, with an accuracy of 78.4%, precision of 79.6%, recall of 91.6% and F1-measure of 85.2%. Experiments also revealed that the proposed model achieves an improvement of 22% in terms of F1-measure when compared to CRF, which applies the same approach. Feature analysis is presented to highlight the importance of the source-based features compared with the content-based features as deciders for credibility.
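As a quick consistency check on the figures reported in this abstract, the F1-measure can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The function name `f1_score` below is only an illustrative helper, not something taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (79.6% and 91.6%).
recomputed_f1 = f1_score(0.796, 0.916)
print(round(recomputed_f1 * 100, 1))  # 85.2
```

The recomputed value matches the reported F1-measure of 85.2%.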

Event detection in Twitter: A machine-learning approach based on term pivoting

The large number of messages posted on Twitter each day provides rich insights into real-world events and public opinion. However, it is difficult to automatically distinguish tweets referring to such events from everyday chatter, and subsequently to distinguish significant events affecting many people from insignificant events. We apply a term-pivot approach to event detection from the Twitter stream. In order to filter out noisy and mundane events, we train a machine learning classifier on several rich features, and rank the events based on classifier confidence. After training and re-training the classifier using manually annotated data, we obtain an F(β=1) score of 0.79. However, a baseline that only takes into account the frequency of the tweets that refer to an event yields a better F(β=1) score of 0.86. We argue that performance is highly related to the definition of what makes a significant event, and that human understanding of this concept is not uniform.

SENTIMENT ANALYSIS FOR MICRO-BLOGGING PLATFORMS IN ARABIC

Sentiment Analysis (SA) concerns the automatic extraction and classification of sentiments conveyed in a given text, i.e. labelling a text instance as positive, negative or neutral. SA research has attracted increasing interest in the past few years due to its numerous real-world applications. The recent interest in SA is also fuelled by the growing popularity of social media platforms (e.g. Twitter), as they provide large amounts of freely available and highly subjective content that can be readily crawled.

Autonomic Discovery of News Evolvement in Twitter

Studies in Big Data, 2015

Twitter has become a dependable microblogging tool for real-time information dissemination and the broadcasting of newsworthy events. Its users sometimes break news on the network faster than traditional newsagents because they are often present at ongoing real-life events. Different topic detection methods are currently used to match Twitter posts to real-life news from mainstream media. In this paper, we analyse tweets relating to the English FA Cup final 2012 by applying our novel method, named TRCM, to extract association rules present in the hashtag keywords of tweets in different time-slots. Our system identifies evolving hashtag keywords with strong association rules in each time-slot. We then map the identified hashtag keywords to event highlights of the game as reported in the ground truth of the mainstream media. The performance effectiveness measure of our experiments shows that our method performs well as a Topic Detection and Tracking approach.

Estimating Time to Event from Tweets Using Temporal Expressions

2014

Given a stream of Twitter messages about an event, we investigate the predictive power of temporal expressions in the messages to estimate the time to event (TTE). From labeled training data we learn average TTE estimates of temporal expressions and combinations thereof, and define basic rules to compute the time to event from temporal expressions, so that when they occur in a tweet that mentions an event we can generate a prediction. We show in a case study on soccer matches that our estimations are off by about eight hours on average in terms of mean absolute error.
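The two ingredients named in this abstract, learning an average TTE per temporal expression and scoring predictions by mean absolute error, can be sketched as follows. This is only an illustration under stated assumptions: the function names `learn_tte` and `mean_absolute_error`, the use of hours as the unit, and the toy training values are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean


def learn_tte(training):
    """training: iterable of (temporal_expression, hours_to_event) pairs.

    Returns a dict mapping each expression to its average time to event.
    """
    samples = defaultdict(list)
    for expression, hours in training:
        samples[expression].append(hours)
    return {expression: mean(values) for expression, values in samples.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Made-up training observations, for illustration only.
training = [("tomorrow", 20), ("tomorrow", 28), ("tonight", 4), ("tonight", 6)]
tte = learn_tte(training)

# Predict for two unseen tweets and compare with hypothetical true values.
predictions = [tte["tomorrow"], tte["tonight"]]
error_hours = mean_absolute_error(predictions, [30, 3])
```

With these toy numbers, "tomorrow" averages 24 hours and "tonight" 5 hours, and the mean absolute error on the two test cases is 4 hours.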

Extraction of Unexpected Rules from Twitter Hashtags and its Application to Sport Events

2014 13th International Conference on Machine Learning and Applications, 2014

Twitter has become a dependable microblogging tool for real-time information dissemination and the broadcasting of newsworthy events. Its users sometimes break news on the network faster than traditional newsagents because they are often present at ongoing real-life events. Different topic detection methods are currently used to match Twitter posts to real-life news from mainstream media. In this paper, we analyse tweets relating to the English FA Cup final 2012 by applying our novel method, named TRCM, to extract association rules present in the hashtag keywords of tweets in different time-slots. Our system identifies evolving hashtag keywords with strong association rules in each time-slot. We then map the identified hashtag keywords to event highlights of the game as reported in the ground truth of the mainstream media. The performance effectiveness measure of our experiments shows that our method performs well as a Topic Detection and Tracking approach.

What makes a tweet relevant for a topic?

2012

Users who rely on microblogging search (MS) engines to find relevant microposts for their queries usually follow their interests and rationale when deciding whether a retrieved post is of interest to them or not. While today's MS engines commonly rely on keyword-based retrieval strategies, we investigate whether there exist additional micropost characteristics that are more predictive of a post's relevance and interestingness than its keyword-based similarity with the query.