CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing

arXiv (Cornell University), 2020

Time-critical analysis of social media streams is important for humanitarian organizations planning rapid response during disasters. The crisis informatics research community has developed several techniques and systems to process and classify big crisis-related data posted on social media. However, due to the dispersed nature of the datasets used in the literature, it is not possible to compare results and measure progress toward better models for crisis informatics. In this work, we attempt to bridge this gap by standardizing various existing crisis-related datasets. We consolidate the labels of eight annotated data sources and provide 166.1k and 141.5k tweets for the informativeness and humanitarian classification tasks, respectively. The consolidation yields a larger dataset that affords the ability to train more sophisticated models. To that end, we provide baseline results using CNN and BERT models. We make the dataset available at https://crisisnlp.qcri.org/crisis_datasets_benchmarks.html.
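The label-consolidation step described above can be sketched as a mapping from each source dataset's label scheme onto one shared scheme, dropping tweets whose labels have no counterpart. The mapping entries and label names below are illustrative assumptions, not CrisisBench's actual scheme:

```python
# Hypothetical label map: source-dataset labels -> consolidated labels.
# These names are illustrative, not the paper's actual taxonomy.
CONSOLIDATED = {
    "infrastructure_and_utilities_damage": "infrastructure_damage",
    "infrastructure_damage": "infrastructure_damage",
    "affected_individuals": "affected_people",
    "injured_or_dead_people": "affected_people",
    "donation_needs_or_offers": "donations_and_volunteering",
    "not_related_or_irrelevant": "not_humanitarian",
}

def consolidate(tweets):
    """Keep only tweets whose source label maps into the shared scheme."""
    out = []
    for text, source_label in tweets:
        target = CONSOLIDATED.get(source_label)
        if target is not None:
            out.append((text, target))
    return out

sample = [
    ("bridge collapsed on route 9", "infrastructure_and_utilities_damage"),
    ("sending blankets to the shelter", "donation_needs_or_offers"),
    ("totally unrelated post", "some_unmapped_label"),
]
print(consolidate(sample))  # the unmapped tweet is dropped
```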

Survey on Identification and Classification of Informative Tweets During Disasters

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

During a crisis, people post countless instructive and informative tweets on social media platforms such as Twitter. Recognizing informative tweets amid such an enormous pool is a difficult task during a disaster. As a solution to this problem of sorting out informative tweets, we present a technique to identify disaster-related informative tweets in Twitter streams using their textual content. Our objective is to construct a model using Natural Language Processing (NLP), Exploratory Data Analysis (EDA), a Support Vector Machine (SVM), and the Visual Geometry Group network (VGG, a deep CNN) to categorize the textual and pictorial content of tweets. We explore classification of tweets as disaster or non-disaster using TensorFlow and Keras. The outputs of the text-based and image-based models are combined using the late fusion technique to predict the tweet label.
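The text branch of the approach above can be sketched as a standard TF-IDF plus linear SVM pipeline. The toy tweets and labels below are made up for illustration; the paper's actual data, preprocessing, and tuning differ:

```python
# Minimal sketch of an SVM text classifier for disaster vs. non-disaster
# tweets: TF-IDF features feeding a linear SVM. Toy data for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "flood waters rising near the river, evacuation underway",
    "earthquake damaged several buildings downtown",
    "volunteers needed at the relief shelter tonight",
    "just had a great coffee this morning",
    "watching a movie with friends later",
    "new phone arrived today, so happy",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = disaster-related, 0 = not

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["buildings collapsed after the earthquake"]))
```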

HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

2021

Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its significantly large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, thanks to earlier efforts on creating labeled datasets. However, existing datasets are limited in different aspects (e.g., size, duplicate content) and less suitable for supporting more advanced and data-hungry deep learning models. In this paper, we present a new large-scale dataset with ∼77K human-labeled tweets, sampled from a pool of ∼24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collection and sam...
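The duplicate problem called out above motivates a de-duplication pass during dataset construction. A rough sketch, assuming exact duplicates after light normalization are the target; real pipelines often also drop near-duplicates:

```python
# Drop duplicate tweets after normalizing case, URLs, mentions, and hashtags.
import re

def normalize(text):
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)   # strip URLs
    text = re.sub(r"[@#]\w+", "", text)        # strip mentions/hashtags
    return re.sub(r"\s+", " ", text).strip()

def dedupe(tweets):
    seen, kept = set(), []
    for t in tweets:
        key = normalize(t)
        if key not in seen:
            seen.add(key)
            kept.append(t)
    return kept

raw = [
    "Flood warning issued http://t.co/abc #flood",
    "flood warning issued http://t.co/xyz #Flood",
    "Roads closed downtown",
]
print(len(dedupe(raw)))  # the first two collapse to one entry -> 2
```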

Identifying Disaster-related Tweets: A Large-Scale Detection Model Comparison

ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management, 2021

Social media applications such as Twitter and Facebook are fast becoming a key instrument in gaining situational awareness (understanding the bigger picture of the situation) during disasters. This has provided multiple opportunities to gather relevant information in a timely manner to improve disaster response. In recent years, identifying crisis-related social media posts has been treated as an automated task using machine learning (ML) or deep learning (DL) techniques. However, such supervised learning algorithms require labelled training data in the early hours of a crisis. Recently, multiple manually labelled disaster-related open-source Twitter datasets have been released. In this work, we collected 192,948 tweets by combining a number of such datasets, then preprocessed, filtered, and de-duplicated them, resulting in 117,954 tweets. We then evaluated the performance of multiple ML and DL algorithms in classifying disaster-related tweets in three settings, namely "in-disaster", "out-disaster" and "cross-disaster". Our results show that the Bidirectional LSTM model with Word2Vec embeddings performs well for the tweet classification task in all three settings. We also make available the preprocessing steps and trained weights for future research.
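The three evaluation settings named above can be sketched as event-aware data splits, under the assumption that each tweet is tagged with its source event: "in-disaster" trains and tests on the same event, while "out-disaster" and "cross-disaster" test on events unseen during training:

```python
# Sketch of event-aware train/test splits for the three evaluation settings.
# Each record is (text, label, event); the events and tweets are toy data.
def split_by_event(data, train_events, test_events):
    train = [(t, y) for t, y, e in data if e in train_events]
    test = [(t, y) for t, y, e in data if e in test_events]
    return train, test

data = [
    ("water rising fast", 1, "harvey"),
    ("stay safe everyone", 0, "harvey"),
    ("buildings shaking", 1, "nepal"),
    ("lovely weather", 0, "nepal"),
]

# in-disaster: train and test drawn from the same event
in_train, in_test = split_by_event(data, {"harvey"}, {"harvey"})
# cross-disaster: train on one event, test on a different one
cross_train, cross_test = split_by_event(data, {"harvey"}, {"nepal"})
print(len(cross_train), len(cross_test))  # 2 2
```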

Classification of multi-modal Natural Disaster Tweets

2020

Social media is the main source of user-generated content (UGC) on the severity of natural disasters. However, extracting the relevant information from social media in an organized manner has been a challenging task. Therefore, the purpose of this research paper is to provide an efficient algorithm that can reduce the workload of disaster management teams by classifying relevant social media streams for humanitarian aid and damage assessment. The project classifies relevant tweets from Hurricane Harvey, Hurricane Irma, the California wildfires, the Mexico earthquake, the Nepal earthquake, and the Iran-Iraq earthquake by applying natural language processing and computer vision based deep learning models ensembled together. The first task includes categorizing tweets and their respective images on the basis of relevance (whether they contain information about natural disasters). Second, the image data is categorized further by humanitarian aid category, which includes injured or dead people, infrastructure damage, vehicle damage, and missing or found people. Finally, damage assessment of the events is categorized as mild, moderate, or severe. A demo application will be developed to provide a user interface for natural disaster management teams to access disaster information.

Beyond Deep Learning: A Two-Stage Approach to Classifying Disaster Events and Needs

2024 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM)

Social media's real-time nature has transformed it into a critical tool for disaster response. This study therefore explores the use of tweets for classifying disaster types and identifying humanitarian needs in the aftermath of various disaster events. We compare traditional machine learning models such as Random Forest and Support Vector Machines with the deep learning technique BERT. While BERT demonstrates promising results, a key finding lies in the performance of the voting classifier ensemble, a combination of traditional models. This ensemble achieves accuracy comparable to BERT and even surpasses it. Furthermore, the ensemble boasts exceptional training and inference speeds, making it ideal for real-time applications in disaster response scenarios. Our work demonstrates the continued value of traditional machine learning methods: by "dusting off" these models we can achieve competitive performance while maintaining computational efficiency. Ultimately, this study empowers humanitarian organizations to leverage text classification for extracting crucial insights from social media data, leading to more effective and targeted responses in times of crisis.

Crisis Event Extraction Service (CREES) - Automatic Detection and Classification of Crisis-related Content on Social Media

2018

Social media posts tend to provide valuable reports during crises. However, this information can be hidden in large amounts of unrelated documents. Providing tools that automatically identify relevant posts, event types (e.g., hurricane, floods, etc.) and information categories (e.g., reports on affected individuals, donations and volunteering, etc.) in social media posts is vital for their efficient handling and consumption. We introduce the Crisis Event Extraction Service (CREES), an open-source web API that automatically classifies posts during crisis situations. The API provides annotations for crisis-related documents, event types and information categories through an easily deployable and accessible web API that can be integrated into multiple platforms and tools. The annotation service is backed by Convolutional Neural Networks (CNNs) and validated against traditional machine learning models. Results show that the CNN-based API results can be relied upon when dealing with spec...

A Method for Classifying Tweets about Emergency Events using Deep Neural Networks

Social media platforms have increasingly become a source of high-volume, real-time information describing events as they develop, especially during crises or emergencies. However, challenges remain in incorporating social media data into emergency response, especially for detecting small-scale emergency events where only small bits of information are available, complicating the detection of relevant content. To deal with these difficulties, in this paper we study everyday small-scale incidents and present a method of classifying tweets using convolutional neural networks trained on top of pre-trained word vectors, performing classification in layers in order to (i) determine whether or not a tweet is incident-related; and (ii) if a tweet is incident-related, determine which category it belongs to. Experiments on a home-grown dataset show that a system using the proposed method and architectures can classify tweets with an F1-score of 86.57%.
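The layered classification described above amounts to a two-stage cascade: stage one decides whether a tweet is incident-related at all, and only related tweets reach stage two, which assigns an incident category. Plain keyword rules stand in for the paper's CNNs below, purely to show the control flow:

```python
# Sketch of a two-stage cascade: relevance gate first, category second.
# Keyword lookups stand in for the trained CNN classifiers.
RELATED_TERMS = {"fire", "crash", "flood", "collapse"}
CATEGORIES = {"fire": "fire", "crash": "traffic",
              "flood": "weather", "collapse": "structural"}

def classify(tweet):
    words = set(tweet.lower().split())
    hits = words & RELATED_TERMS
    if not hits:                      # stage 1: not incident-related
        return ("not_related", None)
    term = sorted(hits)[0]            # stage 2: assign a category
    return ("related", CATEGORIES[term])

print(classify("small fire reported behind the school"))  # ('related', 'fire')
print(classify("lunch was delicious"))                    # ('not_related', None)
```

In the paper's setting, each stage would be a CNN over pre-trained word vectors rather than a keyword rule, but the cascade structure is the same.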

IRJET- Identification of Informative Tweet during Disasters

IRJET, 2021

During a crisis, people post countless instructive and informative tweets on social media platforms such as Twitter. Recognizing informative tweets within such a vast pool is a difficult task during a disaster. These tweets can be textual as well as photographic in nature. To address this problem of sorting out informative tweets, we present a way to identify disaster-related informative tweets from Twitter streams using the textual content and the images together. Our objective is to construct a model combining a Bi-directional Long Short-Term Memory (Bi-LSTM) network and a Convolutional Neural Network (CNN) to categorize the textual content of tweets. The image-based classification model uses an adjusted VGG-16 architecture to extract features from the picture and classify the image accordingly. The outputs of the text-based model and the image-based model are combined using the late fusion technique to predict the tweet label.
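The late-fusion step described above can be sketched as a weighted average of the two models' class probabilities, with the final label taken from the fused distribution. The probability vectors below are stand-ins for the Bi-LSTM/CNN and VGG-16 outputs:

```python
# Sketch of late fusion: average per-class probabilities from the text and
# image models, then take the argmax. Probabilities here are toy values.
def late_fusion(text_probs, image_probs, w_text=0.5):
    """Weighted average of per-class probabilities from two models."""
    return [w_text * t + (1 - w_text) * i
            for t, i in zip(text_probs, image_probs)]

text_probs = [0.2, 0.8]    # [not informative, informative] from the text model
image_probs = [0.4, 0.6]   # same classes from the image model

fused = late_fusion(text_probs, image_probs)
label = max(range(len(fused)), key=fused.__getitem__)
print(label)  # class 1 (informative) wins after fusion
```

Equal weights are the simplest choice; `w_text` can be tuned on validation data when one modality is more reliable than the other.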

Verifying baselines for crisis event information classification on Twitter

2020

Social media are rich information sources during and in the aftermath of crisis events such as earthquakes and terrorist attacks. Despite myriad challenges, with the right tools, significant insight can be gained which can assist emergency responders and related applications. However, most extant approaches are incomparable, using bespoke definitions, models, datasets and even evaluation metrics. Furthermore, it is rare that code, trained models, or exhaustive parametrisation details are made openly available. Thus, even confirmation of self-reported performance is problematic; authoritatively determining the state of the art (SOTA) is essentially impossible. Consequently, to begin addressing such endemic ambiguity, this paper seeks to make 3 contributions: 1) the replication and results confirmation of a leading (and generalisable) technique; 2) testing straightforward modifications of the technique likely to improve performance; and 3) the extension of the technique to a novel and...