“Just the facts” with PALOMAR: Detecting protest events in media outlets and Twitter

A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries

European Conference on Information Retrieval, 2019

We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Having tools for collecting event information from data produced in multiple countries enables comparative sociology and politics studies. We have annotated news articles in English from a source and a target country in order to be able to measure the performance of the tools developed using data from one country on data from a different country. Our preliminary experiments have shown that the performance of the tools developed using English texts from India drops to a level that is not usable when they are applied to English texts from China. We think our setting addresses the challenge of building generalizable NLP tools that perform well independent of the source of the text and will accelerate progress toward developing generalizable NLP systems.

Cross-Context News Corpus for Protest Event-Related Knowledge Base Construction

Data Intelligence, 2021

We describe a gold standard corpus of protest events that comprises various local and international English language sources from various countries. The corpus contains document-, sentence-, and token-level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information, constructing knowledge bases that enable comparative social and political science studies. For each news source, the annotation starts with random samples of news articles and continues with samples drawn using active learning. Each batch of samples is annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus possesses the variety and quality that are necessary to develop and benchmark text classification and event extraction systems in a cross-context setting, contributing to the generalizability and robustness of automated text processing systems. This corpus and the reported results will establish a common foundation in automated protest event collection studies, which is currently lacking in the literature.

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021): Workshop and Shared Task Report

2021

This workshop is the fourth issue of a series of workshops on automatic extraction of sociopolitical events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of sociopolitical events, such as protests, riots, wars and armed conflicts, in text streams. This year, workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams have registered and 15 teams contributed to three tasks that are i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynote and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.

PaloPro: a platform for knowledge extraction from big social data and the news

International Journal of Big Data Intelligence, 2016

PaloPro is a platform that aggregates textual content from social media and news sites in different languages, analyses it using a series of text mining algorithms and provides advanced analytics to journalists and social media marketers. The platform capitalises on the abundance of social media sources and the information they provide for persons, products and events. In order to handle huge amounts of multilingual data that are collected continuously, we have adopted language-independent techniques at all levels, and, from an engineering point of view, we have designed a system that takes advantage of parallel distributed computing technologies and cloud infrastructure. Different systems handle data aggregation, data processing and knowledge extraction, and others deal with the integration and visualisation of knowledge. In this paper, we focus on two important text mining tasks: named entity recognition from texts, and sentiment analysis to extract the sentiment associated with the corresponding identified entities. N. Makrynioti et al.

Cross-context News Corpus for Protest Events related Knowledge Base Construction

2020

We describe a gold standard corpus of protest events that comprises various local and international sources from various countries in English. The corpus contains document, sentence, and token level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event related information, constructing databases which enable comparative social and political science studies. For each news source, the annotation starts on random samples of news articles and continues with samples that are drawn using active learning. Each batch of samples was annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus has the variety and quality to develop and benchmark text classification and event extraction systems in a cross-context setting, which contributes to generalizability and robustness of automated...

Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting

CLEF 2019 | Conference and Labs of the Evaluation Forum, 2019

We present an overview of the CLEF-2019 Lab ProtestNews on Extracting Protests from News in the context of generalizable natural language processing. The lab consists of document, sentence, and token level information classification and extraction tasks, which were referred to as task 1, task 2, and task 3 respectively in the scope of this lab. The tasks required the participants to identify protest relevant information from English local news at one or more of the aforementioned levels in a cross-context setting, which is cross-country in the scope of this lab. The training and development data were collected from India, and the test data were collected from India and China. The lab attracted 58 teams; 12 of these teams submitted results and 9 submitted working notes. We have observed that neural networks yield the best results and that the performance drops significantly for the majority of the submissions in the cross-country setting, which is China.

Planned Protest Modeling in News and Social Media

Civil unrest (protests, strikes, and "occupy" events) is a common occurrence in both democracies and authoritarian regimes. The study of civil unrest is a key topic for political scientists as it helps capture an important mechanism by which citizenry express themselves. In countries where civil unrest is lawful, qualitative analysis has revealed that more than 75% of the protests are planned, organized, and/or announced in advance; therefore detecting future time mentions in relevant news and social media is a direct way to develop a protest forecasting system. We develop such a system in this paper, using a combination of key phrase learning to identify what to look for, probabilistic soft logic to reason about location occurrences in extracted results, and time normalization to resolve future tense mentions. We illustrate the application of our system to 10 countries in Latin America, viz. Argentina, Brazil, Chile, Colombia, Ecuador, El Salvador, Mexico, Paraguay, Uruguay, and Venezuela. Results demonstrate our successes in capturing significant societal unrest in these countries with an average lead time of 4.08 days. We also study the selective superiorities of news media versus social media (Twitter, Facebook) to identify relevant tradeoffs.

Extractivism: Extracting Activists Events from News Articles Using Existing NLP Tools and Services

2013

Activists have a significant role in shaping social views and opinions. Social scientists study the events activists are involved in, in order to find out how activists shape our views. Unfortunately, individual sources may present incomplete, incorrect, or biased event descriptions. We present a method where we automatically extract event mentions from different news sources that could complement, contradict, or verify each other. The method makes use of off-the-shelf NLP tools. It is therefore easy to set up and can also be applied to extract events that are not related to activism.

Multilingual Protest News Detection - Shared Task 1, CASE 2021

2021

Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of these datasets are of the utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks that are document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks had English, ...

Capturing Planned Protests from Open Source Indicators

AI Magazine

Civil unrest events (protests, strikes, and “occupy” events) are common occurrences in both democracies and authoritarian regimes. The study of civil unrest is a key topic for political scientists as it helps capture an important mechanism by which citizenry express themselves. In countries where civil unrest is lawful, qualitative analysis has revealed that more than 75 percent of the protests are planned, organized, or announced in advance; therefore detecting references to future planned events in relevant news and social media is a direct way to develop a protest forecasting system. We report on a system for doing that in this article. It uses a combination of keyphrase learning to identify what to look for, probabilistic soft logic to reason about location occurrences in extracted results, and time normalization to resolve future time mentions. We illustrate the application of our system to 10 countries in Latin America: Argentina, Brazil, Chile, Colombia, Ecuador, El Salvador,...