Vers une plateforme pour l'extraction et la visualisation multi-échelle d'événements sociaux (Towards a platform for multi-scale extraction and visualization of social events)

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022): Workshop and Shared Task Report

arXiv (Cornell University), 2022

We provide a summary of the fifth edition of the CASE workshop, held within the scope of EMNLP 2022. The workshop consists of regular papers, two keynotes, working papers of shared task participants, and task overview papers. The workshop brings together all aspects of event information collection across technical and social science fields. In addition to progress in depth, the submission and acceptance of multimodal approaches show that this interdisciplinary research topic is widening.

Rusu et al. - Unsupervised Techniques for Extracting and Clustering Complex Events in News (2013)

Structured machine-readable representations of news articles can radically change the way we interact with information. One step towards obtaining these representations is event extraction: the identification of event triggers and arguments in text. With previous approaches mainly focusing on classifying events into a small set of predefined types, we analyze unsupervised techniques for complex event extraction. In addition to extracting event mentions in news articles, we aim at obtaining a more general representation by disambiguating to concepts defined in knowledge bases. These concepts are further used as features in a clustering application. Two evaluation settings highlight the advantages and shortcomings of the proposed approach.

The EUMSSI Project - Event Understanding through Multimodal Social Stream Interpretation

2016

Journalists, as well as users at home, face increasing amounts of data from a large variety of sources, both in professionally curated media archives and in the form of user-generated content or social media. Using all of this information is both a great opportunity and a great challenge, which EUMSSI approaches by providing semantically rich analysis of multimedia content, together with intuitive visual interfaces to explore the data and gain new insights. The goal of the EUMSSI project is to provide a complete system for large-scale multimedia analysis, including a wide range of analysis components working on different media modalities (video, audio, text). Additionally, we have developed two example applications (demonstrators) that build upon this platform with the goal of showcasing the platform's potential, but also of leading towards a commercial exploitation of the project outcomes. The first is a tool to help journalists when writing an article or preparin...

Using Stigmergy to Distinguish Event-Specific Topics in Social Discussions

Sensors

In settings wherein discussion topics are not statically assigned, such as in microblogs, a need exists for identifying and separating topics of a given event. We approach the problem by using a novel type of similarity, calculated between the major terms used in posts. The occurrences of such terms are periodically sampled from the posts stream. The generated temporal series are processed by using marker-based stigmergy, i.e., a biologically-inspired mechanism performing scalar and temporal information aggregation. More precisely, each sample of the series generates a functional structure, called mark, associated with some concentration. The concentrations disperse in a scalar space and evaporate over time. Multiple deposits, when samples are close in terms of instants of time and values, aggregate in a trail and then persist longer than an isolated mark. To measure similarity between time series, the Jaccard's similarity coefficient between trails is calculated. Discussion topics are generated by such similarity measure in a clustering process using Self-Organizing Maps, and are represented via a colored term cloud. Structural parameters are correctly tuned via an adaptation mechanism based on Differential Evolution. Experiments are completed for a real-world scenario, and the resulting similarity is compared with Dynamic Time Warping (DTW) similarity.
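The mark, evaporation, trail, and Jaccard steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the triangular mark shape, the parameter names (`width`, `intensity`, `evaporation`), the discretized value grid, and the use of the weighted (min/max) form of the Jaccard coefficient are all assumptions made for illustration.

```python
from typing import List


def trail_history(samples: List[float], grid: List[float], width: float = 1.0,
                  intensity: float = 1.0, evaporation: float = 0.25) -> List[List[float]]:
    """Return the trail (concentration over the value grid) after each sample.

    Assumed model: at every time step the existing concentration evaporates
    by a fixed fraction, then the new sample deposits a triangular mark
    centred on its value. Samples that are close in time and value overlap
    and reinforce each other, so a trail outlives an isolated mark.
    """
    trail = [0.0] * len(grid)
    history = []
    for s in samples:
        trail = [
            c * (1.0 - evaporation) + intensity * max(0.0, 1.0 - abs(x - s) / width)
            for c, x in zip(trail, grid)
        ]
        history.append(trail)
    return history


def trail_similarity(h1: List[List[float]], h2: List[List[float]]) -> float:
    """Weighted Jaccard coefficient between two trail histories.

    Intersection is the summed pointwise minimum, union the summed
    pointwise maximum, over all time steps and grid points.
    """
    num = 0.0
    den = 0.0
    for t1, t2 in zip(h1, h2):
        for a, b in zip(t1, t2):
            num += min(a, b)
            den += max(a, b)
    return num / den if den > 0 else 1.0


# Hypothetical term-occurrence series sampled from a post stream.
grid = [i * 0.5 for i in range(21)]          # value axis from 0 to 10
series_a = [2.0, 2.5, 3.0, 3.0]
series_b = [2.0, 2.5, 3.0, 3.5]              # close to series_a
series_c = [8.0, 8.0, 9.0, 9.0]              # far from series_a

sim_ab = trail_similarity(trail_history(series_a, grid), trail_history(series_b, grid))
sim_ac = trail_similarity(trail_history(series_a, grid), trail_history(series_c, grid))
```

In this sketch the pairwise similarities would then feed a clustering step; the abstract uses Self-Organizing Maps for that, which are not reproduced here.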

Unsupervised Techniques for Extracting and Clustering Complex Events in News

Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation, 2014

Structured machine-readable representations of news articles can radically change the way we interact with information. One step towards obtaining these representations is event extraction: the identification of event triggers and arguments in text. With previous approaches mainly focusing on classifying events into a small set of predefined types, we analyze unsupervised techniques for complex event extraction. In addition to extracting event mentions in news articles, we aim at obtaining a more general representation by disambiguating to concepts defined in knowledge bases. These concepts are further used as features in a clustering application. Two evaluation settings highlight the advantages and shortcomings of the proposed approach.

* The work was carried out while the first author was an intern with Bloomberg Labs.

What Just Happened? A Framework for Social Event Detection and Contextualisation

2015 48th Hawaii International Conference on System Sciences, 2015

In the course of a breaking news event, such as a natural calamity or political uproar, a massive amount of crowdsourced data is generated over social media, which makes social media platforms an important source of information in such scenarios. News organisations and journalists increasingly recognise the value of the information propagated via social media. Better tools and methodologies are needed to help them use this information for news production. Much of the analysis that journalists carry out over social media relies on rigorous manual labour. However, the sheer volume of data produced on social media is overwhelming and is a major obstacle to manually inspecting the streaming data in order to find, aggregate and contextualise an emerging event in a short time span. This is a day-to-day challenge for journalists and media organisations. This paper addresses the problem journalists face in handling voluminous social media data, viewing it from an information retrieval perspective, by proposing an 'event detection and contextualisation' framework that processes an input stream of social media data into clusters of likely events.

Event detection and visualization for social text streams

2007

In this paper, we propose to detect events from social text streams by exploring the content as well as the temporal and social dimensions. We define the term event in social text streams (e.g., blogs, emails, and Usenet) as a set of relations between social actors on a specific topic over a certain time period. We represent social text streams as multi-graphs, where each node represents a social actor and each edge represents a piece of text communication that connects two actors. The content and temporal associations within each text piece are embedded in the corresponding edge. Then, events are detected by combining text-based clustering, temporal segmentation, and graph cuts of social networks. Moreover, we provide a multi-dimensional visualization tool that visualizes the relations between different events along the three different dimensions. Experiments conducted with the Enron email dataset show the advantages of exploring the social and temporal dimensions along with content, and the usefulness of the visualization tool.
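The multi-graph representation described in this abstract can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Message`, `SocialMultiGraph`, and `window` are hypothetical, edges are treated as directed from sender to recipient, and timestamps are plain integers. The clustering, temporal segmentation, and graph-cut steps of the paper are not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Message:
    """One piece of text communication, i.e. one edge of the multi-graph."""
    sender: str
    recipient: str
    timestamp: int   # assumed unit: day index
    text: str


class SocialMultiGraph:
    """Actors are nodes; every message adds a parallel edge between two actors."""

    def __init__(self) -> None:
        # Several messages may connect the same pair, hence a list per pair.
        self.edges: Dict[Tuple[str, str], List[Message]] = defaultdict(list)

    def add(self, msg: Message) -> None:
        self.edges[(msg.sender, msg.recipient)].append(msg)

    def actors(self) -> Set[str]:
        return {actor for pair in self.edges for actor in pair}

    def window(self, start: int, end: int) -> "SocialMultiGraph":
        """Sub-graph of messages with start <= timestamp < end (a crude temporal segment)."""
        sub = SocialMultiGraph()
        for msgs in self.edges.values():
            for m in msgs:
                if start <= m.timestamp < end:
                    sub.add(m)
        return sub


# Hypothetical example stream.
graph = SocialMultiGraph()
graph.add(Message("alice", "bob", 0, "budget draft attached"))
graph.add(Message("alice", "bob", 1, "budget revision"))
graph.add(Message("bob", "carol", 5, "offsite planning"))
early = graph.window(0, 2)
```

Keeping the text and timestamp on each edge means content, temporal, and social information stay available to whichever detection step is applied afterwards.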

Extracting Large Scale Spatio-Temporal Descriptions from Social Media

2022

The ability to track large-scale events as they happen is essential for understanding them and coordinating reactions in an appropriate and timely manner. This is true, for example, in emergency management and decision-making support, where the constraints on both quality and latency of the extracted information can be stringent. In some contexts, real-time and large-scale sensor data and forecasts may be available. We are exploring the hypothesis that this kind of data can be augmented with the ingestion of semistructured data sources, like social media. Social media can diffuse valuable knowledge, such as direct witness or expert opinions, while their noisy nature makes them not trivial to manage. This knowledge can be used to complement and confirm other spatio-temporal descriptions of events, highlighting previously unseen or undervalued aspects. The critical aspects of this investigation, such as event sensing, multilingualism, selection of visual evidence, and geolocation, are currently being studied as a foundation for a unified spatio-temporal representation of multi-modal descriptions. The paper presents, together with an introduction on the topics, the work done so far on this line of research, also presenting case studies relevant to the posed challenges, focusing on emergencies caused by natural disasters.

SEED: A Framework for Extracting Social Events from Press News

2015

Every day, people exchange a huge amount of data through the Internet. Mostly, such data consist of unstructured texts, which often contain references to structured information (e.g., person names, contact records, etc.). In this work, we propose a novel solution to discover social events from actual press news edited by humans. Concretely, our method is divided into two steps, each one addressing a specific Information Extraction (IE) task: first, we use a technique to automatically recognize four classes of named entities from press news: Date, Location, Place, and Artist. Furthermore, we detect social events by extracting ternary relations between such entities, also exploiting evidence from external sources (i.e., the Web). Finally, we evaluate both stages of our proposed solution on a real-world dataset. Experimental results highlight the quality of our first-step Named-Entity Recognition (NER) approach, which indeed performs consistently with state-of-the-art soluti...

Topic Tomographies (TopTom): a visual approach to distill information from media streams

Computer Graphics Forum, 2019

In this paper we present TopTom, a digital platform whose goal is to provide analytical and visual solutions for the exploration of a dynamic corpus of user-generated messages and media articles, with the aim of i) distilling the information from thousands of documents in a low-dimensional space of explainable topics, ii) clustering them in a hierarchical fashion while allowing users to drill down to details and stories as constituents of the topics, and iii) spotting trends and anomalies. TopTom implements a batch processing pipeline able to run both in near-real time with timestamped data from streaming sources and on historical data with a temporal dimension in a cold start mode. The resulting output unfolds along three main axes: time, volume and semantic similarity (i.e. topic hierarchical aggregation). To allow the browsing of data in a multiscale fashion and the identification of anomalous behaviors, three visual metaphors were adopted from biological and medical fields to design visua...