MEAD - a platform for multidocument multilingual text summarization

G. Giannakopoulos, , M., B. Favre, M. Litvak, J. Steinberger, and V. Varma. "TAC 2011 MultiLing Pilot Overview." In Text Analysis Conference (TAC) 2011, MultiLing Summarisation Pilot, Maryland, USA, 2011. TAC 2011

The Text Analysis Conference MultiLing Pilot of 2011 posed a multi-lingual summarization task to the summarization community, aiming to quantify and measure the performance of multi-lingual, multi-document summarization systems. The task was to create a 240-250 word summary from 10 news texts describing a given topic. The texts of each topic were provided in seven languages (Arabic, Czech, English, French, Greek, Hebrew, Hindi), and each participant generated summaries for at least two languages. The summaries were evaluated using automatic methods (AutoSummENG, ROUGE) and a manual process (Overall Responsiveness score). Eight systems participated, some of which provided summaries across all languages. This paper briefly describes the data collection, the evaluation methodology, and the problems and challenges faced, and gives an overview of participation and the corresponding results.

Developing Infrastructure for the Evaluation of Single and Multi-document Summarization Systems in a Cross-lingual Environment

2002

We describe our work on the development of language and evaluation resources for the evaluation of summaries in English and Chinese. The language resources include a parallel corpus of English and Chinese texts which are translations of each other, a set of queries in both languages, clusters of documents relevant to each query, sentence relevance measures for each sentence in the document clusters, and manual multi-document summaries at different compression rates. The evaluation resources consist of metrics for measuring the content of automatic summaries against reference summaries. The framework can be used in the evaluation of extractive, non-extractive, single-document and multi-document summarization. We focus on the resources that have been developed and made available to the research community.

A Robust and Adaptable Summarization Tool

2008

Over the last few years there has been substantial research on text summarization, but comparatively little research has been carried out on adaptable components that allow rapid development and evaluation of summarization solutions. This paper presents a set of adaptable summarization components together with well-established evaluation tools, all within the GATE paradigm. The toolkit includes resources for the computation of summarization features which are combined in order to provide functionalities for single-document, multi-document, query-based, and multi/cross-lingual summarization. The summarization tools have been successfully used in a number of applications including a fully-fledged information access system. RÉSUMÉ. Over the last few years there has been a substantial amount of research on automatic summarization. However, there has been comparatively little research on computational resources and components that can be easily adapted ...

Portable Text Summarization

Identification, Investigation and Resolution, 2011

Today, with digitally stored information available in abundance, even for many less commonly spoken languages, this information must by some means be filtered and extracted in order to avoid drowning in it. Automatic summarization is one such technique, in which a computer condenses a longer text into a shorter, non-redundant form. Developing advanced summarization systems for smaller languages as well may unfortunately prove too costly. Nevertheless, there will still be a need for summarization tools for these languages in order to curb the immense flow of digital information. This chapter focuses on automatic summarization of text using as few direct human resources as possible, resulting in what can be perceived as an intermediary system. Furthermore, it presents the notion of taking a holistic view of the generation of summaries.

Multi-document multilingual summarization corpus preparation, part 1: Arabic, English, Greek, Chinese, Romanian

This document overviews the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus in 10 different languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind the main decisions of the collection, the methodology used to generate the multilingual corpus, and the challenges and problems faced per language. This paper overviews the work on the Czech, Hebrew and Spanish languages.

SUMMA A Robust and Adaptable Summarization Tool

Traitement Automatique des Langues, 2008

Over the last few years there has been substantial research on text summarization, but comparatively little research has been carried out on adaptable components that allow rapid development and evaluation of summarization solutions. This paper presents a set of adaptable summarization components together with well-established evaluation tools, all within the GATE paradigm. The toolkit includes resources for the computation of summarization features which are combined in order to provide functionalities for single-document, multi-document, query-based, and multi/cross-lingual summarization. The summarization tools have been successfully used in a number of applications including a fully-fledged information access system. RÉSUMÉ. Over the last few years there has been a substantial amount of research on automatic summarization. However, there has been comparatively little research on computational resources and components that can be easily adapted for the development and evaluation of automatic summarization systems. Here we present a set of resources developed specifically for automatic summarization and built on the GATE platform. The components are used to compute features that indicate sentence relevance. These components are combined to produce different types of summarization systems, such as single-document, multi-document, and topic-based summarization. The implemented resources and algorithms have been used to develop several information access applications within information systems.
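The idea of computing per-sentence features and combining them into a relevance score can be illustrated with a minimal sketch. This is not the SUMMA or GATE API: the function names `score_sentences` and `summarize`, the two features (term frequency and sentence position), the weight keys, and whitespace tokenization are all assumptions chosen for illustration.

```python
from collections import Counter


def score_sentences(sentences, weights):
    """Score each sentence as a weighted sum of two simple relevance features.

    Hypothetical helper: the features and weight names are illustrative only.
    """
    tokenized = [s.lower().split() for s in sentences]
    doc_freq = Counter(tok for sent in tokenized for tok in sent)
    max_freq = max(doc_freq.values(), default=1)
    scores = []
    for idx, toks in enumerate(tokenized):
        # Feature 1: mean document frequency of the sentence's terms,
        # normalized by the most frequent term so it lies in [0, 1].
        tf = sum(doc_freq[t] for t in toks) / (len(toks) * max_freq) if toks else 0.0
        # Feature 2: position in the document, favouring earlier sentences.
        pos = 1.0 - idx / len(sentences)
        scores.append(weights["tf"] * tf + weights["position"] * pos)
    return scores


def summarize(sentences, weights, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences, weights)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Changing the weights changes the behaviour of the summarizer without touching the feature code, which is one plausible reading of what "adaptable components" means in practice. A multi-document or query-based variant would add further features (for example, similarity to a query) to the same weighted sum.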

EASY-M: Evaluation System for Multilingual Summarizers

Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources associated with RANLP 2019, 2019

Automatic text summarization aims at producing a shorter version of a document (or a document set). Evaluation of summarization quality is a challenging task. Because human evaluations are expensive and evaluators often disagree among themselves, many researchers prefer to evaluate their systems automatically, with the help of software tools. Such a tool usually requires a point of reference in the form of one or more human-written summaries for each text in the corpus. A system-generated summary is then compared to one or more human-written summaries according to selected measures (also called metrics). However, a single metric cannot reflect all quality-related aspects of a summary. In this paper we present the EvAluation SYstem for Multilingual Summarization (EASY-M), which enables the evaluation of system-generated summaries in 23 languages with several quality measures, based on comparison with their human-generated counterparts. The system also provides comparative results with two built-in baselines. The EASY-M system is freely available for the NLP community.
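Comparing a system-generated summary with one or more human-written references can be sketched with a simple n-gram recall measure in the style of ROUGE-N. This is an illustrative sketch, not the EASY-M implementation: the function names `ngrams` and `ngram_recall` are hypothetical, and lowercasing plus whitespace tokenization is an assumption that would not be adequate for every language.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(system, references, n=1):
    """Clipped n-gram recall of a system summary against reference summaries.

    Matches are clipped per reference, so an n-gram in the system summary
    cannot be credited more often than it occurs there.
    """
    sys_counts = ngrams(system.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matched += sum(min(count, sys_counts[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, `ngram_recall("the cat sat on the mat", ["the cat is on the mat", "a cat sat on a mat"])` recovers 9 of the 12 reference unigrams. A score like this captures content overlap only, which is why evaluation tools typically report several measures side by side.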