Multi-Document Summarization Research Papers - Academia.edu

We present and compare two approaches to the task of summarizing evaluative arguments. The first is a sentence-extraction-based approach while the second is a language-generation-based approach. We evaluate these approaches in a user study and find that they quantitatively perform equally well. Qualitatively, however, we find that they perform well for different but complementary reasons.

The recent construction of the new L9 subway line in Barcelona, Spain has provided the opportunity to study the impact of different antenna configurations on the maximum channel capacity inside subway tunnels. In this work the authors present the design tradeoffs inside different kinds of tunnels in terms of antenna spacing and applied diversity technique for a 2x2 MIMO system at C-Band. These design tradeoffs are the conclusion of the measurement campaign carried out last year in the L9 subway tunnels.

The growing availability of online information creates a need for efficient text summarization systems. Text summarization systems follow extractive or abstractive methods. In extractive summarization, the important sentences are selected from the original text on the basis of sentence-ranking methods. Abstractive summarization systems understand the main concept of the texts and convey the overall idea of the topic. This paper concentrates on a survey of existing extractive text summarization models. Numerous algorithms are studied and their evaluations are explained. The main purpose is to observe the peculiarities of existing extractive summarization models and to find a good approach that helps to build a new text summarization system.
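
As a toy illustration of the extractive approach described above, a sentence can be scored by the average corpus frequency of its words and the top-scoring sentences returned in document order. This is a deliberately minimal sketch; the surveyed systems use far richer ranking features.

```python
from collections import Counter

def extractive_summary(sentences, k=2):
    """Score each sentence by the mean corpus frequency of its words,
    then return the top-k sentences in their original order."""
    words = [s.lower().split() for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    scores = [sum(freq[w] for w in ws) / max(len(ws), 1) for ws in words]
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]

docs = ["the cat sat on the mat", "dogs bark loudly",
        "the cat chased the dog", "rain fell"]
print(extractive_summary(docs, k=2))
# → ['the cat sat on the mat', 'the cat chased the dog']
```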

Recent advances in massive information storage and the growth of the Internet have increased the need for tools such as search engines and QA systems that extract meaningful answers from vast amounts of information. The complex questions appearing in real life generally require multiple techniques or sub-division of the question at various levels. Several practical systems have been developed to explore this. This paper reviews state-of-the-art QA systems which provide enhanced and versatile functionality at various levels in their architecture and then makes the following contributions: 1) it provides a useful overview of research trends and recent developments in the area of QA; 2) it introduces and defines basic design parameters for any QA system; 3) it resolves semantic heterogeneity by clarifying the meaning and context in which different terms are used by different researchers; 4) it classifies existing QA systems into broad groups, based on the point at which versatility (multiplicity) is introduced or handled in the system; 5) by developing a unified view of the existing frontiers of QA systems, it provides clear directions for future research and development in the area.

Focused Multi-Document Summarization (MDS) is concerned with summarizing documents in a collection with a concentration toward a particular external request (i.e. query, question, topic, etc.), or focus. Although the current state-of-the-art provides somewhat ...

We present an automatic multi-document summarization system for Dutch based on the MEAD system. We focus on redundancy detection, an essential ingredient of multi-document summarization. We introduce a semantic overlap detection tool, which goes beyond simple string matching. Our results so far do not confirm our expectation that this tool would outperform the other tested methods.

This paper proposes an optimization-based model for generic document summarization. The model generates a summary by extracting salient sentences from documents. The approach uses the sentence-to-document, summary-to-document and sentence-to-sentence relations to select salient sentences from a given document collection and to reduce redundancy in the summary. An improved differential evolution algorithm has been created to solve the optimization problem; the algorithm adjusts its crossover rate adaptively according to the fitness of individuals.

Multi-document summarization strictly requires distinguishing the similarity between sentences and paragraphs of texts, because repeated sentences should not appear in the final summary. To enforce this anti-redundancy we need a mechanism that can determine semantic similarities between sentences, expressions, paragraphs and, finally, whole texts. In this paper a fuzzy approach is used to determine this semantic similarity, based on a fuzzy similarity relation and a fuzzy proximity relation. First, the lemmas of Persian words and verbs are obtained; their synonyms then induce a fuzzy similarity relation, through which sentences with similar meanings are identified with the help of the fuzzy proximity relation. In this way we can produce a non-redundant final summary that carries more valuable information.

Current multi-document summarization systems can successfully extract summary sentences, however with many limitations, including low coverage, inaccurate extraction of important sentences, redundancy and poor coherence among the selected sentences. The present study introduces a new concept of centroid approach and reports new techniques for extracting summary sentences for multiple documents. In both techniques, keyphrases are used to weigh sentences and documents. The first summarization technique (Sen-Rich) prefers maximum-richness sentences, while the second (Doc-Rich) prefers sentences from the centroid document. To demonstrate the new summarization system's application to extracting summaries of Arabic documents we performed two experiments. First, we applied the ROUGE measure to compare the new techniques with systems presented at TAC2011. The results show that Sen-Rich outperformed all systems in ROUGE-S. Second, the system was applied to summarize multi-topic documents. Using human evaluators, the results show that Doc-Rich is superior, with summary sentences characterized by extra coverage and more cohesion.
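
The keyphrase-based sentence weighting that both techniques rely on can be sketched roughly as follows. The scoring function here is a hypothetical simplification, not the paper's actual richness measure.

```python
def keyphrase_weight(sentence, keyphrases):
    """Weight a sentence by the share of keyphrases it contains
    (a simplified stand-in for the paper's richness scoring)."""
    s = sentence.lower()
    hits = sum(kp.lower() in s for kp in keyphrases)
    return hits / max(len(keyphrases), 1)

kps = ["climate change", "sea level", "emissions"]
print(keyphrase_weight("Rising emissions accelerate climate change worldwide", kps))
# ≈ 0.667: two of the three keyphrases appear
```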

The issue of sentence ordering is an important one for natural language tasks such as multi-document summarization, yet there has not been a quantitative exploration of the range of acceptable sentence orderings for short texts. We present results of a sentence reordering experiment with three experimental conditions. Our findings indicate a very high degree of variability in the orderings that the eighteen subjects produce. In addition, the variability of reorderings is significantly greater when the initial ordering seen by subjects ...

We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200 million automatic document and summary retrievals using 20 queries. We present both qualitative and quantitative results showing the strengths and drawbacks of all evaluation methods and how they rank the different summarizers.
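
A minimal version of one such evaluation measure, unigram recall in the style of ROUGE-1, can be sketched as follows. Real toolkits also handle stemming, stopword removal and longer n-grams.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Unigram recall: clipped overlapping word counts divided by the
    total number of words in the reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall("the cat sat", "the cat sat on the mat"))
# → 0.5 (3 of the 6 reference tokens are covered)
```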

Summarization of multiple documents featuring multiple topics is discussed. The example treated here consists of fifty articles about the Peru hostage incident from December 1996 through April 1997. They include many topics, such as the opening, negotiation, ending, and so on. The method proposed in this paper is based on spreading activation over documents syntactically and semantically annotated with GDA (Global Document Annotation) tags. The method extracts important documents

The success of information systems relies on their capabilities to provide users with the relevant information they need, when they need it. Meeting these requirements will result in improving response times and decision-making processes. Geographic information system (GIS) is one instance of such systems where data are usually relative to a specific application. To extend this dataset, we propose an approach to enrich the geographical database (GDB). This enrichment is carried out by adding knowledge extracted from web documents to the descriptive data of the GDB. The knowledge extraction process is performed by generating summaries from a corpus of on-line documents. This summarization is done in a distributed fashion by using a set of cooperating agents.

We implemented an initial application of a sentence-trimming approach (Trimmer) to the problem of multi-document summarization in the MSE2005 and DUC2005 tasks. Sentence trimming was incorporated into a feature-based summarization system, called Multi-Document Trimmer (MDT), by using sentence trimming as both a preprocessing stage and a feature for sentence ranking. We demonstrate that we were able to port Trimmer easily to this new problem. Although the direct impact of sentence trimming was minimal compared to other features used in the system, the interaction of the other features resulted in trimmed sentences accounting for nearly half of the selected summary sentences.

Clusters of multiple news stories related to the same topic exhibit a number of interesting properties. For example, when documents have been published at various points in time or by different authors or news agencies, one finds many instances of paraphrasing, information overlap and even contradiction. The current paper presents the Cross-document Structure Theory (CST) Bank, a collection of multi-document clusters in which pairs of sentences from different documents have been annotated for cross-document structure theory relationships. We will describe how we built the corpus, including our method for reducing the number of sentence pairs to be annotated by our hired judges, using lexical similarity measures. Finally, we will describe how CST and the CST Bank can be applied to different research areas such as multi-document summarization.

This paper gives an overview of a project to generate literature reviews from a set of research papers, based on techniques drawn from human summarization behavior. For this study, we identify the key features of natural literature reviews through a macro-level and clause-level discourse analysis; we also identify human information selection strategies by mapping referenced information to source documents. Our preliminary results of discourse analysis have helped us characterize literature review writing styles based on their document structure and rhetorical structure. These findings will be exploited to design templates for automatic content generation.

The massive quantity of data available today on the Internet has reached such a huge volume that it has become humanly unfeasible to efficiently sieve useful information from it. One solution to this problem is offered by text summarization techniques. Text summarization, the process of automatically creating a shorter version of one or more text documents, is an important way of finding relevant information in large text libraries or on the Internet. This paper presents a multi-document summarization system that concisely extracts the main aspects of a set of documents, trying to avoid the typical problems of this type of summarization: information redundancy and diversity. Such a purpose is achieved through a new sentence clustering algorithm based on a graph model that makes use of statistical similarities and linguistic treatment. The DUC 2002 dataset was used to assess the performance of the proposed system, surpassing DUC competitors by a 50% margin of f-measure, in the best case.
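
A sentence clustering step of the kind described, building a similarity graph and taking its connected components, might look like this. Jaccard overlap stands in here for the paper's combination of statistical similarity and linguistic treatment.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / max(len(a | b), 1)

def cluster_sentences(sentences, threshold=0.3):
    """Link sentence pairs whose Jaccard similarity meets the threshold,
    then return the connected components via union-find."""
    n = len(sentences)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                parent[find(i)] = find(j)
    comps = {}
    for i in range(n):
        comps.setdefault(find(i), []).append(i)
    return sorted(comps.values())

sents = ["the stock market fell today",
         "the market fell sharply today",
         "a new species of frog was found"]
print(cluster_sentences(sents))
# → [[0, 1], [2]]
```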

Ordering information is a difficult but important task for applications generating natural language texts such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. However, improper ordering of information in a summary can confuse the reader and deteriorate the readability of the summary. Therefore, it is vital to properly order the information in multi-document summarization. We present a bottom-up approach to arrange sentences extracted for multi-document summarization. To capture the association and order of two textual segments (e.g. sentences), we define four criteria: chronology, topical-closeness, precedence, and succession. These criteria are integrated into a single criterion by a supervised learning approach. We repeatedly concatenate two textual segments into one segment based on the criterion, until we obtain the overall segment with all sentences arranged. We evaluate the sentence orderings produced by the proposed method and numerous baselines using subjective gradings as well as automatic evaluation measures. We introduce the average continuity, an automatic evaluation measure of sentence ordering in a summary, and investigate its appropriateness for this task.
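
A simplified form of the average continuity measure introduced here can be sketched as follows. This is an approximation of the published metric: it takes a smoothed geometric mean, over run lengths n, of the fraction of length-n sentence runs in the candidate ordering that also occur consecutively in the reference ordering.

```python
import math

def average_continuity(candidate, reference, max_n=4):
    """Smoothed geometric mean of n-gram continuity precisions between
    a candidate sentence ordering and a reference ordering (simplified)."""
    # note: substring matching on space-joined indices is only safe for
    # small single-digit toy examples like the ones below
    ref_str = " ".join(map(str, reference))
    logs = []
    for n in range(2, max_n + 1):
        runs = [candidate[i:i + n] for i in range(len(candidate) - n + 1)]
        if not runs:
            break
        hits = sum(" ".join(map(str, r)) in ref_str for r in runs)
        logs.append(math.log((hits + 0.01) / (len(runs) + 0.01)))
    return math.exp(sum(logs) / len(logs))

print(average_continuity([0, 1, 2, 4, 3], [0, 1, 2, 3, 4]))
```
A perfect ordering scores 1.0; swapping the last two sentences drops the score below 1.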

We introduce a new kind of patterns, called emerging patterns (EPs), for knowledge discovery from databases. EPs are defined as itemsets whose supports increase significantly from one dataset to another. EPs can capture emerging trends in timestamped databases, or useful contrasts between data classes. EPs have been proven useful: we have used them to build very powerful classifiers, which are more accurate than C4.5 and CBA, for many datasets. We believe that EPs with low to medium support, such as 1%–20%, can give useful new insights and guidance to experts, in even "well understood" applications.
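
The definition above translates directly into a naive miner: enumerate small itemsets and keep those whose support grows enough from one dataset to the other. Real EP miners use border-based algorithms rather than this brute-force sketch.

```python
from itertools import combinations

def support(itemset, dataset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in dataset) / len(dataset)

def emerging_patterns(d1, d2, min_growth=2.0, max_size=2):
    """Return itemsets whose support grows by at least min_growth
    from dataset d1 to dataset d2, with their growth rates."""
    items = sorted(set().union(*d1, *d2))
    eps = []
    for n in range(1, max_size + 1):
        for combo in combinations(items, n):
            s = frozenset(combo)
            s1, s2 = support(s, d1), support(s, d2)
            if s1 > 0 and s2 / s1 >= min_growth:
                eps.append((combo, s2 / s1))
            elif s1 == 0 and s2 > 0:
                eps.append((combo, float("inf")))  # "jumping" EP
    return eps

old = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"b"}]
new = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
print(emerging_patterns(old, new))
# → [(('a',), 2.0), (('a', 'b'), 3.0), (('a', 'c'), 2.0)]
```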

We describe our work on the development of Language and Evaluation Resources for the evaluation of summaries in English and Chinese. The language resources include a parallel corpus of English and Chinese texts which are translations of each other, a set of queries in both languages, clusters of documents relevant to each query, sentence relevance measures for each sentence in

In this article, we present applications of the Enertex system to Natural Language Processing. Enertex is based on textual energy, a neural-network approach inspired by the statistical physics of magnetic systems. We applied this approach to the problems of automatic multi-document summarization and thematic boundary detection. The results, in three languages (English, Spanish and French), are very encouraging.

The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a ...

Multi-document summarization is the automatic extraction of information from multiple documents on the same topic. This paper proposes a new method that uses LSA to extract the global context of a topic and removes sentence redundancy using SRL and WordNet semantic similarity for the Persian language. In previous approaches, the focus was on sentence features (a local view) as the main and basic unit of text. In this paper, sentences are selected based on the main context hidden in all documents of a topic. The experimental results show that our proposed method outperforms other Persian multi-document systems.
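
The LSA step, extracting a latent "global context" from a term-sentence matrix via SVD, can be sketched as follows (assuming NumPy is available; the sentence-scoring rule is a simplification, not the paper's exact method).

```python
import numpy as np

def lsa_sentence_scores(sentences, k=1):
    """Build a term-sentence count matrix, take a rank-k SVD, and score
    each sentence by the magnitude of its loading on the top k topics."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    A = np.array([[s.lower().split().count(w) for s in sentences]
                  for w in vocab], dtype=float)
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # each column of Vt[:k] weights one sentence in latent-topic space
    return np.abs(Vt[:k].T * S[:k]).sum(axis=1)

sents = ["the cat sat on the mat",
         "the cat chased the mouse",
         "stocks rallied on wall street"]
scores = lsa_sentence_scores(sents, k=1)
print(scores.argmax())  # a sentence from the dominant "cat" topic
```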

Capturing knowledge from free-form evaluative texts about an entity is a challenging task. New techniques of feature extraction, polarity determination and strength evaluation have been proposed. Feature extraction is particularly important to the task as it provides the underpinnings of the extracted knowledge. The work in this paper introduces an improved method for feature extraction that draws on an existing unsupervised method. By including user-specific prior knowledge of the evaluated entity, we turn the task of feature extraction into one of term similarity by mapping crude (learned) features into a user-defined taxonomy of the entity's features. Results show promise both in terms of the accuracy of the mapping as well as the reduction in the semantic redundancy of crude features.

For effective multi-document summarization, it is important to reduce redundant information in the summaries and extract sentences that are common to the given documents. This paper presents a document summarization model which extracts key sentences from given documents while reducing redundant information in the summaries. An innovative aspect of our model lies in its ability to remove redundancy while selecting representative sentences. The model is formulated as a discrete optimization problem, and an adaptive differential evolution (DE) algorithm is created to solve it. We implemented our model on the multi-document summarization task. Experiments show that the proposed model is preferable to existing summarization systems, and that the resulting system based on the proposed optimization approach is competitive on the DUC2002 and DUC2004 datasets.
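
A loose sketch of the adaptive-DE idea: binary vectors encode sentence subsets, a fitness function trades salience against redundancy, and the crossover rate adapts to each individual's fitness. The update scheme below is a simplified stand-in for the paper's actual operator.

```python
import random

def de_select(scores, sim, budget, pop=16, gens=100, seed=1):
    """Toy DE-flavoured binary search for a sentence subset: maximize
    summed salience minus pairwise redundancy, penalizing oversized
    subsets; fitter individuals get a smaller crossover rate."""
    rng = random.Random(seed)
    n = len(scores)

    def fitness(x):
        idx = [i for i in range(n) if x[i]]
        red = sum(sim[i][j] for p, i in enumerate(idx) for j in idx[p + 1:])
        return sum(scores[i] for i in idx) - red - 10.0 * max(0, len(idx) - budget)

    P = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop)]
    F = [fitness(x) for x in P]
    for _ in range(gens):
        fbest, fworst = max(F), min(F)
        span = (fbest - fworst) or 1.0
        for i in range(pop):
            cr = 0.1 + 0.8 * (fbest - F[i]) / span  # adaptive crossover rate
            a = rng.randrange(pop)
            trial = [P[a][k] if rng.random() < cr else P[i][k] for k in range(n)]
            trial = [not g if rng.random() < 0.02 else g for g in trial]  # mutation
            ft = fitness(trial)
            if ft >= F[i]:  # greedy selection keeps the better vector
                P[i], F[i] = trial, ft
    best = max(range(pop), key=lambda i: F[i])
    return [i for i in range(n) if P[best][i]]

scores = [3.0, 2.5, 2.4, 0.5]
sim = [[0, 2.0, 0.1, 0], [2.0, 0, 0.1, 0], [0.1, 0.1, 0, 0], [0, 0, 0, 0]]
print(de_select(scores, sim, budget=2))
```
The redundancy term steers the search away from picking both of the highly similar sentences 0 and 1.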

This paper describes a framework for multidocument summarization which combines three premises: coherent themes can be identified reliably; highly representative themes, running across subsets of the document collection, can function as multi-document summary surrogates; and effective end-use of such themes should be facilitated by a visualization environment which clarifies the relationship between themes and documents. We present algorithms that formalize our framework, describe an implementation, and demonstrate a prototype system and interface.

Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two step learning problem building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances current state-of-the-art improving ROUGE scores by ∼7%. Generated summaries are less redundant and more coherent based upon manual quality evaluations.

Reviews about products and services are abundantly available online. However, selecting information relevant to a potential buyer involves a significant amount of time reading users' reviews and weeding out comments unrelated to the important aspects of the reviewed entity. In this work, we present STARLET, a novel approach to multi-document summarization for evaluative text that considers the rating distribution as a summarization feature to consistently preserve the overall opinion distribution expressed in the original reviews. We demonstrate how this method improves traditional summarization techniques and leads to more readable summaries.

The production of accurate and complete multiple-document summaries is challenged by the complexity of judging the usefulness of information to the user. Our aim is to determine whether identifying sub-events in a news topic could help us capture essential information to produce better summaries. In our first experiment, we asked human judges to determine the relative utility of sentences as they related to the subevents of a larger topic. We used this data to create summaries by three different methods, and we then compared these summaries with three automatically created summaries. In our second experiment, we show how the results of our first experiment can be applied to a cluster-based automatic summarization system. Through both experiments, we examine the use of inter-judge agreement and a relative utility metric that accounts for the complexity of determining sentence quality in relation to a topic.

This paper describes the results of a study designed to assess human expert ratings of educational concept features for use in automatic core concept extraction systems. Digital library resources provided the content base for human experts to annotate automatically extracted concepts on seven dimensions: coreness, local importance, topic, content, phrasing, structure, and function. The annotated concepts were used as training data to build a machine learning classifier as part of a tool used to predict the core concepts in the document. These predictions were compared with the experts' judgment of concept coreness.

We present LQVSumm, a corpus of about 2000 automatically created extractive multi-document summaries from the TAC 2011 shared task on Guided Summarization, which we annotated with several types of linguistic quality violations. Examples for such violations include pronouns that lack antecedents or ungrammatical clauses. We give details on the annotation scheme and show that inter-annotator agreement is good given the open-ended nature of the task. The annotated summaries have previously been scored for Readability on a numeric scale by human annotators in the context of the TAC challenge; we show that the number of instances of violations of linguistic quality of a summary correlates with these intuitively assigned numeric scores. On a system-level, the average number of violations marked in a system's summaries achieves higher correlation with the Readability scores than current supervised state-of-the-art methods for assigning a single readability score to a summary. It is our hope that our corpus facilitates the development of methods that not only judge the linguistic quality of automatically generated summaries as a whole, but which also allow for detecting, labeling, and fixing particular violations in a text.

We introduce the novel problem of automatic related work summarization. Given multiple articles (e.g., conference/journal papers) as input, a related work summarization system creates a topic-biased summary of related work specific to the target paper. Our prototype Related Work Summarization system, ReWoS, takes in a set of keywords arranged in a hierarchical fashion that describes a target paper's topics, to drive the creation of an extractive summary using two different strategies for locating appropriate sentences for ...

A summary is a succinct and informative description of a data collection. In the context of multi-document summarization, the selection of the most relevant and non-redundant sentences belonging to a collection of textual documents is definitely a challenging task. Frequent itemset mining is a well-established data mining technique for discovering correlations among data. Although it has been widely used in transactional data analysis, to the best of our knowledge its exploitation in document summarization has never been investigated so far. This paper presents a novel multi-document summarizer, namely ItemSum (Itemset-based Summarizer), that is based on an itemset-based model, i.e., a model composed of frequent itemsets extracted from the document collection. It automatically selects the most representative and non-redundant sentences to include in the summary by considering both sentence coverage, with respect to a concise and highly informative itemset-based model, and a sentence relevance score based on tf-idf statistics. Experimental results, obtained on the DUC'04 document collection by means of the ROUGE toolkit, show that the proposed approach achieves better performance than a large set of competitors.
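
The frequent-itemset model at the heart of such a summarizer can be illustrated with a brute-force miner over sentence term sets. Treating each sentence's terms as one transaction is an assumption of this sketch; the summarizer itself adds coverage and tf-idf relevance scoring on top.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support=0.5, max_size=3):
    """Naive miner: enumerate candidate itemsets up to max_size and
    keep those whose support meets min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for n in range(1, max_size + 1):
        for combo in combinations(items, n):
            s = frozenset(combo)
            sup = sum(s <= t for t in transactions) / len(transactions)
            if sup >= min_support:
                result[combo] = sup
    return result

# each "transaction" is the term set of one sentence (sketch assumption)
sents = [{"price", "oil", "rise"}, {"oil", "price"},
         {"oil", "export"}, {"price", "rise"}]
print(frequent_itemsets(sents, min_support=0.5, max_size=2))
```
A production miner would prune with the Apriori property instead of enumerating every candidate.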

Problem statement: Text summarization ranges from indicative summaries, which identify the topics of a document, to informative summaries, which give a concise description of the original document and convey what its whole content is about. Approach: A single-document summary tends to capture both kinds of information well, but this has not been the case for multi-document summaries, where the overall comprehensive quality of the informative summary is often lacking. Most existing methods focus on sentence scoring, and less consideration is given to the contextual information content across multiple documents. Results: This study presents a survey of multi-document summarization approaches, focusing on four well-known families: the feature-based, cluster-based, graph-based, and knowledge-based methods. The general ideas behind these methods are described. Conclusion: Beyond the general concepts, we discuss the benefits and limitations of these methods. With the aim of enhancing multi-document summarization, specifically of news documents, a novel approach is outlined for future development, taking into account the generic components of a news story in order to generate a better summary.
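Of the four families surveyed, the graph-based method is the easiest to make concrete. A minimal TextRank-style sketch, not tied to any particular system from the survey, ranks sentences by running damped PageRank over a sentence-similarity matrix (the damping factor and iteration count are conventional choices, not values from the survey):

```python
def textrank(sim, d=0.85, iters=50):
    """Rank sentences by power-iteration PageRank over a similarity matrix.

    sim[j][i] is the similarity-weighted edge from sentence j to sentence i.
    """
    n = len(sim)
    scores = [1.0 / n] * n
    # Total outgoing weight of each node, used to normalize its edges.
    out = [sum(row) or 1.0 for row in sim]
    for _ in range(iters):
        scores = [
            (1 - d) / n
            + d * sum(sim[j][i] * scores[j] / out[j] for j in range(n) if j != i)
            for i in range(n)
        ]
    return scores
```

A sentence connected to many other sentences (a "hub" restating shared content) accumulates score from all of them, which is why graph-based methods favor sentences central to the document set.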

We study the use of temporal information in the form of timelines to enhance multi-document summarization. We employ a fully automated temporal processing system to generate a timeline for each input document. We derive three features from these timelines, and show that their use in supervised summarization leads to a significant 4.1% improvement in ROUGE performance over a state-of-the-art baseline. In addition, we propose TIMEMMR, a modification of Maximal Marginal Relevance that promotes temporal diversity by computing time-span similarity, and show its utility in summarizing certain document sets. We also propose a filtering metric to discard noisy timelines generated by our automatic processes, purifying the timeline input for summarization. By selectively using timelines guided by this filtering, overall summarization performance increases by a significant 5.9%.
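The general shape of an MMR variant with a temporal term can be sketched as follows. The exact TIMEMMR formulation is the paper's; here the redundancy term is an assumed even mix of lexical Jaccard similarity and time-span overlap, and the function names and weights are illustrative only:

```python
def jaccard(a, b):
    """Lexical similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def span_overlap(s1, s2):
    """Fraction of the shorter time span covered by the overlap (assumed form)."""
    lo, hi = max(s1[0], s2[0]), min(s1[1], s2[1])
    if hi <= lo:
        return 0.0
    return (hi - lo) / min(s1[1] - s1[0], s2[1] - s2[0])

def time_mmr(cands, relevance, spans, k=2, lam=0.7):
    """Greedy MMR where redundancy mixes lexical and temporal similarity."""
    selected = []
    pool = list(range(len(cands)))
    while pool and len(selected) < k:
        def mmr(i):
            if not selected:
                red = 0.0
            else:
                red = max(0.5 * jaccard(cands[i], cands[j])
                          + 0.5 * span_overlap(spans[i], spans[j])
                          for j in selected)
            return lam * relevance[i] - (1 - lam) * red
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected
```

With the temporal term in place, a slightly less relevant sentence about a different time span can outrank a near-duplicate of an already selected sentence, which is the "temporal diversity" effect the abstract reports.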

This paper presents two corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. Both corpora are in French; the first is, to our knowledge, the only such corpus in this language. It contains 20 topics with 20 documents each. A first set of ...

Different summarization requirements can make writing a good summary more difficult or easier. Summary length and the characteristics of the input are two such constraints influencing the quality of a potential summary. In this paper we report the results of a quantitative analysis of data from large-scale evaluations of multi-document summarization, empirically confirming this hypothesis. We further show that features measuring the cohesiveness of the input are highly correlated with eventual summary quality, and that these features can be used to predict the difficulty of new, unseen summarization inputs.
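The abstract does not spell out its cohesiveness features; one plausible instance is the mean pairwise cosine similarity of the input documents' term-frequency vectors (the function name and the raw-tf weighting are assumptions, offered only to make the notion of "input cohesiveness" concrete):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    num = sum(a[t] * b[t] for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def input_cohesiveness(docs):
    """Mean pairwise cosine similarity over the documents in an input set."""
    vecs = [Counter(d) for d in docs]
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    if not pairs:
        return 0.0
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)
```

A topically tight input set scores near 1.0 and should be easy to summarize; a scattered set scores near 0.0, signaling a difficult input in the sense the paper studies.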

Since 2001, the Document Understanding Conferences have been the forum for researchers in automatic text summarization to compare methods and results on common test sets. Over the years, several types of summarization tasks have been addressed: single document ...

Answers to clinical questions are often complicated. In this paper, we formulate the problem as a multi-document summarization task and construct extraction-based summaries to answer questions about the effects of applying a medication to a disease. The experimental results show that identifying clinical outcomes and detecting their polarity improves the performance of summarization. Domain knowledge and ...

Due to the huge amount of information available online, the research and development of systems that automatically summarize multiple documents has become the focus of considerable interest and investment in many research sectors. One of the challenges facing multi-document summarization is generating user-focused summaries. Here, we present a new method for automatically generating user-focused summaries of multiple documents using ...