Document Summarization Research Papers - Academia.edu

Took the standard guided plant tour which is offered routinely to the public

- by
- •
- Engineering, Computer Aided Design, Energy Conservation, Energy Consumption
- by Miriam Revera
- •
- General Practice, Arterial hypertension, Clinical Practice, Hypertension

Where there is water scarcity, the situation is dramatic for aquatic ecosystems. In many Mediterranean basins the exploitation of water resources has gone clearly beyond the renewable level and affects aquatic ecosystems. Thus, they may benefit from the recycling of high-quality effluents that can be used to cope with environmental water demands instead of being discharged. Their reclamation with natural technologies produces an improvement in quality based on the development of trophic webs built upon nutrients still dissolved in the reclaimed water. The main project in the Costa Brava area is that of the Empuriabrava constructed wetland system, where nitrified effluent is further treated to reduce the concentration of nutrients in the water and is reused for environmental enhancement. This facility is also an interesting site for bird-watching. Other projects where water recycling produces indirect benefits for aquatic ecosystems are those in Tossa de Mar, affecting the "temporary" Tossa Creek (a watercourse which flows on a temporary basis according to rainfall patterns), and in the Aro Valley, affecting the also "temporary", but slightly bigger, Ridaura River. This document summarizes these projects and proposes practical recommendations for the use of treated effluents in the recreation and restoration of aquatic ecosystems.

- by Lluís Sala
- •
- Aquatic Ecosystem, Water Demand, Water scarcity, Constructed Wetland

We propose an information retrieval (IR) model that combines relation and keyword matching. The model relies on a novel algorithm for relation matching. The algorithm takes advantage of any existing relational similarity between document and query to improve retrieval effectiveness. If the query concepts (terms) appearing in a document exhibit a similar relationship, the proposed similarity measure ranks that document higher than documents in which the query terms exhibit a different relationship. A conceptual graph (CG) representation is used to capture relationships between concepts. To keep the approach computationally simple, a simplified form of CG matching is used instead of graph derivation. Structural variations are captured during matching through simple heuristics. The proposed CG similarity measure is simple, flexible, and scalable, and can find application in many related tasks such as information filtering, question answering, and document summarization.

- by Tanveer Siddiqui
- •
- Information Retrieval, Artificial Intelligence, Graph Theory, Modeling
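
As a rough illustration of the idea in the abstract above (ranking a document higher when its query terms stand in the same relationships as in the query), here is a minimal sketch. It is not the authors' algorithm: the triple representation `(concept, relation, concept)`, the function names, and the mixing weight `alpha` are all assumptions made for this example, and exact triple overlap stands in for the simplified conceptual graph matching the abstract mentions.

```python
from typing import Set, Tuple

# Hypothetical stand-in for a conceptual graph: a set of labelled edges.
Triple = Tuple[str, str, str]  # (concept, relation, concept)


def keyword_score(query_terms: Set[str], doc_terms: Set[str]) -> float:
    """Fraction of query terms that also occur in the document."""
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def relation_score(query_rels: Set[Triple], doc_rels: Set[Triple]) -> float:
    """Fraction of query relations that occur unchanged in the document."""
    if not query_rels:
        return 0.0
    return len(query_rels & doc_rels) / len(query_rels)


def combined_score(query_terms, query_rels, doc_terms, doc_rels, alpha=0.5):
    """Weighted mix of keyword and relation matching (alpha is an assumption)."""
    return (alpha * keyword_score(query_terms, doc_terms)
            + (1 - alpha) * relation_score(query_rels, doc_rels))


# Query: "dog bites man".
q_terms = {"dog", "bite", "man"}
q_rels = {("dog", "agent", "bite"), ("man", "patient", "bite")}

# Document A says the same thing; document B reverses the roles.
a_rels = {("dog", "agent", "bite"), ("man", "patient", "bite")}
b_rels = {("man", "agent", "bite"), ("dog", "patient", "bite")}

score_a = combined_score(q_terms, q_rels, q_terms, a_rels)
score_b = combined_score(q_terms, q_rels, q_terms, b_rels)
print(score_a, score_b)
```

Both documents contain every query keyword, so keyword matching alone cannot separate them; only the relation term pushes document A above document B.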

This document summarizes a microscopy study of concrete prepared from cement and fly ash (25% fly ash and 75% cement by weight), covering both coal fly ash and biomass fly ash. All the fly ash concretes have statistically equal strength from one day to one year after mixing. Scanning electron microscopy (SEM), energy dispersive X-ray (EDX), and environmental scanning electron microscopy (ESEM) analyses show that both coal and biomass fly ash particles undergo significant changes in morphology and chemical composition in concrete due to pozzolanic reaction, although biomass fly ash differs substantially from coal fly ash in its fuel sources.

- by Larry Baxter
- •
- Mechanical Engineering, Chemical Engineering, Scanning Electron Microscopy, SEM

This document summarizes the international developments of appropriate test methods, standards and risk models for the damage of lightning charge to the metallic components and metal-oxide surge arresters of high voltage electric power...

- by Marco Jusevicius
- •
- International Development, Power System, High Voltage, South America

A sentence extract summary of a document is a subset of the document's sentences that contains the main ideas in the document. We present two approaches to generating such summaries. The first uses a pivoted QR decomposition of the term-sentence matrix in order to identify sentences that have ideas that are distinct from those in other sentences. The second is based on a hidden Markov model that judges the likelihood that each sentence should be contained in the summary. We compare the results of these methods with summaries generated by humans, showing that we obtain higher agreement than do earlier methods. ... pervasive use of information retrieval systems in the last 6 years this area has been given wider attention [HM00]. For example, Microsoft Word has a built-in tool to summarize documents created by it. Such a summary is an example of a generic summary, i.e., one that attempts to capture the essential points of a document. In addition, summarization methods are regularly used by web search engines to give a brief synopsis of the documents retrieved by a user query. Such query-based summaries can be more focused, since the user's query terms are known to the retrieval system and can be used to target the summaries to the query.

- by Jean Bousquet
- •
- Immunology, Vaccines, Forecasting, Asthma
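
The first approach in the sentence-extract abstract above (a pivoted QR decomposition of the term-sentence matrix) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses the Gram-Schmidt form of column pivoting in plain Python, raw term counts instead of whatever weighting the authors applied, and a hypothetical function name `pivoted_qr_select`. Each step picks the sentence column with the largest remaining norm and then removes that direction from every other column, so sentences that repeat an already chosen idea shrink toward zero.

```python
import math
from typing import List


def pivoted_qr_select(columns: List[List[float]], k: int) -> List[int]:
    """Return up to k column indices in column-pivoting order.

    columns[j] is the term vector of sentence j (one entry per term).
    """
    residual = [list(col) for col in columns]  # working copies
    chosen: List[int] = []
    for _ in range(min(k, len(residual))):
        # Pivot: the not-yet-chosen column with the largest remaining norm.
        norms = [
            -1.0 if j in chosen else math.sqrt(sum(x * x for x in col))
            for j, col in enumerate(residual)
        ]
        pivot = max(range(len(norms)), key=norms.__getitem__)
        if norms[pivot] <= 1e-12:
            break  # the remaining sentences add nothing new
        chosen.append(pivot)
        q = [x / norms[pivot] for x in residual[pivot]]
        # Remove the pivot direction from every remaining column.
        for j, col in enumerate(residual):
            if j in chosen:
                continue
            proj = sum(a * b for a, b in zip(q, col))
            residual[j] = [a - proj * b for a, b in zip(col, q)]
    return chosen


# Toy term-sentence matrix, one column per sentence.
# Term order (an assumption for the example): summary, document, markov, model.
sentence_vectors = [
    [2.0, 2.0, 0.0, 0.0],  # sentence 0
    [1.0, 1.0, 0.0, 0.0],  # sentence 1: same idea as sentence 0
    [0.0, 0.0, 1.0, 1.0],  # sentence 2: a distinct idea
]
picked = pivoted_qr_select(sentence_vectors, 2)
print(picked)
```

In the toy matrix, sentence 1 is a scaled copy of sentence 0, so after sentence 0 is chosen its residual is essentially zero and the second pick falls on sentence 2.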

In the field of law there is an absolute need for summarizing the texts of court decisions in order to make the content of the cases easily accessible for legal professionals. During the SALOMON and MOSAIC projects we investigated the summarization and retrieval of legal cases. This article presents some of the main findings while integrating the research results of experiments on legal document summarization by other research groups. In addition, we propose novel avenues of research for automatic text summarization, which we currently exploit when summarizing court decisions in the ACILA project. Techniques for automated concept learning and argument recognition are here the most challenging.

States, affecting an estimated 14% of the population. The prevalence of sinusitis is rising. Between 1990 and 1992, persons with sinusitis reported approximately 73 million restricted activity days, an increase from the 50 million restricted activity days reported between 1986 and 1988. Because critical questions remain unanswered about its cause, pathophysiology, and optimal treatment, sinusitis continues to generate significant health care costs and affects the quality of life of a large segment of the U.S. population. To identify critical directions for research on sinus disease, the American Academy of Allergy, Asthma and Immunology and the American Academy of Otolaryngology-Head and Neck Surgery Foundation, Inc., convened a meeting in January 1996 in collaboration with the National Institutes of Allergy and Infectious Disease. This document summarizes the proceedings of that meeting and presents what is intended to be the background for future investigation of the many unanswered questions related to sinusitis. (J Allergy Clin Immunol 1997;99:S829-48.)

- by John Fireman
- •
- Otolaryngology, Immunology, Quality of life, Asthma
- by Hans Yssel
- •
- Immunology, Vaccines, Forecasting, Asthma

We propose the use of the text of the sentences surrounding citations as an important tool for semantic interpretation of bioscience text. We hypothesize several different uses of citation sentences (which we call citances), including the creation of training and testing data for semantic analysis (especially for entity and relation recognition), synonym set creation, database curation, document summarization, and information

- by Rasim Alguliev
- •
- Document Summarization

We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 Million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200 Million automatic document and summary retrievals using 20 queries. We present both qualitative and quantitative results showing the strengths and drawbacks of all evaluation methods and how they rank the different summarizers.

- by Horacio Saggion
- •
- Multi-Document Summarization, Large Scale, Document Summarization

In this paper, we propose a machine learning approach to rhetorical role identification in legal documents. In our approach, we annotate roles in sample documents with the help of legal experts and use them as training data. A conditional random field model is trained on this data to perform rhetorical role identification, reinforced with rich feature sets. Understanding the structure of a legal document and applying a mathematical model can produce an effective summary in the final stage. Other important new findings in this work include that a model trained for one sub-domain can be extended to other sub-domains with very limited augmentation of the feature sets. Moreover, we can significantly improve extraction-based summarization results by modifying the ranking of sentences according to the importance of specific roles.
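
The last point of the abstract above, adjusting the sentence ranking by the importance of each rhetorical role, can be sketched as a simple re-weighting step. This is an illustrative sketch only: the role labels, the weights, and the function name `rerank_by_role` are hypothetical, and the roles are assumed to have been assigned already (for example by a trained sequence labeller).

```python
from typing import Dict, List, Tuple


def rerank_by_role(
    sentences: List[Tuple[str, str, float]],
    role_weights: Dict[str, float],
    default_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Scale each base score by its role weight and sort best-first.

    sentences holds (text, role, base_score) triples.
    """
    adjusted = [
        (text, base * role_weights.get(role, default_weight))
        for text, role, base in sentences
    ]
    return sorted(adjusted, key=lambda item: item[1], reverse=True)


# Hypothetical roles and weights, chosen only for the example.
weights = {"decision": 2.0, "facts": 1.0, "argument": 0.8}
scored = [
    ("The tenant stopped paying rent in 2001.", "facts", 0.50),
    ("Counsel relied on an earlier ruling.", "argument", 0.45),
    ("The appeal is dismissed.", "decision", 0.40),
]
ranked = rerank_by_role(scored, weights)
for text, score in ranked:
    print(round(score, 2), text)
```

With these weights the decision sentence moves from last place to first, even though its base extraction score was the lowest of the three.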

Stemming words to (usually) remove suffixes has applications in text search, machine translation, document summarization, and text classification. For example, English stemming reduces the words "computer," "computing," "computation," and "computability" to their common morphological root, "comput-." In text search, this permits a search for "computers" to find documents containing all words with the stem "comput-." In the Indonesian language, stemming is of crucial importance: words have prefixes, suffixes, infixes, and confixes that make matching related words difficult. This work surveys existing techniques for stemming Indonesian words to their morphological roots, presents our novel and highly accurate CS algorithm, and explores the effectiveness of stemming in the context of general-purpose text information retrieval through ad hoc queries.

- by Bobby Nazief
- •
- Information Systems, Information Retrieval, Machine Translation, Text Classification
- by Hazra Imran
- •
- Art, Machine Learning, Data Mining, Writing
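
The English example in the stemming abstract above (reducing "computer", "computing", "computation", and "computability" to "comput-") can be reproduced with a toy suffix stripper. This is a deliberately naive sketch: the suffix list and the name `toy_stem` are assumptions made for the example, and it is neither a standard English stemmer nor the CS algorithm for Indonesian that the abstract describes.

```python
# Suffixes are tried longest-first so "ability" wins over "s".
SUFFIXES = ("ability", "ation", "ers", "ing", "er", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


words = ["computer", "computing", "computation", "computability", "computers"]
stems = {w: toy_stem(w) for w in words}
print(stems)

# Searching on stems lets the query "computers" match a document
# that only contains "computing".
doc_stems = {toy_stem(w) for w in "advances in computing hardware".split()}
query_matches = toy_stem("computers") in doc_stems
print(query_matches)
```

The `min_stem` guard keeps short words such as "is" from being reduced to a single letter, which hints at why practical stemmers need far more careful rules than a bare suffix list.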

We propose a new method for using anaphoric information in Latent Semantic Analysis (LSA), and discuss its application to develop an LSA-based summarizer which achieves a significantly better performance than a system not using anaphoric information, and a better performance by the ROUGE measure than all but one of the single-document summarizers participating in DUC-2002. Anaphoric information is automatically extracted using a new release of our own anaphora resolution system, GuiTAR, which incorporates proper noun resolution. Our summarizer also includes a new approach for automatically identifying the dimensionality reduction of a document on the basis of the desired summarization percentage. Anaphoric information is also used to check the coherence of the summary produced by our summarizer, by a reference checker module which identifies anaphoric resolution errors caused by sentence extraction.

- by Mijail Kabadjov and +1
- •
- Information Systems, Latent Semantic Analysis, Information Processing, Anaphora Resolution
- by Sandra Kübler
- •
- Cognitive Science, Computational Linguistics, Linguistics, hidden Markov model

Document understanding techniques such as document clustering and multi-document summarization have been receiving much attention recently. Current document clustering methods usually represent the given collection of documents as a document-term matrix and then conduct the clustering process. Although many of these clustering methods can group the documents effectively, it is still hard for people to capture the meaning of the documents since there is no satisfactory interpretation for each document cluster. A straightforward solution is to first cluster the documents and then summarize each document cluster using summarization methods. However, most of the current summarization methods are solely based on the sentence-term matrix and ignore the context dependence of the sentences. As a result, the generated summaries lack guidance from the document clusters. In this article, we propose a new language model to simultaneously cluster and summarize documents by making use of both the document-term and sentence-term matrices. By utilizing the mutual influence of document clustering and summarization, our method yields (1) a better document clustering method with more meaningful interpretation and (2) an effective document summarization method with guidance from document clustering. Experimental results on various document datasets show the effectiveness of our proposed method and the high interpretability of the generated summaries.

- by 昀季
- •
- Document Clustering, Clustering Method, Language Model, Nonnegative Matrix Factorization
- by Zack Turner
- •
- Computer Architecture, Thermodynamics, Computational Mechanics, Parallel Processing

This document summarizes the findings of the Accelerator Working Group (AWG) of the International Scoping Study (ISS) of a Future Neutrino Factory and Superbeam Facility. The work of the group took place at three plenary meetings along with three workshops, and an oral summary report was presented at the NuFact06 workshop held at UC-Irvine in August, 2006. The goal was to reach consensus on a baseline design for a Neutrino Factory complex. One aspect of this endeavor was to examine critically the advantages and disadvantages of the various Neutrino Factory schemes that have been proposed in recent years.

- by Kevin Gallardo
- •
- Instrumentation, System Design, Front end, Document Summarization

Engineering and research teams often develop new products and technologies by referring to inventions described in patent databases. Efficient patent analysis builds R&D knowledge, reduces new product development time, increases market...

- by Chun-yi Wu and +1
- •
- Engineering, Systems Science, Knowledge sharing, Text Mining

- by Charles Trappey
- •
- Engineering, Systems Science, Knowledge sharing, Text Mining
- by Katy Vincent
- •
- Research Design, Pain, Evidence Based Medicine, Clinical Trial

Took the standard guided plant tour which is offered routinely to the public

- by Brook Muller
- •
- Engineering, Computer Aided Design, Energy Conservation, Technology Assessment

Objective: This document summarizes the limited experience of SARS in pregnancy and suggests guidelines for management. Outcomes: Cases reported from Asia suggest that maternal and fetal outcomes are worsened by SARS during pregnancy. Evidence: Medline was searched for relevant articles published in English from 2000 to 2007. Case reports were reviewed and expert opinion sought. Values: Recommendations were made according to the guidelines developed by the Canadian Task Force on Preventive Health Care. Sponsors: The Society of Obstetricians and Gynaecologists of Canada. Recommendations: 1. All hospitals should have infection control systems in place to ensure that alerts regarding changes in exposure risk factors for SARS or other potentially serious communicable diseases are conveyed promptly to clinical units, including the labour and delivery unit. (III-C) 2. At times of SARS outbreaks, all pregnant patients being assessed or admitted to the hospital should be screened for symptoms of and risk factors for SARS. (III-C) 3. Upon arrival in the labour triage unit, pregnant patients with suspected and probable SARS should be placed in a negative pressure isolation room with at least 6 air exchanges per hour. All labour and delivery units caring for suspected and probable SARS should have available at least one room in which patients can safely labour and deliver while in need of airborne isolation. (III-C) This Clinical Practice Guideline has been prepared by the Maternal Fetal Medicine Committee, reviewed by the Infectious Disease Committee, and approved by the Executive and Council of the Society of Obstetricians and Gynaecologists of Canada. (SOGC Clinical Practice Guideline, JOGC, April 2009, p. 358.)

- by Eliana Castillo
- •
- Medicine, Maternal Mortality, Canada, Pregnancy

The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of...

- by Rasim Alguliev
- •
- Computer Science, Information Retrieval, Text Mining, Automatic Control

For effective multi-document summarization, it is important to reduce redundant information in the summaries and to extract sentences that are common to the given documents. This paper presents a document summarization model which extracts key sentences from given documents while reducing redundant information in the summaries. An innovative aspect of our model lies in its ability to remove redundancy while selecting representative sentences. The model is represented as a discrete optimization problem. To solve this discrete optimization problem, an adaptive DE algorithm is created. We implemented our model on a multi-document summarization task. Experiments have shown that the proposed model is to be preferred over other summarization systems. We also showed that the resulting summarization system based on the proposed optimization approach is competitive on the DUC2002 and DUC2004 datasets.
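
To make the trade-off in the abstract above concrete (select representative sentences while penalizing redundancy), here is a minimal sketch. It deliberately swaps in a simple greedy selection for the adaptive DE algorithm the abstract mentions, so it illustrates the objective rather than the authors' optimizer. The function names, the bag-of-words cosine similarity, and the `trade_off` value are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_sentences(sentences: List[str], k: int, trade_off: float = 0.5) -> List[int]:
    """Greedily pick k sentences: close to the collection, far from earlier picks."""
    bags = [Counter(s.lower().split()) for s in sentences]
    centroid = sum(bags, Counter())  # word counts over the whole collection
    chosen: List[int] = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = -1, -math.inf
        for i, bag in enumerate(bags):
            if i in chosen:
                continue
            relevance = cosine(bag, centroid)
            redundancy = max((cosine(bag, bags[j]) for j in chosen), default=0.0)
            score = trade_off * relevance - (1 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


sentences = [
    "the storm closed the airport",
    "the storm closed the airport today",
    "schools reopened after the flood",
]
summary_ids = select_sentences(sentences, 2)
print(summary_ids)
```

The second sentence is nearly as representative as the first, but once the first is selected its redundancy penalty outweighs its relevance, so the third sentence is picked instead. A greedy pass like this can miss the best combination, which is one reason to treat selection as a global optimization problem.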

The main target of cross-language summarization is to generate a summary in a language different from that of the source document or documents. In this paper, we propose a Textual Energy approach to mono-document summarization, combined with an online Machine Translation system that translates the source file from English into Spanish. The proficiency of the system was measured with the FRESA framework and compared with baseline summaries generated at different percentages.

- by Alfonso Medina Urrea
- •
- Machine Translation, Document Summarization

Editing, cover design, prepress, and digital printing: Proceditor ltda.

- by Florez Alejandro
- •
- Nutrition, Cognitive development, Impact Evaluation, Health

- by Melanie Basso
- •
- Medicine, Maternal Mortality, Canada, Pregnancy

Text summarization is very effective in relevance assessment tasks. The Multiple Document Summarizer presents a novel approach to select sentences from documents according to several heuristic features. Summaries are generated by modeling the set of documents as a Semantic Vector Space Model (SVSM) and applying Principal Component Analysis (PCA) to extract topic features. A purely statistical VSM assumes terms to be independent of each other and may produce inconsistent results. The vector space is enhanced semantically by modifying the weight of the word vector governed by Appearance and Disappearance (Action class) words. The knowledge base for Action words is maintained by classifying the words as Appearance or Disappearance with the help of WordNet. The weights of the action words are modified in accordance with the Object list prepared from the collection of nouns corresponding to the action words. The summary thus generated provides more informative content because the semantics of natural language have been taken into consideration.

- by Akhil Meshram
- •
- Computer Science, Principal Component Analysis, Chinese linguistics, Wordnet

Automatic segmentation of a text stream into topically coherent segments is an important component in natural language processing tasks such as information retrieval and document summarization. Machine learning techniques can...

In this paper, we propose a novel idea for applying probabilistic graphical models to an automatic text summarization task in the legal domain. Identification of the rhetorical roles present in the sentences of a legal document is the important text mining process involved in this task. A Conditional Random Field (CRF) is applied to segment a given legal

- by m saravanan
- •
- Computer Science, Information Retrieval, Machine Learning, Active Learning

A summary is a succinct and informative description of a data collection. In the context of multi-document summarization, the selection of the most relevant and not redundant sentences belonging to a collection of textual documents is definitely a challenging task. Frequent itemset mining is a well-established data mining technique to discover correlations among data. Although it has been widely used in transactional data analysis, to the best of our knowledge, its exploitation in document summarization has never been investigated so far. This paper presents a novel multi-document summarizer, namely ItemSum (Itemset-based Summarizer), that is based on an itemset-based model, i.e., a model composed of frequent itemsets, extracted from the document collection. It automatically selects the most representative and not redundant sentences to include in the summary by considering both sentence coverage, with respect to a concise and highly informative itemset-based model, and a sentence relevance score, based on tf-idf statistics. Experimental results, performed on the DUC'04 document collection by means of ROUGE toolkit, show that the proposed approach achieves better performance than a large set of competitors.

- by saima Jabeen
- •
- Data Mining, Text Mining, Multi-Document Summarization, Frequent Itemset Mining
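
The itemset-based idea in the ItemSum abstract above can be sketched in two steps: mine word sets that co-occur in enough sentences, then pick sentences that cover as many of those sets as possible. This is only a sketch under stated assumptions: itemsets are enumerated by brute force up to size 2 rather than with a dedicated frequent itemset miner, the tf-idf relevance score from the abstract is left out, and the names `frequent_itemsets` and `greedy_cover` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]], min_support: int, max_size: int = 2
) -> Set[FrozenSet[str]]:
    """Word sets (up to max_size words) found in at least min_support sentences."""
    counts: Dict[FrozenSet[str], int] = {}
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {itemset for itemset, n in counts.items() if n >= min_support}


def greedy_cover(
    transactions: List[Set[str]], itemsets: Set[FrozenSet[str]], k: int
) -> List[int]:
    """Pick up to k sentences, each covering the most still-uncovered itemsets."""
    uncovered = set(itemsets)
    chosen: List[int] = []
    while uncovered and len(chosen) < k:
        candidates = [i for i in range(len(transactions)) if i not in chosen]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda i: sum(1 for s in uncovered if s <= transactions[i]),
        )
        covered = {s for s in uncovered if s <= transactions[best]}
        if not covered:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        uncovered -= covered
    return chosen


# Each sentence is reduced to its set of content words (the transaction).
sentence_words = [
    {"flood", "river", "town"},
    {"flood", "river"},
    {"rescue", "teams", "town"},
    {"rescue", "teams"},
]
model = frequent_itemsets(sentence_words, min_support=2)
selected = greedy_cover(sentence_words, model, k=2)
print(len(model), selected)
```

In the toy data, the first and third sentences together cover every frequent itemset, so the two shorter sentences add nothing and are skipped as redundant.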

In this paper we describe a new method of automatic summarization based on a learning step to identify criteria that maximize the correlation between human summary and peer extract. The proposed method uses a genetic algorithm to produce extracts from a collection of source documents describing the same event. These extracts are compared to human summaries using "Rouge measure" in

The multiplication of presentation contexts (such as mobile phones, PDAs) for multimedia documents requires the adaptation of document specifications. In an earlier work, a semantic framework for multimedia document adaptation was proposed. This framework deals with the semantics of the document composition by transforming the relations between multimedia objects. However, it lacked the capability of suppressing multimedia objects. In this paper, we extend the proposed adaptation with this capability. Thanks to this extension, we present a method for summarizing multimedia documents. Moreover, when multimedia objects are removed, the resulting document satisfies some properties such as presentation contiguity. To validate our framework, we adapt standard multimedia documents such as SMIL documents. [Figure residue: a relation graph over the multimedia objects Abstract, Poster, Trailer, and Characters, annotated with relation sets such as {m} and {m, mi, b, bi}.]

- by Nabil Layaïda
- •
- Boolean Satisfiability, Document Summarization

Since 2001, the Document Understanding Conferences have been the forum for researchers in automatic text summarization to compare methods and results on common test sets. Over the years, several types of summarization tasks have been...

- by F. Gerigk
- •
- Instrumentation, System Design, Front end, Document Summarization

Single-document summarization aims to reduce the size of a text document while preserving the most important information. Much work has been done on open-domain summarization. This paper presents an automatic way to mine domain-specific patterns from text documents. With a small amount of effort required for manual selection, these patterns can be used for domain-specific scenario-based document summarization and information extraction. Our evaluation shows that scenario-based document summarization can both filter irrelevant documents and create summaries for relevant documents within the specified domain.

Took the standard guided plant tour which is offered routinely to the public

- by Vladimír Piták
- •
- Engineering, Computer Aided Design, Energy Conservation, Technology Assessment

DOE Scientific and Technical Information. ...

- by Snake God
- •
- Materials Science, Nondestructive Evaluation, Nuclear power, Time of Flight

This paper deals with our recent research in text summarization. We went from single-document summarization through multidocument summarization to update summarization. We describe the development of our summarizer which is based on latent semantic analysis (LSA) and propose the update summarization component which determines the redundancy and novelty of each topic discovered by LSA. The final part of this paper presents the results of our participation in the experiment of Text Analysis Conference 2008.

- by Peter W de Leeuw
- •
- General Practice, Arterial hypertension, Clinical Practice, Hypertension

This paper proposes an approach to UNL document summarization. Our approach employs both the surface and semantic information of UNL annotation to summarize documents. Thanks to the semantic annotation of the UNL, the essence of the document is efficiently collected, which facilitates the abstraction function for language generation. Multilinguality can also be realized through the language deconverters from the summarized UNL document to the target languages under the UNL framework. The experiment result ...

- by Tanapong Potipiti and +1
- •
- Semantic Information, Semantic Annotation, Document Summarization, Target Language

As the amount of information on the Web grows, the ability to retrieve relevant information quickly and easily is necessary. The combination of ample news sources on the Web, little time to browse news, and smaller mobile devices motivates the development of automatic highlight extraction from single news articles. Our system, NetSum, is the first system to produce highlights of an article and significantly outperform the baseline. Our approach uses novel information sources to exploit human interest for highlight extraction. In this paper, we briefly describe the novelties of NetSum, originally presented at EMNLP 2007, and embed our work in the AI context.

- by Ilie Mihalcea
- •
- Information Sources, Mobile Device, Document Summarization

Users require more effective and efficient means of interaction with increasingly complex information and new interactive devices. This document summarizes the results of the international Dagstuhl Seminar on Coordination and Fusion in Multimodal Interaction that took place at Schloss Dagstuhl in Germany October 27 through November 2, 2001. We first outline a research roadmap in the near and long term. Next we describe requirements and an abstract architecture for this class of systems. We then detail requirements for semantic representations and languages necessary to enable these systems. Finally, we describe data, annotation methodologies and tools necessary to further advance the field. We conclude with a recommended action plan for forward progress in the community.

- by Harry Bunt
- •
- Multimodal Interaction, Facial expression, Multimodal Communication, Action Plan