Transfer Learning for Abstractive Summarization at Controllable Budgets
Related papers
Interpretable Multi-headed Attention for Abstractive Summarization at Controllable Lengths
Proceedings of the 28th International Conference on Computational Linguistics
Abstractive summarization at controllable lengths is a challenging task in natural language processing. It is even more challenging for domains where limited training data is available or scenarios in which the length of the summary is not known beforehand. At the same time, when it comes to trusting machine-generated summaries, explaining how a summary was constructed in human-understandable terms may be critical. We propose Multi-level Summarizer (MLS), a supervised method to construct abstractive summaries of a text document at controllable lengths. The key enabler of our method is an interpretable multi-headed attention mechanism that computes attention distribution over an input document using an array of timestep-independent semantic kernels. Each kernel optimizes a human-interpretable syntactic or semantic property. Exhaustive experiments on two low-resource English-language datasets show that MLS outperforms strong baselines by up to 14.70% in the METEOR score. Human evaluation of the summaries also suggests that they capture the key concepts of the document at various length budgets.
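The interpretable attention idea described above might be sketched, very roughly, as one learned query vector per kernel that scores encoder states independently of the decoding timestep, so each head yields its own inspectable distribution. The PyTorch sketch below illustrates only that assumption; the class and parameter names are hypothetical and not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelAttention(nn.Module):
    """Sketch: each 'semantic kernel' is a learned vector that scores every
    encoder state independently of the decoding timestep, producing one
    attention distribution per kernel that can be inspected on its own."""

    def __init__(self, hidden_dim: int, num_kernels: int):
        super().__init__()
        # One query vector per kernel; kernels do not depend on the decoder state.
        self.kernels = nn.Parameter(torch.randn(num_kernels, hidden_dim))

    def forward(self, encoder_states: torch.Tensor):
        # encoder_states: (batch, src_len, hidden_dim)
        scores = torch.einsum("kh,bsh->bks", self.kernels, encoder_states)
        attn = F.softmax(scores, dim=-1)           # one distribution per kernel
        context = torch.bmm(attn, encoder_states)  # (batch, num_kernels, hidden_dim)
        return attn, context

# Usage: inspect each kernel's distribution to see which input tokens it favours.
enc = torch.randn(2, 12, 64)                       # dummy encoder output
attn, ctx = KernelAttention(hidden_dim=64, num_kernels=4)(enc)
print(attn.shape, ctx.shape)                       # torch.Size([2, 4, 12]) torch.Size([2, 4, 64])
```

Because each kernel's distribution is computed without reference to the decoder state, it can be examined in isolation, which is one way such heads could be made human-interpretable.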
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, 2016
In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling keywords, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute to further improvement in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research.
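For readers unfamiliar with the attentional encoder-decoder baseline this work extends, a single decoding step with additive (Bahdanau-style) attention can be sketched as below. This is a generic PyTorch illustration, not the authors' models, which further add keyword modeling, hierarchical attention, and handling of rare words:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score each encoder state against the current decoder state."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, hidden), enc_states: (batch, src_len, hidden)
        scores = self.v(torch.tanh(self.w_enc(enc_states) + self.w_dec(dec_state).unsqueeze(1)))
        attn = F.softmax(scores.squeeze(-1), dim=-1)                  # (batch, src_len)
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)  # (batch, hidden)
        return context, attn

# One decoding step: attend over the source given the previous decoder state.
batch, src_len, hidden = 2, 15, 128
enc_states = torch.randn(batch, src_len, hidden)
dec_state = torch.randn(batch, hidden)
context, attn = AdditiveAttention(hidden)(dec_state, enc_states)
print(context.shape, attn.shape)  # torch.Size([2, 128]) torch.Size([2, 15])
```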
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018
Convolutional neural networks (CNNs) have met great success in abstractive summarization, but they cannot effectively generate summaries of desired lengths. Because generated summaries are used in different scenarios that may have space or length constraints, the ability to control the summary length in abstractive summarization is an important problem. In this paper, we propose an approach to constrain the summary length by extending a convolutional sequence-to-sequence model. The results show that this approach generates high-quality summaries with user-defined lengths, and outperforms the baselines consistently in terms of ROUGE score, length variation and semantic similarity.
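The abstract does not spell out the mechanism, but one common way to expose a length budget to a sequence-to-sequence decoder is to embed the desired length (or a length bucket) and add it to every decoder input embedding. The sketch below illustrates that generic idea in PyTorch; it is an assumption, not necessarily this paper's exact approach:

```python
import torch
import torch.nn as nn

class LengthConditionedEmbedding(nn.Module):
    """Sketch: add an embedding of the desired summary length (bucketed)
    to every decoder input embedding, so generation can be steered toward
    a length budget."""
    def __init__(self, vocab_size: int, embed_dim: int, num_length_buckets: int = 10):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, embed_dim)
        self.len = nn.Embedding(num_length_buckets, embed_dim)
        self.bucket_size = 10  # e.g. bucket 0 = 0-9 tokens, bucket 1 = 10-19, ...

    def forward(self, token_ids: torch.Tensor, target_length: int) -> torch.Tensor:
        bucket = min(target_length // self.bucket_size, self.len.num_embeddings - 1)
        length_vec = self.len(torch.tensor(bucket, device=token_ids.device))
        # Broadcast the length vector over (batch, tgt_len, embed_dim).
        return self.tok(token_ids) + length_vec

emb = LengthConditionedEmbedding(vocab_size=32000, embed_dim=256)
out = emb(torch.randint(0, 32000, (2, 20)), target_length=45)
print(out.shape)  # torch.Size([2, 20, 256])
```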
Improving Neural Abstractive Text Summarization with Prior Knowledge (Position Paper)
International Conference of the Italian Association for Artificial Intelligence, 2016
Abstractive text summarization is a complex task whose goal is to generate a concise version of a text without necessarily reusing the sentences from the original source, while still preserving the meaning and the key contents. In this position paper we address this issue by modeling the problem as sequence-to-sequence learning and exploiting Recurrent Neural Networks (RNNs). Moreover, we discuss the idea of combining RNNs and probabilistic models in a unified way in order to incorporate prior knowledge, such as linguistic features. We believe that this approach can obtain better performance than the state-of-the-art models for generating well-formed summaries.
Generating Topic-Oriented Summaries Using Neural Attention
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Summarizing a document requires identifying the important parts of the document with the objective of providing a quick overview to a reader. However, a long article can span several topics, and a single summary cannot do justice to all of them. Further, readers' interests can vary, and the notion of importance can change across them. Existing summarization algorithms generate a single summary and are not capable of generating multiple summaries tuned to the interests of the readers. In this paper, we propose an attention-based RNN framework to generate multiple summaries of a single document tuned to different topics of interest. Our method outperforms existing baselines, and our results suggest that the attention of generative networks can be successfully biased to look at sentences relevant to a topic and effectively used to generate topic-tuned summaries.
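Biasing attention toward a topic can be illustrated, in a simplified form, by adding a topic-relevance term to the attention logits before the softmax. The helper below is a hypothetical sketch of that idea, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def topic_biased_attention(scores, enc_states, topic_vec, bias_weight=1.0):
    """Sketch: bias base attention scores toward encoder states similar to a topic vector.
    scores: (batch, src_len) unnormalised attention logits from any attention module.
    enc_states: (batch, src_len, hidden); topic_vec: (batch, hidden)."""
    topic_relevance = torch.bmm(enc_states, topic_vec.unsqueeze(-1)).squeeze(-1)  # (batch, src_len)
    return F.softmax(scores + bias_weight * topic_relevance, dim=-1)

scores = torch.randn(2, 15)
enc_states = torch.randn(2, 15, 64)
topic_vec = torch.randn(2, 64)
print(topic_biased_attention(scores, enc_states, topic_vec).shape)  # torch.Size([2, 15])
```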
Neural Attention Model for Abstractive Text Summarization Using Linguistic Feature Space
IEEE Access
Summarization generates a brief and concise summary that portrays the main idea of the source text. There are two forms of summarization: abstractive and extractive. Extractive summarization chooses important sentences from the text to form a summary, whereas abstractive summarization paraphrases the text in a more fluent, human-like way by adding novel words or phrases. For a human annotator, producing a summary of a document is time consuming and expensive because it requires going through the long document and composing a short summary. An automatic feature-rich model for text summarization is proposed that can reduce the amount of labor and produce a quick summary by using both extractive and abstractive approaches. A feature-rich extractor highlights the important sentences in the text, and linguistic characteristics are used to enhance results. The extracted summary is then fed to an abstracter that uses features such as named entity tags, part-of-speech tags and term weights to provide further information. Furthermore, a loss function is introduced to normalize the inconsistency between word-level and sentence-level attentions. The proposed two-staged network achieved a ROUGE score of 37.76% on the benchmark CNN/DailyMail dataset, outperforming the earlier work. Human evaluation is also conducted to measure the comprehensiveness, conciseness and informativeness of the generated summary.
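One plausible form of the word/sentence attention consistency penalty mentioned above is to reward decoding steps where highly attended words also sit inside highly attended sentences. The function below sketches such a loss in PyTorch; the exact formulation in the paper may differ:

```python
import torch

def inconsistency_loss(word_attn, sent_attn, sent_index, top_k=3, eps=1e-8):
    """Sketch of an attention-consistency penalty: for the top-k attended words,
    penalise cases where a word's attention is high but the attention of the
    sentence containing it is low.
    word_attn: (batch, num_words), sent_attn: (batch, num_sents),
    sent_index: (batch, num_words) long tensor mapping each word to its sentence."""
    sent_attn_per_word = torch.gather(sent_attn, 1, sent_index)  # (batch, num_words)
    combined = word_attn * sent_attn_per_word
    topk = combined.topk(top_k, dim=1).values
    return -torch.log(topk.mean(dim=1) + eps).mean()

word_attn = torch.softmax(torch.randn(2, 30), dim=1)
sent_attn = torch.softmax(torch.randn(2, 5), dim=1)
sent_index = torch.randint(0, 5, (2, 30))
print(inconsistency_loss(word_attn, sent_attn, sent_index))
```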
Mathematical Problems in Engineering
In recent years, the volume of textual data has rapidly increased, which has generated a valuable resource for extracting and analysing information. To retrieve useful knowledge within a reasonable time period, this information must be summarised. This paper reviews recent approaches for abstractive text summarisation using deep learning models. In addition, existing datasets for training and validating these approaches are reviewed, and their features and limitations are presented. The Gigaword dataset is commonly employed for single-sentence summary approaches, while the Cable News Network (CNN)/Daily Mail dataset is commonly employed for multi-sentence summary approaches. Furthermore, the measures that are utilised to evaluate the quality of summarisation are investigated, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE-1, ROUGE-2, and ROUGE-L) are determined to be the most commonly applied metrics. The challenges that are encountered during the summarisation process ...
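For reference, the ROUGE variants named here can be computed with the open-source `rouge-score` package; a minimal example, unrelated to any specific paper in this list:

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat was found under the bed"
candidate = "the cat was under the bed"
scores = scorer.score(reference, candidate)
for name, result in scores.items():
    print(f"{name}: precision={result.precision:.3f} recall={result.recall:.3f} f1={result.fmeasure:.3f}")
```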
ECTI Transactions on Computer and Information Technology, 2024
Summarizing the information provided within tables of scientific documents has always been a problem. A system that can summarize the vital information that a table encapsulates can provide readers with a quick and straightforward way to comprehend the contents of the document. To train such systems, we need data, and finding quality data is tricky. To mitigate this challenge, we developed a high-quality corpus that contains both extractive and abstractive summaries derived from tables, using a rule-based approach. This dataset was validated using a combination of automated and manual metrics. Subsequently, we developed a novel encoder-decoder framework, along with attention, to generate abstractive summaries from extractive ones. This model works on a mix of extractive summaries and inter-sentential similarity embeddings and learns to map them to corresponding abstractive summaries. On experimentation, we discovered that our model addresses the saliency factor of summarization, an aspect overlooked by previous works. Further experiments show that our model develops coherent abstractive summaries, validated by high BLEU and ROUGE scores.
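On the simplest reading, the inter-sentential similarity embeddings mentioned above could be pairwise cosine similarities between sentence vectors of the extractive summary. The snippet below sketches only that piece, as one possible input feature, and is an assumption rather than the authors' framework:

```python
import torch
import torch.nn.functional as F

def inter_sentential_similarity(sent_embeddings: torch.Tensor) -> torch.Tensor:
    """Sketch: pairwise cosine similarities between sentence embeddings of an
    extractive summary, usable as an extra feature alongside the sentences.
    sent_embeddings: (num_sents, dim) -> similarity matrix (num_sents, num_sents)."""
    normed = F.normalize(sent_embeddings, dim=-1)
    return normed @ normed.t()

sims = inter_sentential_similarity(torch.randn(4, 128))
print(sims.shape)  # torch.Size([4, 4])
```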
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Abstractive Sentence Summarization generates a shorter version of a given sentence while attempting to preserve its meaning. We introduce a conditional recurrent neural network (RNN) which generates a summary of an input sentence. The conditioning is provided by a novel convolutional attention-based encoder which ensures that the decoder focuses on the appropriate input words at each step of generation. Our model relies only on learned features and is easy to train in an end-to-end fashion on large data sets. Our experiments show that the model significantly outperforms the recently proposed state-of-the-art method on the Gigaword corpus while performing competitively on the DUC-2004 shared task.
A Neural Attention Model for Abstractive Sentence Summarization
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015
Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. While the model is structurally simple, it can easily be trained end-to-end and scales to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.
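To make the word-by-word conditioning concrete, the decoding loop of any such conditional model can be sketched as greedy generation over the source. Published systems of this kind typically use beam search, so treat this as a simplified illustration with a dummy stand-in model:

```python
import torch

def greedy_decode(step_fn, src_ids, bos_id, eos_id, max_len=20):
    """Sketch: generate a summary one token at a time, always picking the most
    probable next word given the source and the words emitted so far.
    step_fn(src_ids, out_ids) -> (vocab_size,) logits; any conditional model fits."""
    out_ids = [bos_id]
    for _ in range(max_len):
        logits = step_fn(src_ids, torch.tensor(out_ids))
        next_id = int(torch.argmax(logits))
        out_ids.append(next_id)
        if next_id == eos_id:
            break
    return out_ids

# Toy stand-in for a trained model: random logits over a 100-word vocabulary.
dummy_step = lambda src, out: torch.randn(100)
print(greedy_decode(dummy_step, torch.tensor([5, 7, 9]), bos_id=1, eos_id=2))
```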