Domain Adaptation with Pre-trained Transformers for Query-Focused Abstractive Text Summarization
Related papers
Advances in Artificial Intelligence, 2020
In the Query Focused Abstractive Summarization (QFAS) task, the goal is to generate abstractive summaries from the source document that are relevant to the given query. In this paper, we propose a new transfer learning technique that uses the pre-trained transformer architecture for the QFAS task on the Debatepedia dataset. We find that the Diversity Driven Attention (DDA) model, which was the first model applied to this dataset, performs well only when the dataset is augmented by creating more training instances. In contrast, without requiring any in-domain data augmentation, our proposed approach outperforms the DDA model and sets a new state-of-the-art result.
Proceedings of the 28th International Conference on Computational Linguistics, 2020
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given, and the goal is to generate a summary from these documents based on the given query. However, one major challenge for this task is the lack of labeled training datasets. To overcome this issue, in this paper, we propose a novel weakly supervised learning approach that uses distant supervision. In particular, we use datasets similar to the target dataset as the training data, where we leverage pre-trained sentence similarity models to generate the weak reference summary of each individual document in a document set from the multi-document gold reference summaries. Then, we iteratively train our summarization model on each single document to alleviate the computational complexity issue that occurs when training neural summarization models on multiple documents (i.e., long sequences) at once. Experimental results on the Document Understanding Conference (DUC) datasets show that our proposed approach sets a new state-of-the-art result in terms of various evaluation metrics.
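The weak-labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract relies on pre-trained sentence similarity models, whereas this sketch substitutes a plain token-overlap (Jaccard) score so that it runs without external dependencies. The function names `jaccard` and `weak_reference_summary` and the `top_k` parameter are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Token-overlap similarity (assumption: stand-in for a pre-trained model)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def weak_reference_summary(doc_sentences, gold_sentences, top_k=2):
    """Select the top_k sentences of one document that best match the gold summary."""
    scored = []
    for idx, sent in enumerate(doc_sentences):
        # Score each document sentence by its best match among gold sentences.
        best = max((jaccard(sent, g) for g in gold_sentences), default=0.0)
        scored.append((best, idx))
    # Highest score first; ties go to the earlier sentence in the document.
    chosen = sorted(scored, key=lambda p: (-p[0], p[1]))[:top_k]
    # Restore document order so the weak summary reads naturally.
    return [doc_sentences[i] for _, i in sorted(chosen, key=lambda p: p[1])]


if __name__ == "__main__":
    doc = [
        "The storm flooded the coastal town.",
        "Local bakeries sold bread.",
        "Rescue teams evacuated residents from the town.",
    ]
    gold = ["A storm flooded the town and residents were evacuated."]
    print(weak_reference_summary(doc, gold, top_k=2))
```

Each document in a set would be labeled this way against the shared multi-document gold summary, giving one (document, weak summary) training pair per document.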
Transfer Learning for Abstractive Summarization at Controllable Budgets
ArXiv, 2020
Summarizing a document within an allocated budget while maintaining its major concepts is a challenging task. If the budget can take any arbitrary value and is not known beforehand, it becomes even more difficult. Most of the existing methods for abstractive summarization, including state-of-the-art neural networks, are data intensive. If the number of available training samples becomes limited, they fail to construct high-quality summaries. We propose MLS, an end-to-end framework to generate abstractive summaries with limited training data at arbitrary compression budgets. MLS employs a pair of supervised sequence-to-sequence networks. The first network, called the MFS-Net, constructs a minimal feasible summary by identifying the key concepts of the input document. The second network, called the Pointer-Magnifier, then generates the final summary from the minimal feasible summary by leveraging an interpretable multi-headed attention model. Experiments on two cross-domain datasets ...
Sequential Transfer Learning in NLP for German Text Summarization
2019
This work examines the impact of sequential transfer learning on abstractive machine summarization. A current trend in Natural Language Processing (NLP) is to pre-train extensive language models and then adapt these models to solve various target tasks. Since these techniques have rarely been investigated in the context of text summarization, this work develops an approach to integrate and evaluate pre-trained language models in abstractive text summarization. Our experiments suggest that pre-trained language models can improve text summarization. We find that using multilingual BERT (Devlin et al., 2018) as contextual embeddings lifts our model by about 9 points of ROUGE-1 and ROUGE-2 on a German summarization task.
Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
2020
We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modeling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on two datasets and show its superiority in a manual...
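The chunk-wise idea behind processing texts longer than the BERT window size can be sketched as follows. This is an illustration under stated assumptions, not the paper's BERT-windowing method: the abstract does not specify how windows are formed or combined, so the overlapping-stride scheme, the function name `split_into_windows`, and the `window` and `stride` parameters are all hypothetical.

```python
def split_into_windows(tokens, window=512, stride=256):
    """Split a token sequence into overlapping windows of at most `window` tokens.

    Assumption: consecutive windows start `stride` tokens apart, so neighbours
    share `window - stride` tokens of context.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    tokens = list(tokens)
    if len(tokens) <= window:
        # Short inputs fit in a single window and need no chunking.
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            # This window already reaches the end of the sequence.
            break
        start += stride
    return windows


if __name__ == "__main__":
    for chunk in split_into_windows(list(range(10)), window=4, stride=2):
        print(chunk)
```

Each window could then be encoded separately by a fixed-size model, with the per-window outputs merged downstream; how that merge is done is left open here.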
Automated News Summarization Using Transformers
Lecture notes in electrical engineering, 2022
The amount of text data available online is increasing at a very fast pace; hence, text summarization has become essential. Most modern recommender and text classification systems require going through a huge amount of data. Manually generating precise and fluent summaries of lengthy articles is a very tiresome and time-consuming task. Hence, generating automated summaries for the data and using them to train machine learning models will make these models space- and time-efficient. Extractive summarization and abstractive summarization are two separate methods of generating summaries. The extractive technique identifies the relevant sentences in the original document and extracts only those from the text. In abstractive summarization techniques, by contrast, the summary is generated after interpreting the original text, which makes the task more complicated. In this paper, we present a comprehensive comparison of a few pre-trained models based on the transformer architecture for text summarization. For analysis and comparison, we use the BBC news dataset, which contains text data that can be used for summarization, along with human-generated summaries for evaluating and comparing the summaries generated by machine learning models.
Long Document Summarization in a Low Resource Setting using Pretrained Language Models
2021
Abstractive summarization is the task of compressing a long document into a coherent short document while retaining salient information. Modern abstractive summarization methods are based on deep neural networks which often require large training datasets. Since collecting summarization datasets is an expensive and time-consuming task, practical industrial settings are usually low-resource. In this paper, we study a challenging low-resource setting of summarizing long legal briefs with an average source document length of 4268 words and only 120 available (document, summary) pairs. To account for data scarcity, we used a modern pre-trained abstractive summarizer BART, which only achieves 17.9 ROUGE-L as it struggles with long documents. We thus attempt to compress these long documents by identifying salient sentences in the source which best ground the summary, using a novel algorithm based on GPT-2 language model perplexity scores, that operates within the low resource regime. On f...
Analyzing Multi-Task Learning for Abstractive Text Summarization
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2022), 2022
Despite the recent success of multi-task learning and pre-finetuning for natural language understanding, few works have studied the effects of task families on abstractive text summarization. Task families are a form of task grouping during the pre-finetuning stage to learn common skills, such as reading comprehension. To close this gap, we analyze the influence of multi-task learning strategies using task families for the English abstractive text summarization task. We group tasks into one of three strategies, i.e., sequential, simultaneous, and continual multi-task learning, and evaluate trained models through two downstream tasks. We find that certain combinations of task families (e.g., advanced reading comprehension and natural language inference) positively impact downstream performance. Further, we find that choice and combinations of task families influence downstream performance more than the training scheme, supporting the use of task families for abstractive text summarization. Our code is publicly available.
Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Given a document and a target aspect (e.g., a topic of interest), aspect-based abstractive summarization attempts to generate a summary with respect to the aspect. Previous studies usually assume a small pre-defined set of aspects and fall short of summarizing on other diverse topics. In this work, we study summarizing on arbitrary aspects relevant to the document, which significantly expands the application of the task in practice. Due to the lack of supervision data, we develop a new weak supervision construction method and an aspect modeling scheme, both of which integrate rich external knowledge sources such as ConceptNet and Wikipedia. Experiments show our approach achieves performance boosts on summarizing both real and synthetic documents given pre-defined or arbitrary aspects.