University of Mannheim @ CLSciSumm-17: Citation-Based Summarization of Scientific Articles Using Semantic Textual Similarity
Related papers
Proceedings of the AAAI Conference on Artificial Intelligence
Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article’s impact on the research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate the authors’ original highlights (abstract) and the article’s actual impact on the community (citations) to create comprehensive, hybrid summaries. We conduct experiments to demonstrate the efficacy of our corpus in training data-driven models for scientific paper summarization and the advantage of our hybrid summaries over abstracts and traditional citation-based summaries. Our large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research.
2019
The CL-SciSumm track provides a framework to evaluate systems summarising scientific papers, including datasets and metrics provided by the organisers. The track comprises three tasks: (1a) identifying the spans of text in the reference document that reflect the citing text spans (i.e., citances), (1b) classifying the discourse facets of the cited text spans, and (2) generating a short structured summary. For the 2019 edition, we focused our work on task 1a. This report presents our proposed approach for this task. We submitted 15 runs corresponding to different configurations of the parameters involved in our approach.
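Task 1a amounts to ranking the sentences of the reference paper by their similarity to a citance. A minimal sketch of one common baseline for this setup, using term-frequency cosine similarity (the function names and the cosine baseline are illustrative assumptions, not the system described above):

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_reference_sentences(citance: str, ref_sentences: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k reference-paper sentences most similar to the citance."""
    cv = Counter(citance.lower().split())
    scored = [(cosine(cv, Counter(s.lower().split())), s) for s in ref_sentences]
    return [s for score, s in sorted(scored, reverse=True)[:top_k] if score > 0]
```

Real systems replace the raw cosine with semantic textual similarity measures, but the ranking skeleton stays the same.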
Generating Extractive Summaries of Scientific Paradigms
2013
Researchers and scientists increasingly find themselves in the position of having to quickly understand large amounts of technical material. Our goal is to effectively serve this need by using bibliometric text mining and summarization techniques to generate summaries of scientific literature. We show how we can use citations to produce automatically generated, readily consumable, technical extractive summaries. We first propose C-LexRank, a model for summarizing single scientific articles based on citations, which employs community detection and extracts salient information-rich sentences. Next, we further extend our experiments to summarize a set of papers, which cover the same scientific topic. We generate extractive summaries of a set of Question Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their citation sentences and show that citations have unique information amenable to creating a summary.
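Graph-based models such as C-LexRank build a similarity graph over citation sentences and extract the most central ones. The following is a deliberately simplified sketch using degree centrality on a thresholded term-overlap graph (not the authors' exact C-LexRank implementation, which additionally uses community detection):

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def _cos(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centrality_summary(citations: list[str], k: int = 1, threshold: float = 0.1) -> list[str]:
    """Pick the k citation sentences most central in a similarity graph
    (degree centrality as a stand-in for full graph-based ranking)."""
    vecs = [Counter(s.lower().split()) for s in citations]
    degree = [0.0] * len(citations)
    for i, j in combinations(range(len(citations)), 2):
        sim = _cos(vecs[i], vecs[j])
        if sim >= threshold:          # keep only sufficiently similar edges
            degree[i] += sim
            degree[j] += sim
    ranked = sorted(range(len(citations)), key=lambda i: degree[i], reverse=True)
    return [citations[i] for i in ranked[:k]]
```

Sentences that many other citing sentences paraphrase accumulate high degree and are selected; isolated sentences are ignored.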
2020
Related work sections, or literature reviews, are an essential part of every scientific article and are crucial for paper reviewing and assessment. The automatic generation of related work sections can be considered an instance of the multi-document summarization problem. To allow the study of this specific problem, we have developed a manually annotated, machine-readable dataset of related work sections, cited papers (i.e., references) and sentences, together with an additional layer of papers citing the references. We additionally present experiments on the identification of cited sentences, using citation contexts as input. The corpus, alongside the gold standard, is made available for use by the scientific community.
Overview of the CL-SciSumm 2016 Shared Task
2016
The CL-SciSumm 2016 Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. The task built off of the experience and training data set created in its namesake pilot task, which was conducted in 2014 by the same organizing committee. The track included three tasks involving: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 30 annotated sets of citing and reference papers from the open access research papers in the CL domain. This overview paper describes the participation and the official results of the second CL-SciSumm Shared Task, organized as a part of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016), held in New Jersey, USA, in June 2016. The annotated dataset used for this shared task...
The CL-SciSumm Shared Task 2017: Results and Key Insights
2017
A new approach for finding semantic similar scientific articles
Calculating article similarities enables users to find similar articles and documents in a collection. Measuring document similarity is helpful for text applications such as document-to-document similarity search, plagiarism detection, text mining for repetition, and text filtering. This paper proposes a new method for calculating the semantic similarity of articles. WordNet is used to find semantic associations between words. The proposed technique first compares the corresponding parts of two articles pairwise; the final result is then calculated as a weighted mean over the different parts. Results are compared with human scores using Pearson's correlation coefficient; the proposed system achieves a correlation above 87 percent and identifies similarities precisely.
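The part-wise comparison with a weighted mean can be sketched as follows. This is a minimal illustration under stated assumptions: the part names and weights are hypothetical, and the paper's WordNet-based word similarity is replaced here by a simple Jaccard token overlap, since WordNet is not in the standard library:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap stand-in for the paper's WordNet-based similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def article_similarity(parts_a: dict, parts_b: dict, weights: dict) -> float:
    """Compare each article part (title, abstract, ...) pairwise,
    then combine the per-part scores as a weighted mean."""
    total_w = sum(weights.values())
    return sum(w * jaccard(parts_a[p], parts_b[p]) for p, w in weights.items()) / total_w
```

For example, two articles with identical titles but unrelated abstracts, weighted 0.6/0.4, score 0.6 overall.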
The CL-SciSumm Shared Task 2018: Results and Key Insights
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018
This overview describes the official results of the CL-SciSumm Shared Task 2018, the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. This year, the dataset comprised 60 annotated sets of citing and reference papers from the open access research papers in the CL domain. The Shared Task was organized as a part of the 41st Annual Conference of the Special Interest Group in Information Retrieval (SIGIR), held in Ann Arbor, USA in July 2018. We compare the participating systems in terms of two evaluation metrics. The annotated dataset and evaluation scripts can be accessed and used by the community from: https://github.com/WING-NUS/scisumm-corpus.
Trainable Citation-enhanced Summarization of Scientific Articles
2016
Overview and Results: CL-SciSumm Shared Task 2019
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019
The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. In 2019, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 40 annotated sets of citing and reference papers of the CL-SciSumm 2018 corpus and 1000 more from the SciSumm-Net dataset. All papers are from the open access research papers in the CL domain. This overview describes the participation and the official results of the CL-SciSumm 2019 Shared Task, organized as a part of the 42nd Annual Conference of the Special Interest Group in Information Retrieval (SIGIR), held in Paris, France in July 2019. We compare the participating systems in terms of two evaluation metrics and discuss the use of ROUGE as an evaluation metric. The annotated dataset used for this shared task and the scripts used for evaluation can be accessed and used by the community at: https://github.com/WING-NUS/scisumm-corpus.
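The ROUGE metric discussed above scores a system summary by its n-gram overlap with a reference summary. A minimal sketch of ROUGE-N recall (the core quantity; official ROUGE tooling adds stemming, stopword options, and multiple references):

```python
from collections import Counter

def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by the number of
    n-grams in the reference summary."""
    def ngrams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(cnt, cand[g]) for g, cnt in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0
```

For instance, the candidate "the cat sat" against the reference "the cat sat on the mat" recovers 3 of the reference's 6 unigrams, giving a ROUGE-1 recall of 0.5.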