Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization
Related papers
Journal of King Saud University - Computer and Information Sciences, 2019
The exponential growth of online textual data has created a crucial need for an effective and powerful tool that automatically provides the desired content in summarized form while preserving the core information. In this paper, we propose an automatic, generic, and extractive Arabic single-document summarization method that aims to produce a sufficiently informative summary. The proposed extractive method evaluates each sentence using a combination of statistical and semantic features, in a novel formulation that takes into account sentence importance, coverage, and diversity. Two summarization techniques, score-based and supervised machine learning, are then employed to produce the summary and to leverage the designed features. We demonstrate the effectiveness of the proposed method through a set of experiments on the EASC corpus using the ROUGE measure. Compared with some existing related work, the experimental evaluation shows the strength of the proposed method in terms of precision, recall, and F-score.
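The abstract does not give the exact formulation that combines importance, coverage, and diversity. As an illustration only, here is a maximal-marginal-relevance (MMR) style greedy selection, which matches the minimum-redundancy, maximum-relevance idea in the title. Assumptions: whitespace tokenization, bag-of-words cosine similarity, relevance measured as similarity to the whole document, and a hypothetical trade-off weight `lam`. None of these come from the paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences: list[str], k: int, lam: float = 0.5) -> list[int]:
    """Greedily pick up to k sentence indices, trading relevance against redundancy.

    Relevance approximates coverage: similarity to the whole document.
    Redundancy is the highest similarity to any already-selected sentence.
    `lam` is a hypothetical trade-off weight, not a value from the paper.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    document = sum(vectors, Counter())
    relevance = [cosine(v, document) for v in vectors]
    selected: list[int] = []
    candidates = set(range(len(sentences)))

    def marginal_score(i: int) -> float:
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=marginal_score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # report in document order
```

Raising `lam` favors relevance. Lowering it penalizes sentences that repeat content already in the summary.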
A Comprehensive Review of Arabic Text Summarization
IEEE Access
The explosion of online and offline data has changed how we gather, evaluate, and understand data. It is often difficult and time-consuming to comprehend large text documents and extract crucial information from them. Text summarization techniques address these problems by compressing long texts while retaining their essential content, so that users receive filtered, high-quality content quickly. Because technology and various other sources generate massive amounts of data, automatic summarization of large-scale data is challenging. There are three types of automatic text summarization techniques: extractive, abstractive, and hybrid. Whichever technique is used, the generated summaries still fall well short of those produced by human experts. Arabic is widely spoken and frequently used for sharing content on the web. Even so, Arabic text summarization remains limited and immature because of several problems, including the language's morphological structure, the variety of dialects, and the lack of adequate data sources. This paper reviews text summarization approaches and the recent deep learning models used for them. It also reviews the existing datasets for these approaches, along with their characteristics and limitations. The most commonly used metrics for evaluating summary quality are ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. The paper analyzes the challenges that Arabic text summarization methods and approaches encounter, along with the solutions each approach proposes. Many Arabic text summarization methods suffer from problems such as the lack of gold tokens during testing, out-of-vocabulary (OOV) words, repeated summary sentences, the lack of standard systematic methodologies and architectures, and the complexity of the Arabic language.
Finally, the field has an essential demand for the required corpora, for improved evaluation using semantic representations, for addressing the limited use of ROUGE metrics in abstractive text summarization, and for adopting recent deep learning models in Arabic summarization studies.
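The review names ROUGE-1, ROUGE-2, and ROUGE-L as the most common metrics, so a minimal sketch of how they are computed may help. It assumes whitespace tokenization, a single reference summary, and no stemming or stop-word handling, so its scores will differ from those of the official ROUGE toolkit. ROUGE-N counts clipped n-gram matches. ROUGE-L uses the longest common subsequence (LCS). Both report precision, recall, and F1.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, candidate_total: int, reference_total: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from a match count and the two totals."""
    p = overlap / candidate_total if candidate_total else 0.0
    r = overlap / reference_total if reference_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rouge_n(candidate: str, reference: str, n: int = 1) -> tuple[float, float, float]:
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # `&` keeps the minimum count, i.e. clipped matches
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> tuple[float, float, float]:
    cand, ref = candidate.split(), reference.split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))
```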
A Proposed Method for Summarizing Arabic Single Document
International Journal of Computer Applications, 2018
This paper proposes an automatic text summarization method. Text summarization selects the most important information in the original text, and it can be divided into two types: extractive and abstractive. In this study, we introduce an extractive single-document summarization system for Arabic text. It proceeds in three stages: pre-processing, sentence scoring, and summary generation. The pre-processing phase removes punctuation marks and stop words, unifies synonyms, and stems words to obtain their root forms. The system then scores every sentence against a collection of features and includes the highest-scoring sentences in the final summary. We evaluated the system by comparing manual and automatic summaries using several measures, especially ROUGE. Two human experts also manually assessed the summaries' quality in terms of general form, content, coherence of phrases, lack of elaboration, repetition, and completeness of meaning. The final results showed that the proposed method achieved higher performance than other systems.
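The following is a minimal sketch of the pre-processing stage. It assumes tiny, hypothetical stop-word and affix lists. It strips diacritics and tatweel, removes punctuation (including the Arabic comma), drops stop words, and applies a light stemmer that only removes common prefixes and suffixes. This is a simplification: the paper stems to the root form, which requires a root-based stemmer, and the sketch omits synonym unification.

```python
import re

# Hypothetical, deliberately tiny resources; a real system would load full lists.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هذا", "التي"}
PREFIXES = ("وال", "بال", "كال", "فال", "ال")  # longer prefixes are tried first
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip at most one prefix and one suffix, keeping at least `min_len` letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[: -len(suffix)]
            break
    return word


def preprocess(sentence: str) -> list[str]:
    text = re.sub("[\u064b-\u0652\u0640]", "", sentence)  # remove diacritics (harakat) and tatweel
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation, including "،" and "؟", with spaces
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [light_stem(t) for t in tokens]
```

A light stemmer can conflate unrelated words. For example, it reduces "المدرسة" (the school) to "مدرس". Root-based stemming trades that for its own errors.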
A New Approach for Arabic Text Summarization
2021
Due to the growing volume of online textual information, acquiring relevant information quickly has become a challenging task. Automatic text summarization (TS) offers a powerful solution for quickly exploiting these resources. It consists of producing a short representation of an input text while preserving its relevant information and overall meaning. Automatic text summarization has received considerable attention for Indo-European languages; for Arabic, however, research in this field has not yet made notable progress. Most existing approaches in the Arabic text summarization literature rely mainly on numerical techniques and neglect the semantic and rhetorical relations connecting text units. This negatively affects the global coherence and readability of the generated summary. In this paper, we attempt to overcome this limitation by proposing a new approach that combines a rhetorical analysis following the rhetorical structure theory (RST) and a statistical-based m...
Automatic Arabic Text Summarization System (AATSS) Based on Semantic Features Extraction
International Journal of Technology Diffusion, 2012
Recently, the need has increased for an effective and powerful tool to automatically summarize text. Intensive, high-performing work has been done for English and other European languages, and research there is now moving toward multi-document and multilingual summarization. Arabic, however, has received little attention and research in this field. In this paper, we propose a model that automatically summarizes Arabic text using text extraction. The approach involves several steps: preprocessing the text, extracting a set of features, classifying sentences with a scoring method, ranking the sentences, and finally generating an extractive summary. The proposed system differs from other Arabic summarization systems mainly in its consideration of semantics, entity objects such as names and places, and similarity factors. The proposed system has been applied to the news domain using a dataset obtained from a local newspaper. Manual evaluation techn...
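The abstract lists the pipeline steps (features, scoring, ranking, extraction) but not the feature formulas. This sketch scores pre-tokenized sentences with four illustrative features:

- position in the document
- similarity to the title
- the share of tokens found in an entity list, standing in for names and places
- average similarity to the other sentences

The feature set, the weights, and the `entities` gazetteer are all assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_sentences(sentences: list[list[str]], title: list[str], entities: set[str],
                    weights: tuple[float, float, float, float] = (0.3, 0.3, 0.2, 0.2)) -> list[float]:
    """Weighted sum of four illustrative features per tokenized sentence."""
    w_pos, w_title, w_ent, w_sim = weights
    bags = [Counter(tokens) for tokens in sentences]
    title_bag = Counter(title)
    n = len(sentences)
    scores = []
    for i, (tokens, bag) in enumerate(zip(sentences, bags)):
        position = 1 - i / n  # earlier sentences score higher, a common news-domain cue
        title_sim = cosine(bag, title_bag)
        entity_ratio = sum(t in entities for t in tokens) / max(len(tokens), 1)
        # Centrality: mean similarity to every other sentence.
        centrality = sum(cosine(bag, other) for j, other in enumerate(bags) if j != i) / max(n - 1, 1)
        scores.append(w_pos * position + w_title * title_sim + w_ent * entity_ratio + w_sim * centrality)
    return scores


def summarize(sentences: list[list[str]], title: list[str], entities: set[str], k: int) -> list[int]:
    """Indices of the k highest-scoring sentences, in document order."""
    scores = score_sentences(sentences, title, entities)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```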
A New Model for Arabic Multi-Document Text Summarization
2018
Nowadays, the number of Arabic documents has increased significantly in different domains, such as news articles, emails, business summaries, biomedicine, websites, and social media. Some databases have grown to terabytes in size. Multi-document summarization is the process of creating a summary of a group of interrelated documents. Demand has therefore grown for Arabic multi-document text summarization that produces coherent, grammatical, and meaningful sentences as fast as possible. Many recent efforts in multi-document text summarization have focused on English, whereas Arabic multi-document summarization remains in its early stages. Consequently, the researchers in this paper propose an Arabic Multi-Document Text Summarization (AMD-TS) model based on parallel computing techniques. This model could summarize multiple Arabic documents effectively and rapidly in real time. A conceptual fra...
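The abstract only outlines the AMD-TS model. As a hypothetical sketch of the general pattern, each document's sentences can be scored independently in parallel and the results merged into one ranked list. For illustration only, scoring uses a simple frequency heuristic: the average collection-wide frequency of a sentence's words. `ThreadPoolExecutor` keeps the sketch portable. For CPU-bound scoring in CPython, `ProcessPoolExecutor` offers the same `submit`/`result` interface and sidesteps the global interpreter lock.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def score_document(doc_id: int, sentences: list[str], collection_tf: Counter) -> list[tuple[float, int, int, str]]:
    """Score one document's sentences by the mean collection frequency of their words."""
    results = []
    for idx, sentence in enumerate(sentences):
        tokens = sentence.split()
        score = sum(collection_tf[t] for t in tokens) / len(tokens) if tokens else 0.0
        results.append((score, doc_id, idx, sentence))
    return results


def summarize_collection(documents: list[list[str]], k: int, workers: int = 4) -> list[str]:
    # Word frequencies over the whole collection reward sentences on shared topics.
    collection_tf = Counter(t for doc in documents for s in doc for t in s.split())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(score_document, d, doc, collection_tf) for d, doc in enumerate(documents)]
        scored = [item for future in futures for item in future.result()]
    # Highest score first; ties broken by document, then sentence position.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [sentence for _, _, _, sentence in scored[:k]]
```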
The volume of information available on the Web is increasing rapidly, and systems that can automatically summarize documents are becoming ever more desirable. For this reason, text summarization has quickly grown into a major research area, as illustrated by the DUC and TAC conference series. Summarization systems for Arabic, however, are still not as sophisticated or as reliable as those developed for languages such as English. In this paper we discuss two summarization systems for Arabic and report on a large user study performed on them. The first system, the Arabic Query-Based Text Summarization System (AQBTSS), uses standard retrieval methods to map a query against a document collection and to create a summary. The second system, the Arabic Concept-Based Text Summarization System (ACBTSS), creates a query-independent document summary. Five groups of users of different ages and educational levels participated in evaluating our systems; each group had 300 individuals. We also performed a comparative evaluation with a commercial Arabic summarization system.
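AQBTSS is described as using standard retrieval methods to match a query against text. One common retrieval scheme is TF-IDF weighting with cosine similarity. This sketch applies it at sentence level within one text, treating each sentence as a document. The smoothed IDF formula, whitespace tokenization, and top-`k` cutoff are assumptions, not details from the paper.

```python
import math
from collections import Counter


def tfidf(token_lists: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """TF-IDF vectors, treating each sentence as a document (smoothed IDF)."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_summary(sentences: list[str], query: str, k: int) -> list[str]:
    vectors, idf = tfidf([s.split() for s in sentences])
    # Query terms that never occur in the text carry no weight and are dropped.
    q = {t: c * idf[t] for t, c in Counter(query.split()).items() if t in idf}
    ranked = sorted(range(len(sentences)), key=lambda i: cosine(vectors[i], q), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]  # keep original sentence order
```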
Computer Speech & Language, 2012
Automatic text summarization is an essential tool in this era of information overload. In this paper we present an automatic extractive Arabic text summarization system in which the user can cap the size of the final summary. It is a direct system that involves no machine learning. We use a two-pass algorithm: in the first pass, we produce a primary summary using Rhetorical Structure Theory (RST); in the second pass, we assign a score to each sentence in the primary summary. These scores help us generate the final summary. For the final output, sentences are selected with the objective of maximizing the summary's overall score, subject to its size not exceeding the user-selected limit. We used ROUGE to evaluate system-generated summaries of various lengths against summaries written by a (human) news editorial professional. Experiments on sample texts show that our system outperforms some existing Arabic summarization systems, including those that require machine learning.
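The second-pass objective is to choose sentences that maximize the total score without exceeding a user-set length. This can be framed as a 0/1 knapsack problem, which the sketch solves exactly with dynamic programming over a word budget. Measuring length as a word count and using this particular solver are assumptions, since the abstract does not say how the authors optimize.

```python
def select_within_budget(scores: list[float], lengths: list[int], budget: int) -> list[int]:
    """Exact 0/1 knapsack: maximize total score with total length <= budget."""
    n = len(scores)
    # best[i][w]: best total score using only the first i sentences within w words.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score, length = scores[i - 1], lengths[i - 1]
        for w in range(budget + 1):
            best[i][w] = best[i - 1][w]  # option 1: skip sentence i-1
            if length <= w and best[i - 1][w - length] + score > best[i][w]:
                best[i][w] = best[i - 1][w - length] + score  # option 2: take it
    # Backtrack: a cell differs from the row above only when sentence i-1 was taken.
    chosen, w = [], budget
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= lengths[i - 1]
    return sorted(chosen)  # document order for the final summary
```

Take scores `[6, 5, 5]`, lengths `[6, 5, 5]`, and a 10-word budget. Greedily taking the top-scoring sentence leaves no room for another and yields a total of 6. The knapsack solution selects sentences 1 and 2 for a total of 10.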
Arabic Text Summarization Using AraBERT Model Using Extractive Text Summarization Approach
2020
Recently, individuals' lives have become busier, and sources of knowledge on the Internet have grown more diverse and numerous. As a result, reading long texts and articles has become difficult, so readers look for summaries before deciding whether to read them in depth. Tools that extract the basic information while preserving the essence of a text are therefore urgently needed. In this study, we propose an extractive Arabic text summarizer based on general-purpose architectures for Natural Language Generation (NLG) and Natural Language Understanding (NLU), such as AraBERT, BERT, XLNet, and XLM. It summarizes an Arabic document by evaluating and extracting the document's most important sentences. Then, using the ROUGE measure and human evaluation, we compared the efficiency of the proposed solution with other solutions to recommend the best one to use for summarizing ...
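The abstract is truncated before it describes the selection method. One common way to use contextual embeddings for extractive summarization is to rank sentences by cosine similarity to the document centroid. The sketch assumes sentence embeddings have already been produced by some encoder, for example mean-pooled hidden states from an AraBERT checkpoint. It does not reproduce the authors' approach.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid_rank(embeddings: list[list[float]], k: int) -> list[int]:
    """Indices of the k sentences closest to the mean embedding, in document order."""
    if not embeddings:
        return []
    dim = len(embeddings[0])
    # The centroid approximates the document's overall meaning.
    centroid = [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dim)]
    ranked = sorted(range(len(embeddings)), key=lambda i: cosine(embeddings[i], centroid), reverse=True)
    return sorted(ranked[:k])
```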
Query Based Arabic Text Summarization
With the growth of web resources and the huge amount of available information, the need for automatic summarization systems has emerged. Summarization is needed most when searching for information on the web, where the user targets a certain domain of interest through a query; in this case, domain-based summaries serve best. Although there is plenty of research on domain-based summarization in English, such work is lacking for Arabic because of the shortage of existing knowledge bases. In this paper we introduce a query-based, single-document Arabic text summarization method that uses an existing Arabic language thesaurus and an extracted knowledge base. We use an Arabic corpus to extract domain knowledge, represented by topic-related concepts/keywords and the lexical relations among them. The user's query is expanded first using the Arabic WordNet thesaurus and then by adding the domain-specific knowledge base to the expansion. The Essex Arabic Summaries Corpus (EASC), which contains many topic-based articles with multiple human summaries, was used as the summarization dataset. Performance improved when our extracted knowledge base was used, compared with using WordNet alone.
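Here is a minimal sketch of the two-step query expansion, with hypothetical dictionaries standing in for Arabic WordNet and the extracted knowledge base. Sentences are then ranked by how many distinct expanded terms they contain. That scoring rule and the placeholder English tokens in the example are assumptions for illustration.

```python
def expand_query(terms: list[str], thesaurus: dict[str, list[str]],
                 knowledge_base: dict[str, list[str]]) -> set[str]:
    """Expand with the thesaurus first, then add domain concepts for every expanded term."""
    expanded = set(terms)
    for term in terms:
        expanded.update(thesaurus.get(term, []))
    for term in list(expanded):  # iterate over a snapshot while the set grows
        expanded.update(knowledge_base.get(term, []))
    return expanded


def rank_sentences(sentences: list[str], expanded: set[str], k: int) -> list[int]:
    """Indices of the k sentences sharing the most distinct expanded terms, in document order."""
    overlap = [len(expanded & set(s.split())) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: overlap[i], reverse=True)
    return sorted(ranked[:k])
```

Applying the knowledge base after the thesaurus lets domain concepts attach to synonyms as well as to the user's original terms.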