A New Approach for Arabic Text Summarization
Related papers
Computer Speech & Language, 2012
Automatic text summarization is an essential tool in this era of information overload. In this paper we present an automatic extractive Arabic text summarization system where the user can cap the size of the final summary. It is a direct system where no machine learning is involved. We use a two-pass algorithm: in the first pass, we produce a primary summary using Rhetorical Structure Theory (RST); in the second pass, we assign a score to each of the sentences in the primary summary. These scores guide the generation of the final summary. For the final output, sentences are selected with the objective of maximizing the overall score of the summary, whose size must not exceed the user-selected limit. We used ROUGE to evaluate our system-generated summaries of various lengths against those written by a (human) news editorial professional. Experiments on sample texts show our system to outperform some of the existing Arabic summarization systems, including those that require machine learning.
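The final selection step described in the abstract above (maximize the total score while staying under a size cap) has the shape of a 0/1 knapsack problem. The following is a minimal sketch under stated assumptions: size is measured in whitespace-separated words, scores are already available, and the function name `select_sentences` is hypothetical. The abstract does not say which optimization procedure the authors use, so this is an illustration rather than their method.

```python
from typing import List, Tuple


def select_sentences(sentences: List[Tuple[str, float]], max_words: int) -> List[str]:
    """Pick sentences maximizing total score with total word count <= max_words.

    Standard 0/1 knapsack via dynamic programming (an assumed stand-in,
    not necessarily the procedure used in the paper).
    """
    lengths = [len(text.split()) for text, _ in sentences]
    n = len(sentences)
    # best[i][w] = best total score using the first i sentences within w words
    best = [[0.0] * (max_words + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        length, score = lengths[i - 1], sentences[i - 1][1]
        for w in range(max_words + 1):
            best[i][w] = best[i - 1][w]  # option 1: skip this sentence
            if length <= w:
                candidate = best[i - 1][w - length] + score  # option 2: take it
                if candidate > best[i][w]:
                    best[i][w] = candidate
    # Trace back through the table to recover which sentences were taken.
    chosen = []
    w = max_words
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= lengths[i - 1]
    chosen.reverse()  # restore original document order
    return [sentences[i][0] for i in chosen]
```

Returning the chosen sentences in document order keeps the extract readable, which matters for an extractive summary.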
The volume of information available on the Web is increasing rapidly. Systems that can automatically summarize documents are therefore becoming ever more desirable. For this reason, text summarization has quickly grown into a major research area, as illustrated by the DUC and TAC conference series. Summarization systems for Arabic are, however, still not as sophisticated or as reliable as those developed for languages like English. In this paper we discuss two summarization systems for Arabic and report on a large user study performed on these systems. The first system, the Arabic Query-Based Text Summarization System (AQBTSS), uses standard retrieval methods to map a query against a document collection and to create a summary. The second system, the Arabic Concept-Based Text Summarization System (ACBTSS), creates a query-independent document summary. Five groups of users of different ages and educational levels participated in evaluating our systems. Each group had 300 individuals. We also performed a comparative evaluation with a commercial Arabic summarization system.
A Comprehensive Review of Arabic Text Summarization
IEEE Access
The explosion of online and offline data has changed how we gather, evaluate, and understand data. It is frequently difficult and time-consuming to comprehend large text documents and extract crucial information from them. Text summarization techniques address these problems by compressing long texts while retaining their essential content. These techniques rely on the fast delivery of filtered, high-quality content to their users. Due to the massive amounts of data generated by technology and various sources, automated text summarization of large-scale data is challenging. There are three types of automatic text summarization techniques: extractive, abstractive, and hybrid. Regardless of the technique, the generated summaries are a long way from those produced by human experts. Although Arabic is a widely spoken language that is frequently used for content sharing on the web, summarization of Arabic content is limited and still immature because of several problems, including the Arabic language's morphological structure, the variety of dialects, and the lack of adequate data sources. This paper reviews text summarization approaches and recent deep learning models for them. It also reviews existing datasets for these approaches, along with their characteristics and limitations. The most often used metrics for summarization quality evaluation are ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. The challenges encountered by Arabic text summarization methods and approaches, and the solutions proposed in each approach, are analyzed. Many Arabic text summarization methods have problems, such as the lack of golden tokens during testing, out-of-vocabulary (OOV) words, repeated summary sentences, the lack of standard systematic methodologies and architectures, and the complexity of the Arabic language. Finally, providing the required corpora, improving evaluation using semantic representations, addressing the lack of ROUGE metrics in abstractive text summarization, and adopting recent deep learning models in Arabic summarization studies are essential demands.
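For readers unfamiliar with the ROUGE metrics named in the abstract above, ROUGE-N recall is the clipped n-gram overlap between a candidate and a reference summary, divided by the number of n-grams in the reference. The sketch below assumes plain whitespace tokenization; the function names are hypothetical, and a real evaluation on Arabic text would typically add normalization or stemming that is omitted here.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap / n-grams in the reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # reference too short to contain any n-gram
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

With `n=1` this gives ROUGE-1 recall and with `n=2` ROUGE-2 recall; ROUGE-L, which is based on the longest common subsequence, is not covered by this sketch.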
Automatic Arabic Summarization: A survey of methodologies and systems
Procedia Computer Science, 2017
Text summarization has been a field of intensive research over the last 50 years, especially for commonly used languages with relatively simple grammar, such as English. Moreover, the unprecedented growth in the amount of online information available in many languages to users and businesses, including news articles and social media, has made it difficult and time-consuming for users to identify and consume sought-after content. Hence, automatic text summarization that generates accurate and relevant summaries in various languages from the huge amount of information available is essential nowadays. Techniques and methodologies for Arabic text summarization are still immature due to the inherent complexity of the Arabic language in terms of both structure and morphology. This paper describes the main challenges for Arabic text summarization and surveys the various methodologies and systems in the literature. This survey would be a good basis for the design of an Arabic automatic text summarization system that combines the various "good" features of the existing systems and dismisses the "not-so-good" features.
Automatic Summarization of Arabic Texts based on RST Technique
2010
We present in this paper an automatic summarization technique for Arabic texts, based on RST. We first present a corpus study which enabled us to specify, following empirical observations, a set of relations and rhetorical frames. Then, we present our method to automatically summarize Arabic texts. Finally, we present the architecture of the ARSTResume system. This method is based on the Rhetorical Structure Theory and uses linguistic knowledge. The method relies on three pillars. The first consists in locating the rhetorical relations between the minimal units of the text by applying the rhetorical rules. One of these units is the nucleus (the segment necessary to maintain coherence) and the other can be either nucleus or satellite (an optional segment). The second pillar is the representation and the simplification of the RST tree that represents the input text in hierarchical form. The third pillar is the selection of sentences for the final summary, which takes into account the type of the rhetorical relations chosen for the extract.
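The nucleus/satellite distinction and the relation-based selection described above can be illustrated with a small tree traversal. This is a sketch under assumptions, not the ARSTResume implementation: the `RSTNode` class, the `extract` function, and the relation names in the usage example are all hypothetical. Nuclei are always kept, and a satellite is kept only when its relation type is in a caller-supplied set.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RSTNode:
    role: str                       # "nucleus" or "satellite"
    relation: Optional[str] = None  # relation tying this node to its sibling(s)
    text: Optional[str] = None      # set only on leaves (minimal text units)
    children: List["RSTNode"] = field(default_factory=list)


def extract(node: RSTNode, keep_relations: Set[str]) -> List[str]:
    """Collect leaf texts in order, pruning satellites whose relation is not kept."""
    if node.role == "satellite" and node.relation not in keep_relations:
        return []  # optional segment with an unselected relation: drop the subtree
    if not node.children:
        return [node.text] if node.text is not None else []
    units: List[str] = []
    for child in node.children:
        units.extend(extract(child, keep_relations))
    return units


# Usage with hypothetical relation names:
tree = RSTNode("nucleus", children=[
    RSTNode("nucleus", text="A"),
    RSTNode("satellite", relation="elaboration", text="B"),
    RSTNode("satellite", relation="condition", text="C"),
])
summary_units = extract(tree, {"condition"})
```

Passing an empty set of relations reduces the extract to the nuclei alone, which is the most aggressive simplification of the tree.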
Automatic Arabic Text Summarization Approaches
International Journal of Computer Applications, 2017
In recent years, automatic text summarization has seen renewed interest, with a growing number of studies and products, especially for the English language. For the Arabic language, however, little work and limited research have been done in this field. This paper presents a literature review of recent research on Arabic text summarization. Current approaches used in this field are presented, followed by a discussion of their limitations and the main challenges faced in this application. Finally, an approach to improve the quality of Arabic text summarization systems is proposed.
Experimenting with automatic text summarisation for arabic
2011
The volume of information available on the Web is increasing rapidly. Systems that can automatically summarise documents are therefore becoming ever more desirable. For this reason, text summarisation has quickly grown into a major research area, as illustrated by the DUC and TAC conference series. Summarisation systems for Arabic are, however, still not as sophisticated or as reliable as those developed for languages like English.
Extracting Sentences Using Lexical Cohesion for Arabic Text Summarization
Int. J. Comput. Linguistics Appl., 2015
Automatic Text Summarization has received a great deal of attention in the past couple of decades. It has gained a lot of interest especially with the proliferation of the Internet and the new technologies. Arabic as a language still lacks research in the field of Information Retrieval. In this paper, we explore lexical cohesion using lexical chains for an extractive summarization system for Arabic documents.
Lexical Cohesion and Entailment based Segmentation for Arabic Text Summarization (LCEAS)
Text summarization is the process of creating a short description of a specified text while preserving its information context. This paper tackles the Arabic text summarization problem. Semantically redundant and insignificant content is removed from the summarized text by checking the text entailment relation and lexical cohesion. Accordingly, a text summarization approach (called LCEAS) based on lexical cohesion and the text entailment relation is developed. In LCEAS, the text entailment approach is enhanced to suit the Arabic language. Roots and semantic relations between the senses of the words are used to extract the common words. New threshold values are specified to suit entailment-based segmentation for Arabic text. LCEAS is a single-document summarization system built on an extraction technique. To evaluate LCEAS, its performance is compared with previous Arabic text summarization systems. Each system output is compared against Essex Arabic Summaries Corpus ...
A Proposed Method for Summarizing Arabic Single Document
International Journal of Computer Applications, 2018
This paper proposes an automatic text summarization method. Summarization is considered a process of selecting the most important information in the original text, and it can be divided into two types: extractive and abstractive. In this study, a single-document summarization system for Arabic text that relies on the extractive method is introduced. The system works in three stages: pre-processing, sentence scoring, and summary generation. The pre-processing phase removes punctuation marks and stop words, unifies synonyms, and stems words to obtain their root form. The system then scores every sentence according to a collection of features so that the sentences with the highest scores are included in the final summary. The system was evaluated by comparing manual and automatic summaries using several measures, in particular the ROUGE measure. The manual summarization was done by two human experts to check the summaries' quality in terms of general form, content, coherence of the phrases, lack of elaboration, repetition, and completeness of meaning. The final results showed that the proposed method achieved higher performance than other systems.
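The three stages named in the abstract above (pre-processing, sentence scoring, summary generation) can be sketched as a small pipeline. Everything specific here is an assumption made for illustration: the function names, the two features (average term frequency and sentence position), and the equal weighting are not taken from the paper, and synonym unification and stemming are omitted because they require language resources.

```python
import re
from collections import Counter
from typing import List, Set


def preprocess(sentence: str, stop_words: Set[str]) -> List[str]:
    """Stage 1: strip punctuation, lowercase, and drop stop words."""
    no_punct = re.sub(r"[^\w\s]", " ", sentence)  # \w is Unicode-aware in Python 3
    return [tok for tok in no_punct.lower().split() if tok not in stop_words]


def summarize(sentences: List[str], stop_words: Set[str], k: int) -> List[str]:
    """Stages 2 and 3: score each sentence, then keep the k best in original order."""
    tokenized = [preprocess(s, stop_words) for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)
    scores = []
    for idx, toks in enumerate(tokenized):
        # Feature 1 (assumed): average document frequency of the sentence's terms.
        tf = sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        # Feature 2 (assumed): earlier sentences receive a higher position score.
        position = 1.0 / (idx + 1)
        scores.append(tf + position)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A system of the kind described would replace the two toy features with its own feature set and would stem tokens to their roots before counting frequencies.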