Arabic Word Stemming Algorithms and Retrieval Effectiveness
Related papers
Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems
2007
Document retrieval in Information Retrieval Systems (IRS) is generally about understanding the information in the documents concerned. The better the system is able to understand the contents of documents, the more effective the retrieval outcomes will be. But understanding the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through a keyword approach using the vector space model.
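As a rough illustration of the keyword approach mentioned in this abstract, the sketch below compares two texts as term-frequency vectors using cosine similarity. It is a minimal example under stated assumptions, not the system described in the paper: whitespace tokenization, raw term counts with no IDF weighting, and the function name `cosine_similarity` are all choices made here for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts viewed as term-frequency vectors.

    Assumption for this sketch: terms are whitespace-separated tokens and
    weights are raw counts.
    """
    vec_a = Counter(text_a.split())
    vec_b = Counter(text_b.split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this model, `cosine_similarity("stem", "stemming")` is zero because the two surface forms are different keywords, which is the kind of mismatch that stemming is meant to reduce.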
Stemming techniques of Arabic Language: Comparative Study from the Information Retrieval Perspective
The Arabic language is considered one of the most challenging languages for the matching problem in information retrieval, since it depends on both inflectional and derivational morphology and has a templatic morphology. Recent studies have found that using stems as index terms outperforms using roots. The most popular and successful technique for producing word stems is light stemming. Many studies of light stemming have been conducted since the TREC 2002 Cross-Language track. This paper aims to compare most of the existing light stemmers in terms of main ideas, affix lists, algorithms, and information retrieval performance. The results show that the light10 stemmer outperformed the other stemmers in the non-expanded experiments, and that Aljlayl-3 outperformed them when expansion was used.
An intelligent use of stemmer and morphology analysis for Arabic information retrieval
Egyptian Informatics Journal, 2020
Arabic Information Retrieval has gained significant attention due to the increasing use of Arabic text on the web and social media networks. This paper discusses a new approach to Arabic stemming, called Arabic Morphology Information Retrieval (AMIR), which generates or extracts stems by applying a set of rules about the relationships among Arabic letters to find the root or stem of the words used as indexing terms for text search in Arabic retrieval systems. To demonstrate the usefulness of the proposed algorithm, we highlight the benefits of the proposed rules for different Arabic information retrieval systems. Finally, we have evaluated the AMIR system by comparing its performance with LUCENE, FARASA, and a no-stemmer baseline in terms of mean average precision. The results obtained demonstrate that AMIR achieved a mean average precision of 0.34% while LUCENE, FARASA and no stemmer achieved 0.27%, 0.28% and 0.21, respectively. This demonstrates that AMIR improves Arabic stemming and retrieval effectiveness and is robust to any type of stem.
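For readers unfamiliar with the evaluation measure named in this abstract, the following is a minimal sketch of how mean average precision is commonly computed from ranked result lists and relevance judgments. The function names and the input layout (a list of `(ranked_ids, relevant_ids)` pairs) are assumptions made for this example; the paper's own evaluation setup is not described here.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    ranked_ids: document ids in the order the system returned them.
    relevant_ids: ids judged relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of per-query average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For a single query with results `["d1", "d2", "d3", "d4"]` and relevant set `{"d1", "d3"}`, the average precision is (1/1 + 2/3) / 2, or about 0.83.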
Word Stemming for Arabic Information Retrieval: The Case for Simple Light Stemming
2012
Although a number of attempts have been made to develop a stemming formalism for the Arabic language, most of these attempts have focused merely on the lexical structure of words as modeled by Arabic grammatical and morphological rules. This paper discusses the merits of light stemming for Arabic data and presents a simple light stemming strategy that has been developed on the basis of an analysis of the actual occurrence of suffixes and prefixes in real texts. The performance of this stemming strategy has been compared with that of a heavier stemming strategy that takes into consideration most grammatical prefixes and suffixes. The results indicate that only a few of the prefixes and suffixes have an impact on the correctness of the stems generated. Light stemming has exhibited better performance than heavy stemming in terms of over-stemming and under-stemming measures. It has been shown that the two stemming strategies are significantly different in retrieval performance.
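The general idea of light stemming, stripping a small set of frequent prefixes and suffixes without attempting to recover the root, can be sketched as follows. The affix lists, the minimum stem length, and the one-prefix, one-suffix policy are illustrative assumptions for this example and are not the lists or rules used in the paper.

```python
# Illustrative affix lists (assumed for this sketch, not taken from the paper).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 3  # assumed lower bound on what must remain after stripping


def light_stem(word):
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word
```

With these assumed lists, `light_stem("المكتبة")` removes the definite article and the final ta marbuta and returns `"مكتب"`. A heavier stemmer would differ mainly in using longer affix lists or in applying further rules after this step.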
A rule-based Arabic stemming algorithm
Proceedings of the 5th European Conference on European Computing Conference, 2011
Stemming is used in information retrieval systems to reduce variant word forms to common roots in order to improve retrieval effectiveness. As in other languages, there is a need for an effective stemming algorithm for the indexing and retrieval of Arabic documents. The Arabic stemming algorithm developed by Al-Omari is studied and new versions are proposed to enhance its performance. The improvements relate to the order in which the dictionary is looked up and the order in which the morphological rules are applied.
Arabic Natural Language Processing for Information Retrieval
2004
Human Language Technology has played a big role in implementing Latin-based information retrieval systems. Two of the most cited techniques are stemming and truncation. Numerous studies have shown that the inflectional structure of words has a big impact on the retrieval accuracy of information retrieval systems (IRS) for Latin-based languages. Stemming or truncation is done for two principal reasons: the reduction in index storage required and the increase in performance due to the use of word variants. Several stemming algorithms have been proposed, such as Porter's algorithm for English.
Applications of Stemming Algorithms in Information Retrieval-A Review
Stemming is a process that maps related morphological variants of words to a common stem or root form. The main purpose of stemming is to obtain the root word for words that are not present in a dictionary or WordNet. Stemming is a very important approach for languages that are rich in morphology. It has many applications in NLP and Information Retrieval. In Information Retrieval systems, stemming improves performance in terms of recall and precision. It also reduces the size of the index file during indexing by conflating morphological variants to a common term or stem. In this paper, different stemming algorithms for Information Retrieval and their applications in IR are presented.
Influence of stemming on Clustering of Arabic texts: Comparative Study in Document Retrieval
International Journal of Computer Applications, 2013
This paper first studies the influence of stemming on the quality of Arabic text clustering, and then describes tests of an approach based on this clustering to improve Document Retrieval (DR). A classical local document system generally employs statistical methods to calculate the similarity between the submitted query and each document in the target collection, and finally provides an ordered list of documents (hit list). In the present approach, the collection is submitted to the clustering process, and the list of documents returned is then constructed from the formed clusters, based on the cluster representative nearest to the user's query. The choice of the Arabic language is motivated by its very particular morpho-syntactic characteristics.
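The retrieval step described in this abstract, choosing the cluster whose representative is nearest to the query and building the hit list from it, might look roughly like the sketch below. It assumes the clusters are already formed (the clustering itself is not shown), uses a summed term-frequency vector as the cluster representative, and uses cosine similarity; all of these, along with the names `centroid` and `retrieve`, are assumptions for illustration and not the paper's actual method.

```python
import math
from collections import Counter


def cosine(vec_a, vec_b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(docs):
    """Assumed cluster representative: the summed term counts of its documents."""
    return sum((Counter(doc.split()) for doc in docs), Counter())


def retrieve(query, clusters):
    """Return the documents of the nearest cluster, ranked against the query.

    clusters: a list of clusters, each a list of (already stemmed) document strings.
    """
    query_vec = Counter(query.split())
    nearest = max(clusters, key=lambda docs: cosine(query_vec, centroid(docs)))
    return sorted(
        nearest,
        key=lambda doc: cosine(query_vec, Counter(doc.split())),
        reverse=True,
    )
```

In such a setup, the stemmer would be applied to both the documents and the query before vectors are built, so its quality affects both the clusters and the final ranking.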
Review on Recent Arabic Information Retrieval Techniques
EAI Endorsed Transactions on Internet of Things
Information retrieval is an important field that aims to provide documents relevant to a user's information need, expressed through a query. Arabic is a challenging language that has recently gained much attention in the information retrieval domain. To overcome the problems related to its complexity, many studies and techniques have been presented, most of which address the stemming problem. This paper presents an overview of the Arabic information retrieval process, including various text processing techniques, ranking approaches, evaluation measures, and some important information retrieval models. Finally, the paper presents some recent related studies and approaches in different Arabic information retrieval fields.
A Rule and Template Based Stemming Algorithm for Arabic Language
Stemming is defined as the conflation of all variations of specific words to a single form called the root or stem. Stemming plays a vital role in natural language processing and understanding. As in other languages, there is a need for an effective stemming algorithm for Arabic words. Arabic has rich and complex morphological word structures and rules. An Arabic stemming algorithm based on morphological rules has been developed, and to enhance its effectiveness, a dictionary of root words is used to determine the right stems. The Arabic stemming algorithm developed by Al-Omari is studied and a new algorithm is proposed to enhance its performance. The improvements obtained relate to the order in which the dictionary is looked up and the order in which the morphological rules are applied.
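To make the interplay between rule order and dictionary lookup concrete, here is a small sketch in which a root dictionary is consulted before any rule is applied and again after each rule. The toy dictionary, the two rules, and their order are hypothetical and chosen only for illustration; they are not the rules or dictionary of the Al-Omari algorithm or of the proposed one.

```python
# Toy root dictionary (hypothetical, for illustration only).
ROOTS = {"كتب", "درس"}


def strip_definite_article(word):
    """Hypothetical rule: remove a leading definite article."""
    return word[2:] if word.startswith("ال") else word


def strip_feminine_ending(word):
    """Hypothetical rule: remove a trailing ta marbuta."""
    return word[:-1] if word.endswith("ة") else word


# The order of this list is the order in which rules are applied.
RULES = [strip_definite_article, strip_feminine_ending]


def stem_with_dictionary(word, roots=ROOTS, rules=RULES):
    """Apply rules in order, checking the dictionary before and after each one."""
    candidate = word
    if candidate in roots:
        return candidate
    for rule in rules:
        candidate = rule(candidate)
        if candidate in roots:
            # Stop as soon as a known root is reached.
            return candidate
    # No dictionary match: fall back to the last rule output.
    return candidate
```

Reordering `RULES`, or moving the dictionary check to run only after all rules, changes which candidates are tested and when the procedure stops, which is the kind of ordering choice the abstract refers to.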