Survey of Automatic Query Expansion for Arabic Text Retrieval
Related papers
2013
Millions of users search the internet and other information stores daily by writing queries. Unfortunately, these queries may fail to retrieve what the users need; this failure is known as word mismatch. One way of handling word mismatch is to use a thesaurus, which shows the (usually semantic) relationships between terms. The main goal of this study is to design and build an automatic Arabic thesaurus using the Local Context Analysis technique, which can be used in any special field or domain to improve the expansion process and to retrieve more relevant documents for the user's query. The results of this study were compared with a classical information retrieval system. Two hundred and forty-two Arabic documents and 59 Arabic queries were used for building the requirements of the thesaurus, such as inverted Fil...
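Local Context Analysis (LCA), introduced by Xu and Croft, ranks candidate expansion concepts by how strongly they co-occur with every query term in the top-ranked passages of an initial search. The co-occurrence is weighted by inverse document frequency. The Python sketch below is a simplified LCA-style scorer and not the thesaurus-building procedure of this study. Several details are assumptions for illustration: the function names, the smoothed idf `log10(1 + N / df)`, and `delta = 0.1`. Tokens are shown in transliteration.

```python
import math
from collections import Counter


def lca_concept_scores(query_terms, top_passages, doc_freq, num_docs, delta=0.1):
    """Simplified Local Context Analysis scoring (illustrative sketch).

    query_terms:  list of query tokens
    top_passages: token lists for the top-ranked passages of the initial search
    doc_freq:     collection document frequency per token (assumed available)
    num_docs:     number of documents N in the collection
    """
    n = len(top_passages)
    tfs = [Counter(p) for p in top_passages]

    def idf(term):
        # Assumed smoothed idf; the constants used in the LCA literature differ.
        return math.log10(1 + num_docs / doc_freq.get(term, 1))

    candidates = {t for tf in tfs for t in tf} - set(query_terms)
    scores = {}
    for c in candidates:
        belief = 1.0
        for w in query_terms:
            # co-occurrence of concept c with query term w over the top passages
            co = sum(tf[w] * tf[c] for tf in tfs)
            # normalise by log10 of the passage count; max() avoids dividing by zero
            f = math.log10(co + 1) * idf(c) / math.log10(max(n, 2))
            # the product over query terms favours concepts related to all terms
            belief *= (delta + f) ** idf(w)
        scores[c] = belief
    return scores


def expand_query(query_terms, scores, k=3):
    """Append the k best-scoring concepts (ties broken alphabetically)."""
    best = sorted(scores, key=lambda c: (-scores[c], c))[:k]
    return list(query_terms) + best
```

In a thesaurus setting, such scores could be precomputed per term and stored as related-term lists. That precomputation step is likewise an assumption here.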
Query Expansion for Arabic Information Retrieval Model: Performance Analysis and Modification
2018
Information retrieval aims to find all relevant documents responding to a query from textual data. A good information retrieval system should retrieve only those documents that satisfy the user query. Although several models have been developed, most Arabic information retrieval models do not satisfy user needs, because Arabic is a rich language with complex morphology and high polysemy. This paper first investigates the most recent Arabic information retrieval model and then presents two approaches to enhance the effectiveness of the adopted model. The main idea of the proposed approaches is to modify and/or expand the user query. The first approach expands the user query using the semantics of words according to an Arabic dictionary. The second approach modifies and/or expands the user query by adding useful information from pseudo relevance feedback. In other words, the query is modified by selecting relevant textual keywords for expanding the que...
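Pseudo relevance feedback treats the top-ranked documents of an initial search as relevant and adds terms drawn from them to the query. The sketch below shows one common, simple variant, which picks the terms that occur in the most feedback documents. The function name, the parameters `top_k` and `num_terms`, and the selection rule are assumptions, not the paper's method.

```python
from collections import Counter


def prf_expand(query_terms, ranked_docs, top_k=3, num_terms=2, stopwords=frozenset()):
    """Pseudo relevance feedback expansion (illustrative sketch).

    ranked_docs: token lists for documents ranked by an initial search
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        # count each term once per feedback document
        counts.update(set(doc))
    candidates = [(t, c) for t, c in counts.items()
                  if t not in query_terms and t not in stopwords]
    # most frequent first; alphabetical order breaks ties deterministically
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    return list(query_terms) + [t for t, _ in candidates[:num_terms]]
```

Restricting feedback to a few documents keeps the expansion close to the query topic. Larger `top_k` values admit more noise from documents that are not actually relevant.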
A hybrid semantic query expansion approach for Arabic information retrieval
Journal of Big Data
Introduction
Information retrieval (IR) is an active research field that aims to extract the most relevant documents from large datasets. The user query plays an important role in this process. Numerous efforts have been made to retrieve relevant documents written in English. Nevertheless, Arabic has not received the attention it deserves, owing to some inherent difficulties of the language itself. In fact, Arabic is one of the richest human languages in its vocabulary, variety of sentence constructions, and diversity of meaning [1]. An Arabic sentence is made up of terms interconnected by grammatical relations [2-4]. In most cases, the user query is too short to express the user's needs sufficiently or effectively [2]. Vocabulary mismatch, where the user and the indexer use different terms, is one of the most critical issues in IR [5, 6]. Consequently, IR systems may fail to retrieve the documents that match the user's needs. A well-known and effective strategy for resolving this issue is query expansion (QE).
Query Expansion Based-on Similarity of Terms for Improving Arabic Information Retrieval
IFIP Advances in Information and Communication Technology , 2012
This research proposes a query expansion method for Arabic information retrieval using Expectation Maximization (EM). We employ the EM algorithm to select relevant terms for expanding the query and to weed out unrelated terms. We tested our algorithm on the INFILE test collection of CLEF 2009, and the experiments show that query expansion that considers term similarity both improves precision and retrieves more relevant documents. The main finding of this research is that this method increases recall while keeping precision at the same level.
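The abstract does not say how EM is applied to term selection. One plausible reading, shown here purely as an assumption, is a two-component Gaussian mixture. The mixture is fitted to the similarity scores between the query and each candidate term. Candidates are kept when their posterior probability of belonging to the high-similarity component exceeds 0.5. The function names, the initialisation, and the threshold are illustrative.

```python
import math


def em_two_gaussians(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture to xs with EM.

    Returns, for each score, the posterior probability that it belongs to
    the component with the higher mean.
    """
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]  # start one component at each extreme
    var = [max(1e-6, (hi - lo) ** 2 / 4)] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score
        resp = []
        for x in xs:
            dens = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixture weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk)
    high = 0 if mu[0] > mu[1] else 1
    return [r[high] for r in resp]


def select_expansion_terms(similarity, threshold=0.5):
    """Keep candidate terms most likely drawn from the high-similarity component."""
    terms = list(similarity)
    posteriors = em_two_gaussians([similarity[t] for t in terms])
    return [t for t, p in zip(terms, posteriors) if p > threshold]
```

Letting EM choose the cut-off avoids hand-tuning a fixed similarity threshold for each query. This is one way the weeding-out step could keep precision stable while more terms are added.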
Semantic Query Expansion for Arabic Information Retrieval
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014
Traditional keyword-based search has some limitations, such as word sense ambiguity and query intent ambiguity, which can hurt precision. Semantic search uses the contextual meaning of terms, together with semantic matching techniques, to overcome these limitations. This paper introduces a query expansion approach that uses an ontology built from Wikipedia pages, in addition to other thesauri, to improve search accuracy for Arabic. Our approach outperformed the traditional keyword-based approach in terms of both F-score and NDCG.
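NDCG rewards a system for placing highly relevant documents near the top of the ranking. F-score is the harmonic mean of precision and recall. The sketch below implements both measures under two assumptions: relevance gains are graded, and the discount is the common `log2(rank + 1)`. The function names are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains, k, judged_gains=None):
    """nDCG@k; the ideal ranking uses all judged gains when they are supplied."""
    pool = judged_gains if judged_gains is not None else ranked_gains
    idcg = dcg(sorted(pool, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / idcg if idcg > 0 else 0.0


def f_score(precision, recall, beta=1.0):
    """F-beta measure; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Passing `judged_gains` lets the ideal ranking include relevant documents the system failed to retrieve. Without it, nDCG only measures how well the retrieved list is ordered.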
Optimizing an Arabic Query using Comprehensive Query Expansion Techniques
International Journal of Computer Applications, 2013
By using a search engine to look for an object of interest, users may find what they are looking for. However, on average, a user comes up with only two or three words per query [23]. This often causes a number of problems. To overcome them, various query expansion techniques have been developed. However, none of them is claimed to offer the optimal solution, especially for Arabic, because of its complex morphological structure. Thus, the main objective of this paper is to optimize Arabic queries using a comprehensive combination of these expansion techniques, which can enhance the query expansion process and retrieve the maximum number of relevant documents for the Arabic user's query. The paper found that the developed system improved recall and precision over pairs of separate techniques. This method gains the benefits of both expansion approaches, interactive and automatic, because the query is expanded automatically while users are discreetly engaged in the expansion process.
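Precision is the fraction of retrieved documents that are relevant. Recall is the fraction of relevant documents that are retrieved. Improvements from query expansion are usually reported by computing both measures before and after expansion. The sketch below is a minimal set-based version; the document identifiers and the two runs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical runs for one query, before and after expansion.
relevant = ["d2", "d4", "d7", "d9"]
before = precision_recall(["d1", "d2", "d3"], relevant)
after = precision_recall(["d1", "d2", "d4", "d7"], relevant)
print(before, after)
```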
Bi-Gram Term Collocations-based Query Expansion Approach for Improving Arabic Information Retrieval
Arabian Journal for Science and Engineering, 2018
In the era of information overload, information retrieval systems are vital applications. Many researchers try to enhance search results by introducing new methods. Unlike English, some languages, such as Arabic, have complex morphology and lack both linguistic and semantic resources. This paper proposes a language-independent, semantic-based information retrieval approach that expands the user query using bi-gram term collocations. The proposed approach makes two main contributions. First, the bi-gram term collocations used to expand the user query are mined automatically from the text corpus, so no external semantic resource is needed. Second, because of the complexity of the language's morphology, the system index is built from the corpus words themselves, saving the cost and effort of stemming. A system prototype for Arabic was implemented and evaluated against the stem-based method. The experimental evaluation was conducted on the Arabic text of the Holy Quran. The results show that the proposed system outperforms the stem-based method in terms of precision and recall.
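The abstract states that bi-gram collocations are mined from the corpus itself rather than taken from an external resource. One standard way to score such collocations is pointwise mutual information (PMI) over adjacent word pairs. The sketch below uses PMI with a minimum-frequency cut-off, which is an assumption rather than the paper's stated measure. It then adds each query term's strongest partner to the query. Tokens are transliterated, unstemmed words, in line with the abstract's word-based index.

```python
import math
from collections import Counter


def mine_bigram_collocations(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p12 = count / total_bi
        p1, p2 = unigrams[w1] / total_uni, unigrams[w2] / total_uni
        scores[(w1, w2)] = math.log2(p12 / (p1 * p2))
    return scores


def expand_with_collocations(query_terms, scores, per_term=1):
    """Add each query term's highest-PMI collocation partners to the query."""
    expanded = list(query_terms)
    for q in query_terms:
        partners = [(s, w2 if w1 == q else w1)
                    for (w1, w2), s in scores.items() if q in (w1, w2)]
        partners.sort(key=lambda sp: (-sp[0], sp[1]))
        for _, word in partners[:per_term]:
            if word not in expanded:
                expanded.append(word)
    return expanded
```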
Stem-Based Query Expansion for Arabic Corpus
ABHATH AL-YARMOUK: Basic Sciences & Engineering, 18 (2), 227-246, 2009
This paper provides an improvement to Arabic information retrieval systems. The proposed system relies on the stem-based query expansion method, which adds different morphological variations of each index term used in the query. The method is applied to an Arabic corpus. The roots of the query terms are derived; then, for each derived root, all words in the corpus descended from that root are collected and placed in a distinct class. Afterward, each class is reformulated by co-occurrence ...
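The first step of this method groups corpus words into classes by shared root and expands each query term with its class. Reliable Arabic root extraction is a problem in its own right. The sketch below therefore substitutes a hypothetical lookup table (`TOY_ROOTS`, with transliterated words) for a real root extractor. It also omits the co-occurrence reformulation step that follows in the paper.

```python
from collections import defaultdict

# Hypothetical root table standing in for a real Arabic root extractor.
TOY_ROOTS = {
    "kitab": "ktb", "maktaba": "ktb", "katib": "ktb",
    "madrasa": "drs", "dars": "drs",
}


def build_root_classes(corpus_words, root_of):
    """Group corpus words into classes keyed by their root."""
    classes = defaultdict(set)
    for word in corpus_words:
        root = root_of(word)
        if root is not None:
            classes[root].add(word)
    return classes


def stem_based_expand(query_terms, classes, root_of):
    """Expand each query term with all corpus words that share its root."""
    expanded = []
    for term in query_terms:
        root = root_of(term)
        variants = sorted(classes.get(root, set()) | {term}) if root is not None else [term]
        for v in variants:
            if v not in expanded:
                expanded.append(v)
    return expanded


classes = build_root_classes(["kitab", "maktaba", "katib", "madrasa", "dars"], TOY_ROOTS.get)
print(stem_based_expand(["kitab"], classes, TOY_ROOTS.get))  # ['katib', 'kitab', 'maktaba']
```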
Query Length and its Impact on Arabic Information Retrieval Performance
2016
This paper reports the results of investigating the impact of query length on the performance of Arabic retrieval. Thirty queries were used in the investigation, each phrased at three lengths (short, medium, and long), giving ninety different queries. A corpus of one thousand documents on herbal medication was used, and expert judgments were used to determine document relevance to each query. The main finding of this research is that using shorter queries improves both precision and recall. Because there are no other results to compare with and no agreement on how length affects retrieval, the authors conclude that the results should be viewed in light of the type of dataset used and how the queries were formulated and categorized.