Relevance of Genetic Algorithm Strategies in Query Optimization in Information Retrieval
Related papers
Using Genetic algorithm to improve information retrieval systems
World Academy of Science, Engineering and Technology 17, 2006
This study investigates the use of genetic algorithms in information retrieval. The method is shown to be applicable to three well-known document collections, where the genetic modification presents more relevant documents to users. In this paper we present a new fitness function for approximate information retrieval that is faster and more flexible than the cosine similarity fitness function.
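For reference, the cosine similarity baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the paper's new fitness function (which the abstract does not specify). The helper `cosine_fitness` and the idea of averaging over known-relevant documents are assumptions made for the example.

```python
import math

def cosine_similarity(query, doc):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_q * norm_d)

def cosine_fitness(query, relevant_docs):
    """Hypothetical fitness: mean cosine similarity to known-relevant documents."""
    if not relevant_docs:
        return 0.0
    total = sum(cosine_similarity(query, d) for d in relevant_docs)
    return total / len(relevant_docs)
```

A query chromosome whose weights point in the same direction as the relevant documents scores close to 1, and an orthogonal one scores 0.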
Query optimization by Genetic Algorithms
Databases, Texts, Specifications, Objects, 2005
This study investigated the use of genetic algorithms in information retrieval for optimizing a Boolean query, that is, a query built with Boolean logical operators. Chromosomes were encoded from the Boolean query, which was represented as a tree in prefix form with indices for all terms and all Boolean logical operators. The retrieval effectiveness measures precision and recall were used as the fitness function. The genetic operators were single-point crossover on the Boolean logical operators and a mutation operator that exchanges one of the Boolean operators and, or, and xor with any other one. The goal is to retrieve the most relevant documents and the fewest nonrelevant documents for a user query in an information retrieval system using genetic programming.
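The encoding, fitness, and mutation described in this abstract can be sketched roughly as below. The nested-tuple tree layout, the function names, and the choice to mutate only the root operator are assumptions made to keep the example short. The abstract does not give the authors' exact representation.

```python
import random

OPERATORS = ("and", "or", "xor")

def evaluate(node, doc_terms):
    """Evaluate a prefix-form query tree against a document's set of terms."""
    if isinstance(node, str):            # leaf: a single query term
        return node in doc_terms
    op, left, right = node               # internal node: (operator, left, right)
    a = evaluate(left, doc_terms)
    b = evaluate(right, doc_terms)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    return a != b                        # xor

def retrieve(query, docs):
    """Return the ids of all documents that satisfy the query."""
    return {doc_id for doc_id, terms in docs.items() if evaluate(query, terms)}

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against known-relevant ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def mutate_root_operator(query, rng):
    """Simplified mutation: swap the root operator for a different one."""
    if isinstance(query, str):
        return query
    op, left, right = query
    new_op = rng.choice([o for o in OPERATORS if o != op])
    return (new_op, left, right)
```

For example, with `docs = {1: {"genetic", "query"}, 2: {"genetic"}}`, the query `("and", "genetic", "query")` retrieves only document 1, while mutating the operator to `"or"` retrieves both.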
Applying genetic algorithms to query optimization in document retrieval
Information Processing & Management, 2000
This paper proposes a novel approach to automatically retrieve keywords and then uses genetic algorithms to adapt the keyword weights. One of the contributions of the paper is to combine bigrams with the PAT-tree structure ("PAT-tree-based keyword extraction for Chinese information retrieval", ACM SIGIR'97, Philadelphia, PA, US, pp. 50–59) to retrieve keywords. The approach extracts bigrams from documents and uses the bigrams to construct a PAT-tree to retrieve keywords. The proposed approach can retrieve any type of keyword, such as technical keywords and a person's name. Effectiveness of the proposed approach is demonstrated by comparing the keywords found by this approach with those found by the PAT-tree-based approach. This comparison reveals that our keyword retrieval approach is as accurate as the PAT-tree-based approach, yet it is faster and uses less memory. The study then applies genetic algorithms to tune the weights of the retrieved keywords. Moreover, several documents obtained from web sites are tested and experimental results are compared with those of other approaches, indicating that the proposed approach is highly promising for applications.
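The bigram extraction step mentioned in this abstract might look like the sketch below. It assumes character-level bigrams with whitespace ignored, which is a common choice for Chinese text but is not confirmed by the abstract. The PAT-tree construction itself is omitted, and both function names are hypothetical.

```python
from collections import Counter

def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(pair) for pair in zip(chars, chars[1:])]

def bigram_counts(documents):
    """Count bigram occurrences across a list of document strings."""
    counts = Counter()
    for doc in documents:
        counts.update(char_bigrams(doc))
    return counts
```

Frequent bigrams from `bigram_counts` would then be the candidates fed into whatever index structure is used to assemble longer keywords.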
State-of-the-Art Review on Relevance of Genetic Algorithm to Internet Web Search
People use search engines to find information with the aim of meeting their information needs. Information retrieval (IR) is a field concerned primarily with searching for and retrieving information in documents, and also with searching search engines, online databases, and the Internet. Genetic algorithms (GAs) are robust and efficient optimization methods, applicable to a wide range of search problems, that are motivated by Darwin's principles of natural selection and survival of the fittest. This paper describes the components of information retrieval systems (IRS). It then looks at how GAs can be applied in the field of IR, and specifically at the relevance of genetic algorithms to Internet web search. Finally, the proposals surveyed show that GAs are applied to diverse problem fields of Internet web search.
THE EFFECT OF SIMILARITY MEASURES ON GENETIC ALGORITHM-BASED INFORMATION RETRIEVAL
Genetic algorithms (GAs) can be used in information retrieval (IR) to optimize the query "solution". This paper proposes a GA-based IR algorithm that adjusts the weights of the keywords of a query in order to generate an optimal or near-optimal query vector. In this algorithm, each query is represented by a chromosome. These chromosomes are fed into the genetic operator process (selection, crossover, and mutation) to produce a new population; then, to get better solutions, a local search procedure is applied to each individual in the new population. This process is repeated until an optimized query chromosome for document retrieval is obtained. The evolution of the possible solutions is guided by fitness functions that are designed to measure the goodness of those solutions. We used an order-based fitness function with different similarity measures to study their effect on the quality of the generated solutions and to decide which similarity measure leads to the best solution.
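The loop described in this abstract (selection, crossover, mutation, then local search on each new individual) can be sketched as follows. The fitness used here is a simple stand-in (mean inner-product score of relevant documents minus that of nonrelevant ones), not the order-based fitness from the paper. Tournament selection, single-point crossover, elitism, and all parameter values are likewise assumptions for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fitness(weights, relevant, nonrelevant):
    """Stand-in fitness: mean relevant score minus mean nonrelevant score."""
    rel = sum(dot(weights, d) for d in relevant) / len(relevant)
    non = sum(dot(weights, d) for d in nonrelevant) / len(nonrelevant)
    return rel - non

def tournament(population, scores, rng):
    """Pick two individuals at random and return the fitter one."""
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    return population[i] if scores[i] >= scores[j] else population[j]

def crossover(a, b, rng):
    """Single-point crossover of two weight vectors."""
    if len(a) < 2:
        return a[:]
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(weights, rng, rate=0.1):
    """Replace each weight with a fresh random value with probability `rate`."""
    return [rng.random() if rng.random() < rate else w for w in weights]

def local_search(weights, score_fn, step=0.05):
    """Greedy hill climb: nudge each weight up or down, keep improvements."""
    best, best_score = weights[:], score_fn(weights)
    for i in range(len(weights)):
        for delta in (step, -step):
            cand = best[:]
            cand[i] = min(1.0, max(0.0, cand[i] + delta))
            score = score_fn(cand)
            if score > best_score:
                best, best_score = cand, score
    return best

def evolve(relevant, nonrelevant, n_terms, pop_size=20, generations=30, seed=0):
    """Evolve a query weight vector in [0, 1]^n_terms."""
    rng = random.Random(seed)
    score_fn = lambda w: fitness(w, relevant, nonrelevant)
    population = [[rng.random() for _ in range(n_terms)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [score_fn(ind) for ind in population]
        elite = population[scores.index(max(scores))]
        children = [elite[:]]            # elitism: keep the current best unchanged
        while len(children) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            child = mutate(crossover(parent_a, parent_b, rng), rng)
            children.append(local_search(child, score_fn))
        population = children
    return max(population, key=score_fn)
```

Because the best individual is carried over unchanged each generation, the best fitness in the population never decreases. The local search step only refines individuals produced by the genetic operators.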
Optimizing Information Retrieval Using Evolutionary Algorithms and Fuzzy Inference System
2009
With the rapid growth of the amount of data available in electronic libraries, through Internet and enterprise network mediums, advanced methods of search and information retrieval are in demand. Information retrieval systems, designed for storing, maintaining and searching large-scale sets of unstructured documents, are the subject of intensive investigation. An information retrieval system, a sophisticated application managing underlying documentary databases, is at the core of every search engine, including Internet search services. There is a clear demand for fine-tuning the performance of information retrieval systems. One step in optimizing the information retrieval experience is the deployment of Genetic Algorithms, a widely used subclass of Evolutionary Algorithms that have proved to be a successful optimization tool in many areas. In this paper, we revise and extend genetic approaches that improve information retrieval through the optimization of search queries. As the next trend in improving search effectiveness and user-friendliness, system interaction will use fuzzy concepts in information retrieval systems. Deployment of fuzzy technology allows stating flexible, smooth and vague search criteria and retrieving a rich set of relevance-ranked documents, aiming to supply the inquirer with more satisfactory answers.
Effective Information Retrieval Using Genetic Algorithms Based Matching Functions Adaptation
Proceedings of the 33rd Annual Hawaii International …, 2000
Knowledge-intensive organizations have a vast array of information contained in large document repositories. With the advent of E-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document collections, it has become increasingly difficult to retrieve relevant information from these large document collections. This paper addresses the issue of improving retrieval performance (in terms of precision and recall) for retrieval from document collections. There are three important paradigms of research in the area of information retrieval (IR): probabilistic IR, knowledge-based IR, and artificial-intelligence-based techniques like neural networks and symbolic learning. Very few researchers have tried to use evolutionary algorithms like genetic algorithms (GAs). Previous attempts at using GAs have concentrated on modifying document representations or modifying query representations. This work looks at the possibility of applying GAs to adapt various matching functions. It is hoped that such an adaptation of the matching functions will lead to better retrieval performance than that obtained by using a single matching function. An overall matching function is treated as a weighted combination of scores produced by individual matching functions. This overall score is used to rank and retrieve documents. The weights associated with the individual functions are searched for using a genetic algorithm. The idea is tested on a real document collection called the Cranfield collection. The results look very encouraging.
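The weighted combination of matching functions described in this abstract can be sketched as below. The two matching functions chosen here (cosine and Dice) are assumptions, since the abstract does not list the functions the authors combined. In the paper's setting the `weights` vector would be the chromosome searched by the GA, whereas here it is simply passed in.

```python
import math

def cosine(q, d):
    """Cosine similarity between two equal-length term-weight vectors."""
    num = sum(a * b for a, b in zip(q, d))
    den = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return num / den if den else 0.0

def dice(q, d):
    """Dice coefficient over the sets of terms with nonzero weight."""
    a = {i for i, w in enumerate(q) if w > 0}
    b = {i for i, w in enumerate(d) if w > 0}
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def combined_score(query, doc, weights, matchers):
    """Overall matching score: weighted sum of the individual matcher scores."""
    return sum(w * m(query, doc) for w, m in zip(weights, matchers))

def rank(query, docs, weights, matchers):
    """Return document ids sorted by descending combined score."""
    return sorted(
        docs,
        key=lambda doc_id: combined_score(query, docs[doc_id], weights, matchers),
        reverse=True,
    )
```

A GA would evaluate each candidate `weights` vector by ranking a training collection with `rank` and measuring precision and recall of the result.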
Increasing the Visibility of Search using Genetic Algorithm
The Web, a vast repository of informational databases, is available to the user in the form of textual documents. It is a challenge to develop an effective information retrieval approach that eases the user's search and increases the visibility of search. A genetic algorithm based approach has been implemented to increase the visibility of search by expanding the query, using the Jaccard similarity function as the fitness function. The step-by-step implementation of the genetic algorithm for one generation is explained in the paper, and the experiment was repeated for 500 generations to obtain optimum keywords, out of which the best-suited keyword was considered for expanding the query. The effectiveness of the approach has been experimentally evaluated on manually created training data of documents retrieved for formulated queries using the Google search engine.
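The Jaccard similarity used as a fitness function here can be sketched as follows. Only the fitness evaluation is shown. Scoring a candidate keyword by the mean Jaccard similarity between the expanded query and each training document is an assumption, as are the names `expansion_fitness` and `best_expansion`. The full GA over 500 generations is not reproduced.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def expansion_fitness(query_terms, candidate, documents):
    """Mean Jaccard similarity of the expanded query to each document's terms."""
    if not documents:
        return 0.0
    expanded = set(query_terms) | {candidate}
    return sum(jaccard(expanded, doc) for doc in documents) / len(documents)

def best_expansion(query_terms, candidates, documents):
    """Pick the candidate keyword whose addition gives the highest fitness."""
    return max(candidates, key=lambda c: expansion_fitness(query_terms, c, documents))
```

With documents `{"genetic", "algorithm", "search"}` and `{"genetic", "search", "engine"}`, expanding the query `{"genetic"}` with `"search"` scores higher than expanding it with an unrelated term such as `"fuzzy"`.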
Avoiding Premature Convergence of Genetic Algorithm in Informational Retrieval Systems
Genetic algorithms (GAs) have been adopted by many researchers to implement information retrieval systems that retrieve an optimal document set for a user query. However, GAs have been criticized for premature convergence caused by falling into locally optimal solutions. This paper proposes a new hybrid crossover technique that speeds up convergence while preserving the high quality of the retrieved documents. The proposed technique is applied to HTML documents and evaluated using the precision measure. The results show that this technique is efficient in balancing fast convergence against high-quality outcomes.