Optimizing Information Retrieval Using Evolutionary Algorithms and Fuzzy Inference System

Relevance of Genetic Algorithm Strategies in Query Optimization in Information Retrieval

2014

The growth of digital information on the Web has multiplied the informational needs and expectations of seekers, creating a pressing need for more advanced search tools that can respond to the informational requirements within an organization. A user may formulate a search query in a way that obscures the useful documents to be retrieved. The objective of query optimization is to transform the query into a more effective form, improving the quality of the retrieved information and reducing the computational burden of processing document text at query time. Genetic algorithms are efficient and robust methods, motivated by Darwin’s principles of natural selection and survival of the fittest, and are widely employed to optimize a variety of search problems. This paper reviews the relevance of genetic algorithms to improving user queries in the field of Information Retrieval.

Using Genetic algorithm to improve information retrieval systems

World Academy of Science, Engineering and Technology 17, 2006

This study investigates the use of genetic algorithms in information retrieval. The method is shown to be applicable to three well-known document collections, where the genetic modification presents more relevant documents to users. In this paper we present a new fitness function for approximate information retrieval that is faster and more flexible than the cosine similarity fitness function.

A fuzzy genetic algorithm approach to an adaptive information retrieval agent

Journal of the American Society for Information Science, 1999

We present an approach to a Genetic Information Retrieval Agent Filter (GIRAF) for documents from the Internet using a genetic algorithm (GA) with fuzzy set genes to learn the user's information needs. A population of fixed-length chromosomes represents the user's preferences. Each chromosome is associated with a fitness that may be considered the system's belief in the hypothesis that the chromosome, as a query, represents the user's information needs. In a chromosome, every gene characterizes documents by a keyword and an associated occurrence frequency, represented by a certain type of fuzzy subset of the set of positive integers. The fitness of each chromosome is adjusted by comparing the user's evaluation of the documents it retrieves with the scores computed by the system. A prototype of GIRAF has been developed and tested. The results of the test are discussed, and some directions for further work are pointed out.
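The gene structure described in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify the shape of the fuzzy subset, how gene scores are combined, or the fitness update rule, so the triangular membership function, the averaging in `score`, and the rule in `adjust_fitness` are all assumptions made for illustration. The names `FuzzyGene`, `score`, and `adjust_fitness` are hypothetical and do not come from GIRAF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyGene:
    """A keyword plus a triangular fuzzy set over occurrence counts (assumed shape)."""
    keyword: str
    low: int    # membership is 0 at or below this count
    peak: int   # membership is 1 at this count
    high: int   # membership is 0 at or above this count

    def membership(self, count: int) -> float:
        """Degree to which an occurrence count matches this gene (requires low < peak < high)."""
        if count <= self.low or count >= self.high:
            return 0.0
        if count <= self.peak:
            return (count - self.low) / (self.peak - self.low)
        return (self.high - count) / (self.high - self.peak)


def score(chromosome, document_words):
    """System score for one document: mean gene membership (aggregation is an assumption)."""
    degrees = [g.membership(document_words.count(g.keyword)) for g in chromosome]
    return sum(degrees) / len(degrees)


def adjust_fitness(fitness, user_rating, system_score, rate=0.1):
    """Move fitness toward the user/system agreement level (assumed update rule)."""
    agreement = 1.0 - abs(user_rating - system_score)
    return fitness + rate * (agreement - fitness)
```

For example, a gene `FuzzyGene("genetic", 0, 2, 5)` fully matches a document in which "genetic" occurs twice and only partly matches one in which it occurs once or three times.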

A new evolutionary algorithm combining simulated annealing and genetic programming for relevance feedback in fuzzy information retrieval systems

Soft Computing-A Fusion of Foundations, …, 2002

Relevance feedback techniques have proven to be a powerful means of improving the results obtained when a user submits a query to an information retrieval system such as a World Wide Web search engine. These techniques modify the user's original query, taking into account the relevance judgements the user provides on the retrieved documents, to make it more similar to the documents judged relevant. The newly generated query can then retrieve further relevant documents, improving the retrieval process by increasing recall. However, although powerful relevance feedback techniques have been developed for the vector space information retrieval model, and some of them have been translated to the classical Boolean model, such tools are lacking in more advanced and powerful information retrieval models such as the fuzzy one. In this contribution we introduce a relevance feedback process for extended Boolean (fuzzy) information retrieval systems based on a hybrid evolutionary algorithm combining simulated annealing and genetic programming components. The performance of the proposed technique is compared with the only previously existing approach to this task, Kraft et al.'s method, showing that our proposal outperforms the latter in accuracy and sometimes also in time consumption. Moreover, it is shown that adapting the retrieval threshold through the relevance feedback mechanism increases the system's effectiveness.

State-of-the-Art Review on Relevance of Genetic Algorithm to Internet Web Search

People use search engines to find the information they want, expecting their information needs to be met. Information retrieval (IR) is a field concerned primarily with searching and retrieving information in documents, as well as searching search engines, online databases, and the Internet. Genetic algorithms (GAs) are robust and efficient optimization methods for a wide range of search problems, motivated by Darwin's principles of natural selection and survival of the fittest. This paper describes the components of information retrieval systems (IRS) and looks at how GAs can be applied in the field of IR, specifically the relevance of genetic algorithms to Internet web search. The proposals surveyed show that GAs are applied to diverse problem areas within Internet web search.

THE EFFECT OF SIMILARITY MEASURES ON GENETIC ALGORITHM-BASED INFORMATION RETRIEVAL

Genetic algorithms (GAs) can be used in information retrieval (IR) to optimize the query "solution". This paper proposes a GA-based IR algorithm that adjusts the weights of the keywords in a query in order to generate an optimal or near-optimal query vector. In this algorithm, each query is represented by a chromosome. These chromosomes are fed into the genetic operators (selection, crossover, and mutation) to produce a new population; then, to obtain better solutions, a local search procedure is applied to each individual in the new population. This process is repeated until an optimized query chromosome for document retrieval is obtained. The evolution of the candidate solutions is guided by fitness functions designed to measure the goodness of those solutions. We used an order-based fitness function with different similarity measures to study their effect on the quality of the generated solutions and to decide which similarity measure leads to the best solution.
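The loop described in this abstract (selection, crossover, mutation, then local search on each new individual, guided by an order-based fitness) can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: cosine similarity stands in for the similarity measures studied, the reciprocal-rank fitness is one possible order-based score, and the tournament selection, one-point crossover, parameter values, and all function names are hypothetical choices.

```python
import math
import random


def cosine(q, d):
    """Cosine similarity between a query vector and a document vector."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


def fitness(query, docs, relevant):
    """Order-based score (assumed form): relevant documents count more when ranked higher."""
    ranking = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return sum(1.0 / (pos + 1) for pos, i in enumerate(ranking) if i in relevant)


def local_search(query, docs, relevant, rng, step=0.1):
    """Nudge one random weight; keep the change only if fitness does not drop."""
    trial = list(query)
    g = rng.randrange(len(trial))
    trial[g] = min(1.0, max(0.0, trial[g] + rng.choice((-step, step))))
    return trial if fitness(trial, docs, relevant) >= fitness(query, docs, relevant) else query


def evolve_query(docs, relevant, n_terms, pop_size=20, generations=30,
                 p_cross=0.8, p_mut=0.1, seed=0):
    """Evolve a vector of query term weights in [0, 1]; returns the best one seen."""
    rng = random.Random(seed)

    def fit(q):
        return fitness(q, docs, relevant)

    pop = [[rng.random() for _ in range(n_terms)] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # Selection: binary tournament for each parent.
            p1 = max(rng.sample(pop, 2), key=fit)
            p2 = max(rng.sample(pop, 2), key=fit)
            child = list(p1)
            # Crossover: one-point, applied with probability p_cross.
            if n_terms > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n_terms)
                child = p1[:cut] + p2[cut:]
            # Mutation: each weight is redrawn with probability p_mut.
            child = [rng.random() if rng.random() < p_mut else w for w in child]
            # Local search on every new individual, as the abstract describes.
            children.append(local_search(child, docs, relevant, rng))
        pop = children
        best = max(pop + [best], key=fit)
    return best
```

Calling `evolve_query(docs, relevant, n_terms)` with `docs` as a list of term-weight vectors and `relevant` as a set of document indices returns one weight per term. Swapping `cosine` for another similarity function is the kind of comparison the abstract describes.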

IMPROVING THE EFFECTIVENESS OF INFORMATION RETRIEVAL SYSTEM USING ADAPTIVE GENETIC ALGORITHM

The traditional genetic algorithm used in previous studies depends on fixed control parameters, especially the crossover and mutation probabilities; in this research we use an adaptive genetic algorithm instead. Genetic algorithms have been applied in information retrieval systems to optimize the query. A good query is a set of terms that accurately expresses the information need while being usable within the collection corpus. The last part of this specification is critical for the matching process to be efficient, which is why most research effort goes toward query improvement. We investigated the use of an adaptive genetic algorithm (AGA) under the vector space model, the Extended Boolean model, and the Language model in information retrieval (IR). The algorithm uses crossover and mutation operators with variable probabilities, whereas a traditional genetic algorithm (GA) uses fixed values that remain unchanged during execution. The GA is developed to support adaptive adjustment of the mutation and crossover probabilities, which allows better solutions to be reached faster. The approach was tested using 242 Arabic abstracts collected from the proceedings of the Saudi Arabian National conference.
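The abstract does not say how the crossover and mutation probabilities are varied. As an illustration only, the sketch below uses the widely cited fitness-based rule of Srinivas and Patnaik (1994), which lowers both probabilities for individuals close to the best fitness and keeps them high for below-average ones. It should not be read as this paper's method. The function name `adaptive_rates` and the constants `k1` to `k4` are assumptions.

```python
def adaptive_rates(f_parent, f_individual, f_max, f_avg,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (p_crossover, p_mutation) that shrink as fitness approaches the best.

    f_parent     -- the larger fitness of the two parents selected for crossover
    f_individual -- fitness of the individual about to be mutated
    f_max, f_avg -- best and mean fitness of the current population
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged: fall back to the maximum rates to restore diversity.
        return k3, k4
    p_c = k1 * (f_max - f_parent) / spread if f_parent >= f_avg else k3
    p_m = k2 * (f_max - f_individual) / spread if f_individual >= f_avg else k4
    return p_c, p_m
```

With these constants, the best individual in the population gets both probabilities set to zero and so survives unchanged, while below-average individuals are always crossed over and are mutated with probability 0.5.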

Fuzzy genes: improving the effectiveness of information retrieval

Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512), 2000

In this paper, we demonstrate how Genetic Algorithms (GAs) and Fuzzy Logic improve the effectiveness of Information Retrieval. A new classification of Information Retrieval models within the framework of GAs is given, based on the target of the selected fitness function. When the aim of the optimization is document classification, we deal with Document-Oriented Models. In contrast, Term-Oriented Models try to find the terms that are most discriminatory and best suited to the user's preferences in order to build a profile. A new weighting scheme based on Fuzzy Logic is presented for the first class of models. A comparison with other classical weighting schemes is also presented, together with a study of the best operators for aggregating the local fitness of the genes into the overall fitness of each chromosome. A deeper study of this new scheme in Term-Oriented Models is the main objective of future work.

Avoiding Premature Convergence of Genetic Algorithm in Informational Retrieval Systems

Genetic algorithms (GAs) have been adopted by many researchers to implement information retrieval systems that retrieve an optimal document set for a user query. However, GAs have been criticized for premature convergence caused by falling into a local optimum. This paper proposes a new hybrid crossover technique that speeds up convergence while preserving the high quality of the retrieved documents. The proposed technique is applied to HTML documents and evaluated using the precision measure. The results show that the technique is efficient at balancing fast convergence against high-quality outcomes.