A New Approach for Boolean Query Processing in Text Information Retrieval

An enhanced Boolean retrieval model for efficient searching

Scientific Journal of India

A large amount of information from every domain is available online as hypertext in web pages. People from different domains consult different websites to fetch the information they need. It is very difficult to remember the names of the websites for the specific domain in which a user wants to search. A search engine is therefore a system that mines information from the World Wide Web and presents it to the user according to the user's query. An information retrieval system (IRS), working on behalf of a search engine, arranges web documents systematically and retrieves results according to the user's query. In this paper, an efficient Boolean retrieval model is proposed that retrieves results according to the Boolean operations specified between the terms of the search query. The proposed model is also capable of storing large indexes.
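The general idea of Boolean retrieval over an index can be sketched as follows. This is a minimal illustration under assumed names (`build_index`, `boolean_and`, `boolean_or`, `boolean_not` and the sample documents are all hypothetical), not the model proposed in the paper: each term maps to the set of documents containing it, and the Boolean operators become set operations.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, a, b):
    # Documents that contain both terms.
    return index.get(a, set()) & index.get(b, set())

def boolean_or(index, a, b):
    # Documents that contain at least one of the terms.
    return index.get(a, set()) | index.get(b, set())

def boolean_not(index, all_ids, a):
    # Documents that do not contain the term.
    return all_ids - index.get(a, set())

# Hypothetical three-document collection, for illustration only.
docs = {
    1: "boolean retrieval model",
    2: "vector space model",
    3: "boolean query processing",
}
index = build_index(docs)
all_ids = set(docs)

print(boolean_and(index, "boolean", "model"))
print(boolean_or(index, "boolean", "model"))
print(boolean_not(index, all_ids, "boolean"))
```

The result of each operation is an unranked set: a document either satisfies the expression or it does not.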

Efficiency of Boolean Search strings for Information Retrieval

Abstract: A review of the available literature is a foundational requirement for most research projects. Relevant literature should be searched for in multiple sources. Search engines and online bibliographic resource sites are conventionally used to find relevant literature through keyword search. However, there is little automated help for free-text search queries. In this paper, the Boolean search string technique is explored in detail, along with an analysis and evaluation of its effectiveness. Search engines such as Google and Google Scholar, and online bibliographic sources such as IEEE Xplore, ACM and Science Direct, were used to implement the technique. The technique was evaluated on three criteria: the number of documents retrieved, the time taken to retrieve them, and the relevance of the documents to the query or research question. The analysis shows that the Boolean search string technique returns more relevant articles than free-text queries by at least 77%, and in a shorter time frame. Hence, Boolean search strings are very useful for information retrieval.

Trends in research on information retrieval — The potential for improvements in conventional Boolean retrieval systems

Information Processing & Management, 1988

Operational retrieval systems are firmly embedded within the pure Boolean framework, and the theoretical model underlying these systems is based on the implicit assumption that documents and user information needs can be precisely and completely characterized by sets of index terms and Boolean search request formulations, respectively. However, this assumption must be considered grossly inaccurate since uncertainty is intrinsic to the document retrieval process. The inability of the standard Boolean model to deal effectively with the inherent fallibility of retrieval decisions is the main reason for a number of serious deficiencies exhibited by present-day operational retrieval systems. This article reviews recent advances in information retrieval research and examines their practical potential for overcoming these deficiencies. The primary source for this review is the subsequent articles that comprise this special issue of Information Processing & Management, although earlier results published elsewhere have also been considered.

A mathematical model of a weighted boolean retrieval system

Information Processing and Management, 1979

The use of weights to denote a query representation and/or the indexing of a document is analysed as a generalization of a Boolean retrieval system. Criteria are given for the functions used to evaluate the relevance of the records to a specific query, including self-consistency. Various mechanisms suggested in the literature for evaluating the relevance of records with regard to a given query are tested and found to be less than satisfactory. A new approach is suggested to avoid some of the perils of a weighted Boolean retrieval system.
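To make the idea of a weighted Boolean system concrete, here is a minimal sketch of one well-known evaluation mechanism, the fuzzy-set interpretation (AND as minimum, OR as maximum, NOT as complement). It is offered only as an illustration: the abstract does not say which mechanisms the paper tests or what its new approach is, and the function names and index weights below are hypothetical.

```python
def w_and(*scores):
    # Fuzzy-set conjunction: the weakest operand decides the score.
    return min(scores)

def w_or(*scores):
    # Fuzzy-set disjunction: the strongest operand decides the score.
    return max(scores)

def w_not(score):
    # Fuzzy-set complement of a weight in [0, 1].
    return 1.0 - score

# Hypothetical index weights of one document, each in [0, 1].
doc_weights = {"boolean": 0.9, "retrieval": 0.4, "vector": 0.0}

def weight(term):
    # Terms absent from the document get weight 0.
    return doc_weights.get(term, 0.0)

# Query: (boolean AND retrieval) OR vector
score = w_or(w_and(weight("boolean"), weight("retrieval")), weight("vector"))
print(score)
```

With min and max, only one operand of each operator influences the result: lowering the weight of "boolean" from 0.9 to 0.5 leaves the score of this query unchanged.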

The Performance of Boolean Retrieval and Vector Space Model in Textual Information Retrieval

CommIT (Communication and Information Technology) Journal, 2017

Boolean Retrieval (BR) and the Vector Space Model (VSM) are very popular methods in information retrieval for creating an inverted index and querying terms. The BR method returns exact matches in textual information retrieval without ranking the results. The VSM method both searches and ranks the results. This study empirically compares the two methods, using a sample of corpus data obtained from Reuters. The experimental results show that the times required by the two methods to produce an inverted index are nearly the same. However, the two methods differ when querying the index. The results also show that the number of generated indexes, the sizes of the generated files, and the time needed to read and search an index are proportional to the number of files in the corpus and to the file size.
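The difference between exact, unranked Boolean matching and ranked vector-space matching can be sketched on a toy collection. This is not the study's implementation: the documents, the function names, the AND-only Boolean query, and the plain tf-idf weighting with cosine similarity are all assumptions chosen for brevity, and the sketch scans documents directly instead of building an inverted index.

```python
import math
from collections import Counter

# Hypothetical toy collection (the study itself uses Reuters data).
docs = {
    1: "boolean retrieval returns exact matches",
    2: "vector space model ranks matches by similarity",
    3: "boolean query and vector query",
}
tokenized = {d: text.split() for d, text in docs.items()}
n_docs = len(docs)
# Document frequency: number of documents that contain each term.
df = Counter(term for toks in tokenized.values() for term in set(toks))

def tfidf(tokens):
    """Sparse tf-idf vector; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if t in df}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def boolean_match(query):
    """Unranked set of documents containing every query term (AND)."""
    terms = query.split()
    return {d for d, toks in tokenized.items() if all(t in toks for t in terms)}

def vsm_rank(query):
    """Document ids with non-zero similarity, best match first."""
    q = tfidf(query.split())
    scored = [(cosine(q, tfidf(toks)), d) for d, toks in tokenized.items()]
    return [d for s, d in sorted(scored, reverse=True) if s > 0]

print(boolean_match("boolean query"))
print(vsm_rank("boolean query"))
```

For the query "boolean query", the Boolean match keeps only the document that contains both terms, while the vector-space ranking also returns a partially matching document, placed lower in the list.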

On extending the vector space model for Boolean query processing

1986

An information retrieval model, named the Generalized Vector Space Model (GVSM), is extended to handle situations where queries are specified as (extended) Boolean expressions. It is shown that this unified model, unlike currently available alternatives, has the advantage of incorporating term correlations into the retrieval process. The query language extension is attractive in the sense that most of the algebraic properties of the strict Boolean language are still preserved. Although the experimental results for extended Boolean retrieval are not always better than the vector processing method, the developments here are significant in facilitating commercially available retrieval systems to benefit from the vector based methods. The proposed scheme is compared to the p-norm model advanced by Salton and coworkers. An important conclusion is that it is desirable to investigate further extensions that can offer the benefits of both proposals.
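For readers unfamiliar with the p-norm model used as the point of comparison, the sketch below shows its commonly stated form for queries whose terms all carry equal weight. It does not implement the GVSM extension described in the abstract; the function names are hypothetical, and the inputs are assumed to be document term weights in [0, 1].

```python
def pnorm_or(weights, p):
    """p-norm OR: normalized distance from the all-zeros point."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def pnorm_and(weights, p):
    """p-norm AND: one minus the normalized distance from the all-ones point."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)

# A document that matches the first query term fully and the second not at all.
weights = [1.0, 0.0]
for p in (1, 2, 100):
    print(p, pnorm_or(weights, p), pnorm_and(weights, p))
```

At p = 1 both operators reduce to the average of the term weights, so AND and OR are indistinguishable, as in plain vector matching. As p grows, OR approaches the maximum and AND approaches the minimum of the weights, which is the strict Boolean behaviour.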

Experiments with Automatic Query Formulation in the Extended Boolean Model

Lecture Notes in Computer Science, 2009

This paper concentrates on experiments with automatic creation of queries from natural language topics, suitable for use in the Extended Boolean information retrieval system. Because of the lack and/or inadequacy of the available methods, we propose a new method, based on pairing terms into a binary tree structure. The results of this method are compared with the results achieved by our implementation of the known method proposed by Salton and also with the results obtained with manually created queries. All experiments were performed on the same collection that was used in the CLEF 2007 campaign.

IJERT-Extended Semantic based Boolean Information Retrieval Algorithm for User-driven Query

International Journal of Engineering Research and Technology (IJERT), 2015

https://www.ijert.org/extended-semantic-based-boolean-information-retrieval-algorithm-for-user-driven-query https://www.ijert.org/research/extended-semantic-based-boolean-information-retrieval-algorithm-for-user-driven-query-IJERTV4IS050514.pdf Information Retrieval (IR) is essentially a matter of deciding which documents within a large collection satisfy a user's information need. Those documents are called relevant documents, and the documents that are not on the topic specified by the user are said to be non-relevant. An existing SBIR algorithm uses the lexical database WordNet to find synonyms of a single-word query term, on the grounds that the absence of the given term from a document does not necessarily mean that the document is not relevant. In this paper, a new algorithm is proposed which works with compound terms and uses a modified Porter Stemming Algorithm to correct some stemming errors found in the Porter Stemmer Algorithm proposed by M. F. Porter. This will improve recall, as more relevant documents will be retrieved. We propose to involve the user in the search process through interactive feedback on word senses. This will further improve recall by retrieving more results that are relevant to the user.

Extended Boolean Operations in Latent Semantic Indexing Search

2002

The paper presents a method for using Boolean expressions in information retrieval based on Latent Semantic Indexing (LSI). The basic binary Boolean operations OR, AND and NOT (AND-NOT) and their combinations have been implemented. The proposed method adds new functionality to the classic LSI method's ability to process user queries typed in natural language (such as English, Bulgarian or Russian), as used in "intelligent" search engines. This gives the user the opportunity to combine not only distinct words or phrases but also whole texts (documents) using all kinds of Boolean expressions. The implementations have been evaluated using a text collection of religious and sacred texts.