Experiments with Automatic Query Formulation in the Extended Boolean Model

Query Expansion Using Augmented Terms in an Extended Boolean Model

Journal of Computing Science and Engineering, 2008

We propose a new query expansion method in the extended Boolean model that improves precision without degrading recall. To improve precision, our method promotes the ranks of documents containing more query terms, since users typically prefer such documents. The proposed method consists of three steps: (1) expanding the query by adding new terms related to each query term, (2) further expanding the query by adding augmented terms, which are conjunctions of the terms, and (3) assigning a weight to each term so that augmented terms have higher weights than the other terms. We conduct extensive experiments to show the effectiveness of the proposed method. The experimental results show that the proposed method improves precision by up to 102% on the TREC-6 data compared with the existing thesaurus-based query expansion method of Kwon et al. [Kwon et al. 1994].
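A minimal sketch of the three steps, assuming the extended Boolean model is the p-norm model of Salton, Fox, and Wu, that augmented terms are conjunctions of pairs of original query terms, and that the thesaurus and the weights (1.0 for ordinary terms, 2.0 for augmented terms, p = 2) are hypothetical values chosen for illustration rather than the paper's settings:

```python
from itertools import combinations

# Hypothetical toy thesaurus; the paper's actual thesaurus is not shown here.
THESAURUS = {
    "query": ["question"],
    "expansion": ["enlargement"],
}

P = 2.0            # p-norm parameter (assumed value)
TERM_WEIGHT = 1.0  # query weight of original and related terms (assumed)
AUG_WEIGHT = 2.0   # higher query weight of augmented terms (assumed)


def expand_query(terms):
    """Return (clause, weight) pairs; a clause is a tuple of terms that are ANDed."""
    clauses = []
    # Step 1: each original term plus its thesaurus-related terms.
    for t in terms:
        clauses.append(((t,), TERM_WEIGHT))
        for related in THESAURUS.get(t, []):
            clauses.append(((related,), TERM_WEIGHT))
    # Step 2: augmented terms, assumed here to be conjunctions of pairs of original terms.
    for a, b in combinations(terms, 2):
        clauses.append(((a, b), AUG_WEIGHT))
    # Step 3: the weight attached to each clause above.
    return clauses


def pnorm_and(values, p=P):
    """Unweighted p-norm AND of document term weights in [0, 1]."""
    mean = sum((1.0 - v) ** p for v in values) / len(values)
    return 1.0 - mean ** (1.0 / p)


def pnorm_or(pairs, p=P):
    """Weighted p-norm OR over (query_weight, clause_value) pairs."""
    num = sum((q * v) ** p for q, v in pairs)
    den = sum(q ** p for q, _ in pairs)
    return (num / den) ** (1.0 / p)


def score(doc_weights, clauses):
    """Similarity of a document (term -> weight in [0, 1]) to the expanded query."""
    pairs = []
    for clause, weight in clauses:
        values = [doc_weights.get(t, 0.0) for t in clause]
        value = values[0] if len(values) == 1 else pnorm_and(values)
        pairs.append((weight, value))
    return pnorm_or(pairs)


clauses = expand_query(["query", "expansion"])
doc_a = {"query": 0.7, "expansion": 0.7}  # contains both query terms
doc_b = {"query": 1.0, "question": 1.0}   # one query term plus a related term
print(round(score(doc_a, clauses), 3), round(score(doc_b, clauses), 3))
```

With these assumed settings, `doc_a` scores about 0.61 and `doc_b` about 0.54; setting `AUG_WEIGHT` to 0 reverses that order, which is the kind of rank promotion for documents with more query terms that the abstract describes.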

A New Approach for Boolean Query Processing in Text Information Retrieval

Advances in Soft Computing

The main objective of an information retrieval system is to be effective in providing a user with relevant information in response to a query. However, the information explosion has created such an enormous volume of information that efficiency issues cannot be ignored. A Boolean query system must therefore be able to quickly merge, according to the Boolean logic of the query, the lists of documents indexed under each keyword in the query. A new algorithm, based loosely on concurrent codes, is developed and discussed.
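For reference, the conventional way to merge sorted postings lists under AND, OR, and AND NOT is a linear two-pointer walk, sketched below. It is a natural baseline for comparison, not the concurrent-code algorithm the paper develops, and the postings data are made up:

```python
def intersect(a, b):
    """AND: document IDs present in both sorted postings lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """OR: document IDs present in either sorted postings list, without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def and_not(a, b):
    """AND NOT: document IDs in sorted list a that are absent from sorted list b."""
    i = j = 0
    out = []
    while i < len(a):
        if j < len(b) and b[j] < a[i]:
            j += 1
        elif j < len(b) and b[j] == a[i]:
            i += 1
            j += 1
        else:
            out.append(a[i])
            i += 1
    return out


# Hypothetical postings lists (sorted document IDs per term).
postings = {
    "information": [1, 2, 4, 8, 9],
    "retrieval": [2, 3, 4, 9],
    "boolean": [4, 5],
}
# information AND retrieval AND NOT boolean
result = and_not(intersect(postings["information"], postings["retrieval"]), postings["boolean"])
print(result)
```

Each merge runs in time linear in the combined list lengths, which is why the order in which a query's operators are applied matters for efficiency.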

The use of phrases and structured queries in information retrieval

1991

Both phrases and Boolean queries have a long history in information retrieval, particularly in commercial systems. In previous work, Boolean queries have been used as a source of phrases for a statistical retrieval model. This work, like the majority of research on phrases, resulted in little improvement in retrieval effectiveness. In this paper, we describe an approach where phrases identified in natural language queries are used to build structured queries for a probabilistic retrieval model.
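A rough sketch of the idea: detect candidate phrases in a natural-language query and wrap them in a structured query. Here phrases are simply adjacent pairs of non-stopwords, a crude stand-in because the abstract does not say how phrases are identified, and the `#sum`/`#phrase` operators only imitate the style of the INQUERY query language; treat the stopword list and the exact syntax as assumptions:

```python
import re

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"the", "of", "in", "a", "an", "and", "for", "on", "to", "with", "about"}


def tokenize(text):
    """Lowercase alphabetic tokens, in query order."""
    return re.findall(r"[a-z]+", text.lower())


def find_phrases(tokens):
    """Candidate phrases: adjacent token pairs where neither token is a stopword."""
    return [
        (left, right)
        for left, right in zip(tokens, tokens[1:])
        if left not in STOPWORDS and right not in STOPWORDS
    ]


def build_structured_query(text):
    """Combine single terms and detected phrases under an INQUERY-style #sum."""
    tokens = tokenize(text)
    terms = [t for t in tokens if t not in STOPWORDS]
    phrases = [f"#phrase( {left} {right} )" for left, right in find_phrases(tokens)]
    return "#sum( " + " ".join(terms + phrases) + " )"


print(build_structured_query("Effects of acid rain on forests"))
```

Keeping the single terms alongside the phrase operator means a document that contains "acid" and "rain" apart still receives partial credit from the probabilistic model.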

An enhanced Boolean retrieval model for efficient searching

Scientific Journal of India

A large amount of information from all domains is available online as hypertext in web pages. People from different domains consult different websites to find the information they need, and it is very difficult to remember the names of the websites for the specific domain the user wants to search. A search engine is a system that mines information from the World Wide Web and presents it to the user according to the user's query. The information retrieval system (IRS) behind a search engine organizes web documents systematically and retrieves results according to the user's query. In this paper, an efficient Boolean retrieval model is proposed that retrieves results according to the Boolean operations specified between the terms of the search query. The proposed model is also capable of storing large indexes.
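The abstract does not describe the proposed model's internals, so the following is only a generic sketch of Boolean retrieval: build an inverted index mapping each term to the set of documents containing it, then evaluate AND, OR, and NOT over those sets. The nested-tuple query format and the sample documents are hypothetical:

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def evaluate(query, index, all_ids):
    """Evaluate a query given as a term string or a nested tuple (op, *operands)."""
    if isinstance(query, str):
        # Return a copy so callers can modify the result without touching the index.
        return set(index.get(query.lower(), set()))
    op, *operands = query
    results = [evaluate(q, index, all_ids) for q in operands]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        return all_ids - results[0]
    raise ValueError(f"unknown operator: {op}")


docs = {
    1: "boolean retrieval model",
    2: "vector space model",
    3: "extended boolean model",
}
index = build_index(docs)
all_ids = set(docs)
print(evaluate(("AND", "boolean", ("NOT", "extended")), index, all_ids))
```

The result is an unordered set: a pure Boolean model only decides membership and does not rank the matching documents.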

Trends in research on information retrieval — The potential for improvements in conventional Boolean retrieval systems

Information Processing & Management, 1988

Operational retrieval systems are firmly embedded within the pure Boolean framework, and the theoretical model underlying these systems is based on the implicit assumption that documents and user information needs can be precisely and completely characterized by sets of index terms and Boolean search request formulations, respectively. However, this assumption must be considered grossly inaccurate since uncertainty is intrinsic to the document retrieval process. The inability of the standard Boolean model to deal effectively with the inherent fallibility of retrieval decisions is the main reason for a number of serious deficiencies exhibited by present-day operational retrieval systems. This article reviews recent advances in information retrieval research and examines their practical potential for overcoming these deficiencies. The primary sources for this review are the subsequent articles that make up this special issue of Information Processing & Management, although earlier results published elsewhere have also been considered.

Extended Semantic based Boolean Information Retrieval Algorithm for User-driven Query

International Journal of Engineering Research and Technology (IJERT), 2015

https://www.ijert.org/extended-semantic-based-boolean-information-retrieval-algorithm-for-user-driven-query
https://www.ijert.org/research/extended-semantic-based-boolean-information-retrieval-algorithm-for-user-driven-query-IJERTV4IS050514.pdf

Information Retrieval (IR) is essentially a matter of deciding which documents within a large collection satisfy a user's information need. Those documents are called relevant documents, and documents that are not on the topic specified by the user are said to be non-relevant. An existing SBIR algorithm uses the lexical database WordNet to find synonyms of single-word query terms, on the premise that the absence of a given term from a document does not necessarily mean that the document is not relevant. In this paper, a new algorithm is proposed that works with compound terms and uses a modified Porter stemming algorithm to fix some stemming errors found in the Porter stemmer proposed by M. F. Porter. This will improve recall, as more relevant documents will be retrieved. We propose to involve the user in the search process through interactive feedback on word senses. This will further improve recall by retrieving more results relevant to the user.
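A minimal sketch of semantic Boolean matching with compound terms, assuming a small hand-written synonym table in place of WordNet (so it runs without external data) and omitting stemming entirely: each query term, single-word or compound, is expanded into the OR of its synonyms, and a document must match every expanded term (AND):

```python
import re

# Hypothetical toy synonym table standing in for WordNet.
SYNONYMS = {
    "car": ["automobile", "motorcar"],
    "information retrieval": ["document retrieval"],
}


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def contains(tokens, phrase):
    """True if the (possibly multi-word) phrase occurs as a contiguous run of tokens."""
    words = phrase.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def matches(doc, query_terms):
    """AND over query terms, where each term is the OR of itself and its synonyms."""
    tokens = tokenize(doc)
    for term in query_terms:
        variants = [term] + SYNONYMS.get(term, [])
        if not any(contains(tokens, v) for v in variants):
            return False
    return True


print(matches("Document retrieval for automobile manuals", ["car", "information retrieval"]))
print(matches("Car engines", ["car", "information retrieval"]))
```

The first document matches only because `car` expands to `automobile` and `information retrieval` to `document retrieval`; without the synonym table it would be missed, which is the recall gain the abstract aims for.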

Efficiency of Boolean Search strings for Information Retrieval

The review of available literature is a fundamental requirement for most research projects. Relevant literature should be searched for in multiple sources. Search engines and online bibliographic resource sites are conventionally used to find relevant literature through keyword search; however, they offer little automated help with free-text search queries. In this paper, the technique of Boolean search strings is explored in detail, along with an analysis and evaluation of its effectiveness. Search engines such as Google and Google Scholar, and online bibliographic sources such as IEEE Xplore, ACM, and ScienceDirect, were used to apply the technique. The technique was evaluated on three criteria: the number of documents retrieved, the time taken to retrieve them, and the relevance of the documents to the query or research question. The analysis shows that the Boolean search string technique returns at least 77% more relevant articles than free-text queries, and in a shorter time. Hence, Boolean search strings are very useful for information retrieval.
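A Boolean search string of the kind described usually ORs together the synonyms for one concept and ANDs the concepts together. The helper below is a hypothetical sketch of that pattern; the exact syntax each engine accepts (quoting, operator case, implicit AND) varies, so the output should be checked against the target engine's search help:

```python
def quote(term):
    """Wrap multi-word terms in double quotes so engines treat them as phrases."""
    return f'"{term}"' if " " in term else term


def build_search_string(concepts):
    """OR together the synonyms within each concept, then AND the concepts."""
    groups = []
    for synonyms in concepts:
        inner = " OR ".join(quote(s) for s in synonyms)
        # Parentheses are only needed when a concept has more than one synonym.
        groups.append(f"({inner})" if len(synonyms) > 1 else inner)
    return " AND ".join(groups)


print(build_search_string([
    ["query expansion", "query reformulation"],
    ["Boolean model"],
]))
```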

A mathematical model of a weighted boolean retrieval system

Information Processing and Management, 1979

The use of weights to denote a query representation and/or the indexing of a document is analysed as a generalization of a Boolean retrieval system. Criteria are given for the functions used to evaluate the relevance of the records to a specific query, including self-consistency. Various mechanisms suggested in the literature for evaluating the relevance of records with regard to a given query are tested and found to be less than satisfactory. A new approach is suggested to avoid some of the perils of a weighted Boolean retrieval system.
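To make the question concrete, here is the classic fuzzy-set evaluation of a Boolean query over document term weights (AND as minimum, OR as maximum, NOT as complement). It is shown only as one widely known mechanism of this type; whether it is among those the paper tests is not stated here, it is not the new approach the paper proposes, and query weights are not handled:

```python
def fuzzy_eval(query, weights):
    """Evaluate a query (term string or nested (op, *operands) tuple) over
    document term weights in [0, 1] using fuzzy-set operators."""
    if isinstance(query, str):
        return weights.get(query, 0.0)
    op, *operands = query
    values = [fuzzy_eval(q, weights) for q in operands]
    if op == "AND":
        return min(values)
    if op == "OR":
        return max(values)
    if op == "NOT":
        return 1.0 - values[0]
    raise ValueError(f"unknown operator: {op}")


query = ("AND", "boolean", "retrieval")
print(fuzzy_eval(query, {"boolean": 0.9, "retrieval": 0.1}))  # lowest weight decides
print(fuzzy_eval(query, {"boolean": 0.2, "retrieval": 0.2}))
```

One commonly noted weakness is visible here: under AND, a document with weights 0.9 and 0.1 scores 0.1, below a document with weights 0.2 and 0.2, because the single lowest weight decides the result.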

The Performance of Boolean Retrieval and Vector Space Model in Textual Information Retrieval

CommIT (Communication and Information Technology) Journal, 2017

Boolean Retrieval (BR) and the Vector Space Model (VSM) are very popular methods in information retrieval for creating an inverted index and querying terms. The BR method returns exact matches without ranking them, whereas the VSM method retrieves and ranks the results. This study empirically compares the two methods, using a sample of the Reuters corpus. The experimental results show that the two methods take nearly the same time to build an inverted index. However, they differ when querying the index. The results also show that the number of generated indexes, the sizes of the generated files, and the time taken to read and search an index are proportional to the number of files in the corpus and to the file size.
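To illustrate the difference between the two methods, the sketch below contrasts an unranked Boolean AND with VSM ranking, using tf-idf weights (raw term frequency times log(N/df)) and cosine similarity. That weighting is a common choice assumed here; the study's exact scheme is not stated, and the three toy documents are not Reuters data:

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return per-document tf-idf vectors and the idf table."""
    n = len(docs)
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = {
        d: {t: c * idf[t] for t, c in Counter(toks).items()}
        for d, toks in tokenized.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vsm_rank(query, docs):
    """Document IDs sorted by decreasing cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    scores = {d: cosine(q, vec) for d, vec in vectors.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def boolean_and(query, docs):
    """Unranked set of documents containing every query term."""
    terms = set(query.lower().split())
    return {d for d, text in docs.items() if terms <= set(text.lower().split())}


docs = {
    1: "boolean retrieval boolean operators",
    2: "vector space retrieval ranking",
    3: "stock market news",
}
print(boolean_and("boolean retrieval", docs))
print(vsm_rank("boolean retrieval", docs))
```

Boolean AND returns only document 1, with no order, while the VSM also ranks document 2 (which shares the term "retrieval") above document 3.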