The use of phrases and structured queries in information retrieval

Trends in research on information retrieval — The potential for improvements in conventional Boolean retrieval systems

Information Processing & Management, 1988

Operational retrieval systems are firmly embedded within the pure Boolean framework, and the theoretical model underlying these systems is based on the implicit assumption that documents and user information needs can be precisely and completely characterized by sets of index terms and Boolean search request formulations, respectively. However, this assumption must be considered grossly inaccurate since uncertainty is intrinsic to the document retrieval process. The inability of the standard Boolean model to deal effectively with the inherent fallibility of retrieval decisions is the main reason for a number of serious deficiencies exhibited by present-day operational retrieval systems. This article reviews recent advances in information retrieval research and examines their practical potential for overcoming these deficiencies. The primary source for this review is the subsequent articles that comprise this special issue of Information Processing & Management, although earlier results published elsewhere have also been considered.

Using Structured Queries for Keyword Information Retrieval

An increasingly important class of keyword search tasks is that in which users are looking for a specific piece of information buried within a few documents in a large collection. Examples include searching for someone's phone number, the schedule for a meeting, or a package tracking URL within a personal email collection. We refer to such tasks as "precision-oriented search tasks". Modern information extraction techniques can be used to extract the concepts involved in these tasks (persons, phone numbers, schedules, etc.), but because users provide only keywords as input, identifying the documents that contain the information of interest remains a challenge.

A syntactically-based query reformulation technique for information retrieval

2008

Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society of Information Science, 25 (5), 312–318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283–289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972).

An exploratory analysis of phrases in text retrieval

2000

Phrases are used in both commercial and experimental search engines. Despite the large amount of work in the area, results remain mixed, and it is still not clear whether phrases can be used to improve retrieval effectiveness. In this paper, we examine phrases and their properties independently of any specific retrieval approach. We explore phrase usage in text corpora and relevance patterns related to phrase usage. The result is not only a better understanding of phrases, but also a better method by which phrases and phrase techniques may be evaluated. With this method we can directly determine the value of various phrase formulations for information retrieval.

Experiments with Automatic Query Formulation in the Extended Boolean Model

Lecture Notes in Computer Science, 2009

This paper concentrates on experiments with automatic creation of queries from natural language topics, suitable for use in the Extended Boolean information retrieval system. Because of the lack and/or inadequacy of the available methods, we propose a new method, based on pairing terms into a binary tree structure. The results of this method are compared with the results achieved by our implementation of the known method proposed by Salton and also with the results obtained with manually created queries. All experiments were performed on the same collection that was used in the CLEF 2007 campaign.
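As a rough illustration of how a binary query tree can be scored in an extended Boolean setting, the sketch below uses the p-norm formulation commonly associated with the extended Boolean model. The pairing rule in build_pair_tree (adjacent terms are paired level by level), the class names, and the example weights are all assumptions made for illustration; the abstract does not describe the paper's actual pairing criterion or scoring details.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    word: str


@dataclass
class Node:
    op: str  # "AND" or "OR"
    left: "Query"
    right: "Query"


Query = Union[Term, Node]


def build_pair_tree(words: List[str], op: str = "AND") -> Query:
    """Pair adjacent items level by level until one root remains.

    Hypothetical pairing rule, used only to obtain a binary tree.
    """
    if not words:
        raise ValueError("at least one term is required")
    level: List[Query] = [Term(w) for w in words]
    while len(level) > 1:
        nxt: List[Query] = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(Node(op, level[i], level[i + 1]))
        if len(level) % 2:  # an odd item is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def pnorm_score(query: Query, weights: Dict[str, float], p: float = 2.0) -> float:
    """Score one document, given its term weights in [0, 1], against a query tree."""
    if isinstance(query, Term):
        return weights.get(query.word, 0.0)
    a = pnorm_score(query.left, weights, p)
    b = pnorm_score(query.right, weights, p)
    if query.op == "OR":
        # p-norm OR: distance from the all-zero point
        return ((a ** p + b ** p) / 2) ** (1 / p)
    if query.op == "AND":
        # p-norm AND: one minus distance from the all-one point
        return 1 - (((1 - a) ** p + (1 - b) ** p) / 2) ** (1 / p)
    raise ValueError(f"unknown operator: {query.op}")
```

With p = 1 both operators reduce to the average of the two child scores, and as p grows the behavior approaches strict Boolean AND and OR, which is why a document that matches only some of the query terms still receives a graded score rather than being rejected outright.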

Classical and Probabilistic Information Retrieval Techniques: An Audit

Lahore Garrison University Research Journal of Computer Science and Information Technology, 2021

Information retrieval is the process of acquiring particular information from large resources and presenting it according to the user’s need. The enormous growth of information resources on the Internet makes the information retrieval procedure a monotonous and complicated task for users. Because of this excess of accessible information, a better methodology is required to retrieve the most appropriate information from different sources. The most important information retrieval methods include the probabilistic, fuzzy set, vector space, and Boolean models. Each of these models is typically used to evaluate the connection between the query and the retrievable documents. These methods are keyword-based and use lists of keywords to evaluate the information material. In this paper, we present a survey of these models in which their working methodology and limitations are discussed. This is an important understanding because it makes it possible to select an information retrieval technique based on th...

Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness

In this paper we present a systematic analysis of document retrieval using unstructured and structured queries within the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely Boolean, tf.idf, GPX, language models, and Okapi, is tested using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation.

A Survey of Information Retrieval Techniques

Advances in Networks

The explosive growth of resources stored in various forms and transmitted over the internet has necessitated research into information retrieval technologies. The major information retrieval mechanisms commonly employed include the vector space model, the Boolean model, the fuzzy set model, and the probabilistic retrieval model. These models are used to find similarities between the query and the documents in order to retrieve documents that reflect the query. These approaches are keyword-based: they use lists of keywords to describe the information content. In this paper, a survey of these models is provided in order to understand their working mechanisms and shortcomings. This understanding is vital because it facilitates the choice of an information retrieval technique based on the underlying requirements. The results of this survey revealed that current information retrieval models fall short of expectations in one way or another. As such, they are not ideal for high-precision information retrieval applications.
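To make the contrast between two of the surveyed models concrete, the following minimal sketch compares strict Boolean AND matching with a vector space ranking based on tf-idf weights and cosine similarity. The whitespace tokenizer, the smoothed idf formula, and all function names are assumptions chosen for brevity; they are not taken from the survey.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer (assumption: no stemming or stop words)
    return text.lower().split()


def boolean_and(query: str, docs: List[str]) -> List[int]:
    """Return indices of documents containing every query term (unranked)."""
    q = set(tokenize(query))
    return [i for i, d in enumerate(docs) if q <= set(tokenize(d))]


def tfidf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Build sparse tf-idf vectors with a smoothed idf (one common variant)."""
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: str, docs: List[str]) -> List[Tuple[float, int]]:
    """Return (score, doc_index) pairs, best first; partial matches still score."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vecs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

For a query whose terms never co-occur in a single document, boolean_and returns an empty list while rank still orders the partially matching documents, which illustrates one of the shortcomings of the pure Boolean model that such surveys discuss.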

A structured indexing model based on noun phrases

… , Innovation and Vision …, 2006

Most indexing models are based on simple independent words, also known as keywords. This approach takes account of neither the context nor the relations between the words. Therefore, the precision of the system is limited. In this article, we present a structured indexing model based on noun phrases to increase the precision of an Information Retrieval System (IRS). In this model, we use a grammatical parser to extract and structure a noun phrase by determining the various roles of the words of the noun phrase and their syntactic relations. We represent the set of index terms of a query in the form of Bayesian networks, which enables us to calculate the matching function between a query and a document. We carried out experiments to test this model. The positive results obtained encourage us to continue in this direction.