Advances in information retrieval: An introduction to the special issue

A Hybrid Model for Document Retrieval Systems

2022

A methodology for the design of document retrieval systems is presented. First, a composite index term weighting model is developed based on term frequency statistics, including document frequency, relative frequency within a document, and relative frequency within the collection; the model can be adjusted to different indexing environments by selecting its coefficients. Then, a composite retrieval model is proposed that processes a user's information request, expressed as a weighted Phrase-Oriented Fixed-Level Expression (POFLE) that may use more than the Boolean operators, in two phases. The first phase searches for documents that are topically relevant to the information request by means of a descriptor matching mechanism. This mechanism incorporates a partial matching facility based on a structurally restricted relationship imposed by the indexing model, and it is more general than the matching functions of the traditional Boolean and vector space models. The second phase ranks these topically relevant documents, by means of two types of heuristic-based selection rules and a knowledge-based evaluation function, in descending order of a preference score that predicts the combined effect of user preference for the quality, recency, fitness, and reachability of documents.
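The abstract names the three statistics behind the composite weight but does not give the formula. As a rough sketch only, the Python below assumes a simple linear combination; the function name `composite_weights` and the coefficients `a`, `b`, and `c` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def composite_weights(docs, a=1.0, b=1.0, c=1.0):
    """Return one {term: weight} dict per tokenized document.

    Assumed (not the paper's) form:
        weight = a * log(N / df) + b * rf_doc + c * rf_coll
    where df is the document frequency, rf_doc the relative frequency
    of the term within the document, and rf_coll its relative frequency
    within the whole collection. A negative c penalizes terms that are
    common across the collection.
    """
    n_docs = len(docs)
    doc_freq = Counter()    # number of documents containing each term
    coll_count = Counter()  # total occurrences of each term
    for tokens in docs:
        doc_freq.update(set(tokens))
        coll_count.update(tokens)
    total_tokens = sum(coll_count.values())

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        doc_weights = {}
        for term, count in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            rf_doc = count / len(tokens)
            rf_coll = coll_count[term] / total_tokens
            doc_weights[term] = a * idf + b * rf_doc + c * rf_coll
        weights.append(doc_weights)
    return weights
```

Setting two of the coefficients to zero isolates the remaining statistic, which is one way to see how different coefficient choices could adapt the weighting to different indexing environments.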

A new type of information retrieval system

1976

In the period 1964-1968, Peter G. Ossorio [10, 11, 12] developed and tested, on a pilot study basis, a new approach to the problem of automatic document retrieval. Ossorio's studies were entirely successful, as pilot studies, and show the feasibility of using his approach to produce a new kind of retrieval system. These retrieval systems do not operate by word matching. The basic approach is to simulate the judgement of competent human judges of the conceptual content of each document, and of the request. This judgement is then used to retrieve those documents with conceptual content most similar to that of the request. Each document is processed only at the time it is added to the data base, in time linear in the number of words in the document that the system recognizes. The retrieval request is in ordinary English. Time for retrieval is linear in the number of documents on file. Documents are retrieved in order of similarity of conceptual content to that of the request. The system works, in certain respects, better on full text documents, providing better descriptions of document content, and more detailed cross-indexing. The new type of system shows a number of interesting features. Among these are: (1) much better performance than systems using the old techniques; (2) faithful representation of the judgement of the person(s) whose judgement is being simulated, thus providing the possibility of individualized retrieval systems; (3) ability to explain to a user why it retrieved certain documents, and not others. With this information, the user can alter his request, or instruct the system to judge things differently; (4) automatic recognition of requests the system cannot properly handle; (5) sub-documentary indexing reflecting heterogeneity of material. As is often the case with a new paradigm, Ossorio's work raises at least as many questions as it answers. This paper presents the new approach, and the results of some first explorations in the new field.

A vector space model for automatic indexing

Communications of The ACM, 1975

In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
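The abstract does not state how space density is measured. A minimal sketch, assuming density is taken as the average cosine similarity between each document vector and the collection centroid (one common choice, adopted here as an assumption), might look like this; the function names `cosine` and `space_density` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(vectors):
    """Average similarity of each document vector to the centroid.

    Higher values mean documents sit closer together in the indexing
    space; under the abstract's hypothesis, lower density should go
    with better retrieval performance.
    """
    n = len(vectors)
    centroid = [sum(column) / n for column in zip(*vectors)]
    return sum(cosine(v, centroid) for v in vectors) / n


spread_out = [[1.0, 0.0], [0.0, 1.0]]   # documents share no terms
clustered = [[1.0, 0.0], [1.0, 0.0]]    # documents are identical
print(space_density(spread_out) < space_density(clustered))
```

Under this measure, an indexing vocabulary could be compared term by term: a candidate term whose addition lowers the density spreads the documents apart, and one whose addition raises it pulls them together.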

Information retrieval: an overview of system characteristics

International Journal of Medical Informatics, 1997

The paper gives an overview of characteristics of information retrieval (IR) systems. The characteristics are identified from the descriptions of 23 IR systems. Four IR models are discussed: the Boolean model, the vector model, the probabilistic model and the connectionistic model. Twelve other characteristics of IR models are identified: search intermediary, domain knowledge, relevance feedback, natural language interface, graphical query language, conceptual queries, full-text IR, field searching, fuzzy queries, hypertext integration, machine learning, and ranked output. Finally, the relevance of IR systems for the World Wide Web is established. © 1997 Elsevier Science B.V.

Information Retrieval Research

In a relational indexing approach (see, e.g., Farradane's work), information is carried by a fixed set of relationship types over an underlying set of terms. The idea is that the essence of the meaning of information is encapsulated in the relationships between terms. The importance of relationships is now widely recognized within many fields such as relational databases and knowledge representation formalisms.

Document Retrieval, Automatic

Encyclopedia of Language & Linguistics, 2006

Document Retrieval is the computerized process of producing a relevance ranked list of documents in response to an inquirer's request by comparing their request to an automatically produced index of the documents in the system. Everyone uses such systems today in the form of web-based search engines. While evolving from a fairly small discipline in the 1940s, to a large, profitable industry today, the field has maintained a healthy research focus, supported by test collections and large-scale annual comparative tests of systems. A document retrieval system is comprised of three core modules: document processor, query analyzer, and matching function. There are several theoretical models on which document retrieval systems are based: Boolean, Vector Space, Probabilistic, and Language Model.
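The three modules named above can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: whitespace tokenization, an in-memory inverted index, and a TF-IDF style score are choices made here for illustration, and the names `tokenize`, `build_index`, and `search` are hypothetical rather than taken from the encyclopedia entry.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Query analyzer / document processor front end: lowercase, split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Document processor: build an inverted index, term -> {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, n_docs, query):
    """Matching function: rank document ids by a summed TF-IDF style score."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term not in the collection
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = [
    "boolean retrieval model",
    "vector space model for retrieval",
    "probabilistic ranking",
]
index = build_index(docs)
print(search(index, len(docs), "vector retrieval"))  # document ids, best match first
```

Swapping the scoring inside `search` is where a Boolean, probabilistic, or language-model matching function would differ; the document processor and query analyzer could stay the same.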

Data Structures for Information Retrieval

The process of efficiently indexing large document collections for information retrieval places large demands on a computer's memory and processor, and requires judicious use of these resources. In this paper, we describe our approach to constructing such an index based on the vector-space model (VSM). We review the stages involved in generating an index, for weighting the index terms, and for representing documents in the VSM. We explain our choice of data structures from the parsing of the document collection through the generation of index terms, to generation of document representations. We explain tradeoffs in our choice of data structures. We then demonstrate the approach using the OHSUMED data set. Our results show that even with only a modest amount of main memory (4 GB), large data sets such as the OHSUMED data set can be quickly indexed.

Computational model for the processing of documents and support to the decision making in systems of information retrieval

2017

Having the necessary information at the right time, or lacking it, can mean the success or failure of any operation. Since its inception in 1950, the field of information retrieval has provided tools that allow users to find answers to their needs and questions. Information retrieval systems are among the most widely used systems internationally, since their interfaces and functionality are easy to understand. The main function of these systems is to crawl the web, store the information found, and then respond to user queries. Because search engines hold a large amount of information, they are a rich source of knowledge and support decision-making based on information published on the web. Companies like Google do not disclose which models they use to develop the components of their search engines. In addition, the calculation of the relevance of their documents responds to commercial and governmental policies. It is therefore difficult to develop systems as complex as search engines without a computational model that supports their development. This article presents the design of a computational model for document processing and decision support in information retrieval systems, intended for the design, development, and deployment of search engines at the national and international level.