Novel Approaches in Text Information Retrieval - Experiments in the Web Track of TREC 2004

Bartell, Brian T., Garrison W. Cottrell, and Richard K. Belew (1994). Learning the optimal parameters in a ranked retrieval system using multi-query relevance feedback.

A method is proposed by which parameters in ranked-output text retrieval systems can be automatically optimized to improve retrieval performance. A ranked-output text retrieval system implements a ranking function which orders documents, placing documents estimated to be more relevant to the user's query before less relevant ones. The proposed method is to adjust system parameters to maximize the match between the system's document ordering and the user's desired ordering, given by relevance feedback. The utility of the approach is demonstrated by estimating the similarity measure in a vector space model of information retrieval. The approach automatically finds a similarity measure which performs equivalently to or better than all "classic" similarity measures studied. It also performs within 1% of an estimated theoretically optimal measure.
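
The abstract describes the idea only at a high level. The following is a minimal, hedged sketch of that idea, not the authors' implementation: score documents with a weighted combination of component similarity measures and tune the weights so that the induced ranking agrees as closely as possible with the preference ordering given by relevance feedback. The function names, the toy data, and the simple pairwise agreement criterion (used here in place of the paper's exact rank-correlation objective) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize  # derivative-free optimizer for the toy objective


def combined_scores(weights, feature_matrix):
    """Score each document as a weighted sum of component similarity measures."""
    return feature_matrix @ weights


def rank_agreement(weights, queries):
    """Average agreement between system scores and relevance-feedback preferences.

    `queries` is a list of (feature_matrix, relevance) pairs, one per query:
    feature_matrix is (n_docs, n_measures) and relevance is a 0/1 vector.
    """
    total = 0.0
    for features, rel in queries:
        scores = combined_scores(weights, features)
        # Score differences over all (relevant, non-relevant) document pairs.
        diffs = scores[rel == 1][:, None] - scores[rel == 0][None, :]
        denom = np.abs(diffs).sum()
        if denom > 0:
            total += diffs.sum() / denom  # in [-1, 1]; 1 = perfect agreement
    return total / len(queries)


def learn_weights(queries, n_measures):
    """Maximize rank agreement over the combination weights (illustrative only)."""
    x0 = np.ones(n_measures) / n_measures
    result = minimize(lambda w: -rank_agreement(w, queries), x0, method="Nelder-Mead")
    return result.x


# Toy usage: two queries, eight documents each, three component similarity measures.
rng = np.random.default_rng(0)
queries = [(rng.random((8, 3)), rng.integers(0, 2, size=8)) for _ in range(2)]
weights = learn_weights(queries, n_measures=3)
print("learned combination weights:", weights)
```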

A multicriteria paradigm of relevance for the Web Information Retrieval problem

2005

We consider the problem of ranking Web documents within a multicriteria framework and propose a novel approach for this purpose. We focus on the design of a set of criteria aiming at capturing complementary aspects of relevance, while acknowledging that each criterion has a limited precision. Moreover, we provide algorithmic solutions to aggregate these criteria to obtain the ranking of relevant documents, while taking into account the specificities of the Web information retrieval problem. We report on results of preliminary experiments that provide initial justification for the pertinence of the proposed approach in improving retrieval effectiveness.
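
As a rough illustration of what aggregating limited-precision criteria can look like, here is a small sketch; it is not the aggregation procedure of the cited paper. Documents are compared criterion by criterion, score differences below an indifference threshold count as ties, and documents are ranked by how many others they outrank. The criteria names, thresholds, and scores are assumptions made up for the example.

```python
import numpy as np


def outranks(a, b, thresholds):
    """a outranks b if it is at least as good as b on a strict majority of criteria,
    where differences smaller than the per-criterion threshold are treated as ties."""
    wins = np.sum(a >= b - thresholds)
    return wins > len(thresholds) / 2


def rank_documents(criteria_scores, thresholds):
    """criteria_scores: (n_docs, n_criteria) matrix; returns document indices, best first."""
    n = criteria_scores.shape[0]
    outrank_counts = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and outranks(criteria_scores[i], criteria_scores[j], thresholds):
                outrank_counts[i] += 1
    return np.argsort(-outrank_counts)


# Toy example: 4 documents scored on 3 criteria
# (e.g. content match, link evidence, URL structure -- names are illustrative).
scores = np.array([
    [0.9, 0.2, 0.5],
    [0.7, 0.8, 0.4],
    [0.3, 0.9, 0.6],
    [0.5, 0.5, 0.5],
])
thresholds = np.array([0.1, 0.1, 0.1])  # differences below this are treated as ties
print(rank_documents(scores, thresholds))  # document indices, best first
```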

Combining approaches to information retrieval

2002

The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the "meta-search" engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination.
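
For concreteness, below is a short sketch of two classic score-combination rules from this literature, CombSUM and CombMNZ (due to Fox and Shaw). The runs, score ranges, and the min-max normalization choice are illustrative; this is not a reconstruction of any particular system discussed in the paper.

```python
from collections import defaultdict


def normalize(run):
    """Min-max normalize one system's scores into [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def combine(runs, mnz=False):
    """CombSUM: sum of normalized scores across runs.
    CombMNZ: CombSUM multiplied by the number of runs that retrieved the document."""
    sums, counts = defaultdict(float), defaultdict(int)
    for run in runs:
        for doc, score in normalize(run).items():
            sums[doc] += score
            counts[doc] += 1
    fused = {doc: s * (counts[doc] if mnz else 1) for doc, s in sums.items()}
    return sorted(fused, key=fused.get, reverse=True)


# Example: two retrieval runs for the same query (document id -> retrieval score).
run_a = {"d1": 12.0, "d2": 7.5, "d3": 3.1}
run_b = {"d2": 0.9, "d4": 0.8, "d1": 0.2}
print(combine([run_a, run_b], mnz=True))  # fused ranking, best document first
```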

Evaluation in information retrieval

We have seen in the preceding chapters many alternatives in designing an IR system. How do we know which of these techniques are effective in which applications? Should we use stop lists? Should we stem? Should we use inverse document frequency weighting? Information retrieval has developed as a highly empirical discipline, requiring careful and thorough evaluation to demonstrate the superior performance of novel techniques on representative document collections. In this chapter we begin with a discussion of measuring the effectiveness of IR systems (Section 8.1) and the test collections that are most often used for this purpose (Section 8.2). We then present the straightforward notion of relevant and nonrelevant documents and the formal evaluation methodology that has been developed for evaluating unranked retrieval results (Section 8.3). This includes explaining the kinds of evaluation measures that are standardly used for document retrieval and related tasks like text classification and why they are appropriate. We then extend these notions and develop further measures for evaluating ranked retrieval results (Section 8.4) and discuss developing reliable and informative test collections (Section 8.5). We then step back to introduce the notion of user utility, and how it is approximated by the use of document relevance (Section 8.6). The key utility measure is user happiness. Speed of response and the size of the index are factors in user happiness. It seems reasonable to assume that relevance of results is the most important factor: blindingly fast, useless answers do not make a user happy. However, user perceptions do not always coincide with system designers' notions of quality. For example, user happiness commonly depends very strongly on user interface design issues, including the layout, clarity, and responsiveness of the user interface, which are independent of the quality of the results returned. We touch on other measures of the quality of a system, in particular the generation of high-quality result summary snippets, which strongly influence user utility, but are not measured in the basic relevance ranking paradigm (Section 8.7).
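
As a concrete companion to the measures this chapter introduces, here is a small illustrative sketch of set-based precision and recall and of average precision for a ranked result list. The document identifiers and relevance judgments are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for an unranked result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Average of precision values at the rank of each relevant document retrieved;
    relevant documents that are never retrieved contribute zero."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0


ranked = ["d3", "d1", "d7", "d2", "d9"]   # system output, best first
relevant = ["d1", "d2", "d5"]             # judged relevant documents
print(precision_recall(ranked, relevant))   # (0.4, 0.667): 2 of 5 retrieved, 2 of 3 found
print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 3 = 0.333
```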

TREC: Experiment and Evaluation in Information Retrieval (Book Review)

The Text REtrieval Conference (TREC), coordinated by the US National Institute of Standards and Technology (NIST), is the largest information retrieval (IR) experimentation effort in existence. Starting with TREC-1 in 1992, and continuing yearly, TREC gives participating groups the opportunity to have their IR systems compete in several IR experiments, called tracks. TREC has had a large influence on research into particular approaches to IR: tracks have often initiated small research communities around a problem, and TREC has occupied a large segment of the IR community as a whole. Thus, whatever one may think about the TREC approach to IR testing, a book detailing the methods used and results achieved (through 2003) is important. This book is a useful overview for researchers in the field, a must-read for prospective TREC participants, and a glimpse into a world of research for graduate students.

The book has three parts and an epilogue. Part 1 presents the essentials. TREC is based on the Cranfield paradigm [1]. Chapter 1 quotes the oft-repeated Cranfield “conclusion” that “using words in the texts themselves was very effective” (page 3). What Cranfield actually showed, however, is that systems that compute expected relevance scores by matching query words with title words agree, to a great extent, with human judges, whose relevance judgments are heavily influenced by the match of query words with title words (hardly a surprising result). This can be seen in side studies on the nature of the relevance judgments [1]. Chapter 2 describes the test collection corpus, the creation of topics (information needs descriptions), and the relevance judgments. I would have liked to see more about the instructions given to relevance judges, and thus the nature of the relevance judgments, which are a crucial element. Chapter 3 discusses retrieval performance measures, with a focus on the monolingual English ad hoc track in TREC-1 and TREC-2.

Part 2 (chapters 4 through 10) reports on the various TREC tracks. A track is a specific experiment, defined by, first, the type of task (ad hoc retrieval, filtering, question-answering, and so on); second, the type of material (printed text, spoken text, images, music, and so on); third, the presence of errors in the text (from optical character recognition (OCR) or automatic speech recognition); fourth, whether the data is monolingual or cross-lingual; and, fifth, the language(s) involved (most tracks are monolingual English). Each track report describes, over the life of the track, the specific task, the assembly and size of the test collection, the participants, the methods and evaluation measures used, and the results achieved (unfortunately, not including the overlap in retrieval by the different systems).

In Part 3 (chapters 11 through 17), selected participants report on their work at TREC. Parts 2 and 3 give complementary views of the work at TREC. Think of a table, with a column for each track and a row for each participating group. A chapter in Part 2 reports on the total work in a column (a track), both globally and by participant (a cell in the table); a chapter in Part 3 reports on the total work in a row (a research group), both globally and by track (a cell in the table). There should be more cross-references between Parts 2 and 3 to connect information on the same table cell given in different places.