Insights from viewing ranked retrieval as rank aggregation

A Hybridized Model for Efficient Query-Dependent Ranking and Information Retrieval in Large Databases

Information Retrieval (IR) has become a topic of great interest with the advent of text search engines on the Internet. IR is the area of study concerned with searching for documents, for information within documents, and for metadata about documents on the Internet. This paper proposes a hybridized approach that uses KNN and VSM tools for retrieving information on the Web. The approach was tested on standardized data. For recall values of 0.1, 0.2, …, 1.0, the precision of KNN and of VSM is computed at each recall value. From these values, the corresponding hybridized KNN/VSM values are then computed for the ADINUL, CRANFIELD, CRN4NUL, MEDNUL, and MEDLARS collections. The performance improvement for each collection was also computed: 36.4% for ADINUL, 47.8% for CRANFIELD, 18.4% for CRN4NUL, 15.7% for MEDNUL, and 23.8% for MEDLARS. Based on these results, the combined KNN/VSM retrieval model outperforms KNN or VSM used separately. That is, the technique is able to retrieve information in significantly less time. We therefore conclude that the hybridized KNN/VSM model is better at ranking and retrieving relevant documents than the previous techniques.
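
The abstract above reports precision at recall values 0.1 through 1.0 and a percentage improvement per collection, but does not say how those numbers were derived. The following is a minimal sketch assuming the standard interpolated-precision convention (precision at recall level r is the highest precision observed at any recall of at least r) and a simple relative-difference improvement. The function names and the toy data are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Set

# Recall levels 0.1, 0.2, ..., 1.0, as listed in the abstract.
RECALL_LEVELS = [round(0.1 * i, 1) for i in range(1, 11)]


def interpolated_precision(ranking: Sequence[str], relevant: Set[str],
                           levels: Sequence[float] = RECALL_LEVELS) -> List[float]:
    """Interpolated precision at each recall level for one ranked list."""
    points = []  # (recall, precision) measured at each relevant hit
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    # Assumed convention: take the best precision at any recall >= level.
    return [max((p for r, p in points if r >= level - 1e-9), default=0.0)
            for level in levels]


def percent_improvement(baseline: float, hybrid: float) -> float:
    """Relative improvement of the hybrid score over a baseline, in percent."""
    return 100.0 * (hybrid - baseline) / baseline


# Toy example: two relevant documents, found at ranks 1 and 3.
curve = interpolated_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

Averaging such curves over all queries of a collection would give one precision value per recall level for each model, which is the kind of table the abstract describes.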

Technical report: A study of ranking paradigms and their integrations for subtopic retrieval

2010

Abstract. In this paper, we consider the problem of document ranking in a non-traditional retrieval task called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending on how the ranks of documents are revised to promote diversity.
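
To make the goal of subtopic retrieval concrete, the sketch below computes subtopic recall at rank k: the fraction of a query's subtopics covered by the top k documents. This is a common way to measure diversity in this task, but the abstract does not state which measure the authors use, so treat it as an illustrative assumption. The document identifiers and subtopic labels are hypothetical.

```python
from typing import Dict, Sequence, Set


def subtopic_recall(ranking: Sequence[str],
                    doc_subtopics: Dict[str, Set[str]],
                    all_subtopics: Set[str],
                    k: int) -> float:
    """Fraction of the query's subtopics covered by the top-k documents."""
    covered: Set[str] = set()
    for doc in ranking[:k]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered & all_subtopics) / len(all_subtopics)


# Hypothetical judgments: which subtopics each document covers.
doc_subtopics = {"a": {"s1"}, "b": {"s1"}, "c": {"s2", "s3"}}
all_subtopics = {"s1", "s2", "s3"}

redundant = subtopic_recall(["a", "b", "c"], doc_subtopics, all_subtopics, k=2)
diverse = subtopic_recall(["a", "c", "b"], doc_subtopics, all_subtopics, k=2)
```

Both rankings contain the same documents, but the second one covers all three subtopics within the first two ranks, which is the behaviour a diversifying ranker aims for.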

When Two Is Better Than One: A Study of Ranking Paradigms and Their Integrations for Subtopic Retrieval

2010

In this paper, we consider the problem of document ranking in a non-traditional retrieval task called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending on how the ranks of documents are revised to promote diversity.

SemRank: ranking refinement strategy by using the semantic intensity

Procedia Computer Science, 2011

The ubiquity of multimedia has raised the need for systems that can store, manage, and structure multimedia data in such a way that it can be retrieved intelligently. One of the current issues in media management and data mining research is the ranking of retrieved documents. Ranking is one of the most challenging problems for information retrieval systems. A user query may return millions of relevant results, but if the ranking function cannot order them by relevance, the results are of little use. However, current ranking techniques operate at the level of keyword matching, and results are usually ranked by term frequency. This paper is concerned with ranking documents by relying solely on the rich semantics inside the document instead of its contents. Our proposed ranking refinement strategy, known as SemRank, ranks documents based on their semantic intensity. Our approach has been applied to the open benchmark LabelMe dataset and compared against one of the well-known ranking models, the Vector Space Model (VSM). The experimental results show that our approach achieves a significant improvement in retrieval performance over state-of-the-art ranking methods.
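
For reference, the keyword-level baseline that the abstract contrasts with can be sketched as a term-frequency Vector Space Model with cosine similarity. This is a generic textbook formulation, not SemRank itself and not necessarily the exact VSM variant used in the paper's comparison. The whitespace tokenizer and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_vsm(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query, best first."""
    q = tf_vector(query)
    scored = [(doc_id, cosine(q, tf_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical annotations standing in for document text.
docs = {"d1": "red car red", "d2": "blue sky"}
ranked = rank_vsm("red car", docs)
```

Because the score depends only on shared surface terms, a document that expresses the same concept with different words scores zero, which is the limitation the abstract attributes to keyword-level ranking.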

Improving Information Retrieval Performance

International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2022

Locating interesting information is one of the most important tasks in Information Retrieval (IR). An IR system accepts a query from a user and responds with a set of documents. Generally, the system returns both relevant and non-relevant material, and a document organization approach is applied to assist the user in finding the relevant information in the retrieved set. The two most widely used document organization approaches are the ranked list and clustering of the retrieved documents. Both techniques have their strengths and weaknesses. This paper addresses the problem of offering a scalable, adaptive, efficient, and full-fledged information retrieval method. We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. We develop a set of techniques for the rank aggregation problem and compare their performance to that of well-known methods. A primary goal of our work is to design rank aggregation techniques that make search on the Web more robust.
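
The abstract does not name the aggregation techniques it develops. As an illustration of the rank aggregation problem itself, here is a minimal sketch of Borda count, one of the well-known positional methods such work is typically compared against. The tie-breaking rule (alphabetical) and the handling of documents missing from a list (zero points) are assumptions chosen for simplicity.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge several ranked lists into one using Borda count."""
    universe = {doc for ranking in rankings for doc in ranking}
    n = len(universe)
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            # Top position earns n points, the next n - 1, and so on.
            scores[doc] += n - position
        # Documents absent from this list earn nothing (assumed convention).
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(universe, key=lambda doc: (-scores[doc], doc))


# Three hypothetical search engines ranking the same three documents.
merged = borda_aggregate([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]])
```

In a meta-search setting each input list would come from a different engine; the merged order rewards documents that are placed consistently high across sources.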

A Latent Variable Ranking Model for Content-Based Retrieval

Lecture Notes in Computer Science, 2012

Since their introduction, ranking SVM models [11] have become a powerful tool for training content-based retrieval systems. All we need for training a model are retrieval examples in the form of triplet constraints, i.e., examples specifying that, relative to some query, a database item a should be ranked higher than database item b. These types of constraints could be obtained from feedback of users of the retrieval system. Most previous ranking models learn either a global combination of elementary similarity functions or a combination defined with respect to a single database item. Instead, we propose a "coarse to fine" ranking model where, given a query, we first compute a distribution over "coarse" classes and then use the linear combination that has been optimized for queries of that class. These coarse classes are hidden and need to be induced by the training algorithm. We propose a latent variable ranking model that induces both the latent classes and the weights of the linear combination for each class from ranking triplets. Our experiments over two large image datasets and a text retrieval dataset show the advantages of our model over learning a global combination as well as a combination for each test point (i.e., the transductive setting). Furthermore, compared to the transductive approach, our model has a clear computational advantage since it does not need to be retrained for each test query.
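
The sketch below illustrates how triplet constraints can train a single global linear combination of elementary similarity functions, which is the baseline the abstract contrasts with, not the proposed latent variable model. It assumes a hinge loss with unit margin, L2 regularization, and plain stochastic subgradient descent; these choices, the hyperparameters, and all names are assumptions. Each feature vector holds the values of the elementary similarity functions between the query and one database item.

```python
from typing import List, Sequence, Tuple

# (similarities of query to item a, similarities of query to item b),
# where item a should be ranked above item b for that query.
Triplet = Tuple[Sequence[float], Sequence[float]]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear combination of elementary similarity values."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_triplet_ranker(triplets: Sequence[Triplet], dim: int,
                         epochs: int = 50, lr: float = 0.1,
                         reg: float = 0.01) -> List[float]:
    """Fit weights so that score(a) exceeds score(b) by a margin of 1."""
    w = [0.0] * dim
    for _ in range(epochs):
        for xa, xb in triplets:
            margin = score(w, xa) - score(w, xb)
            for i in range(dim):
                grad = reg * w[i]          # L2 regularization term
                if margin < 1.0:           # constraint violated: hinge is active
                    grad -= xa[i] - xb[i]
                w[i] -= lr * grad
    return w


# Toy data with two similarity functions; the first one is the informative one.
triplets = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
weights = train_triplet_ranker(triplets, dim=2)
```

A coarse-to-fine model in the spirit of the abstract would keep one such weight vector per latent class and mix them according to the query's class distribution; that extension is not shown here.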

Dual-space re-ranking model for document retrieval

2010

The field of information retrieval still strives to develop models which allow semantic information to be integrated in the ranking process to improve performance in comparison to standard bag-of-words based models. A conceptual model has been adopted in general-purpose retrieval which can comprise a range of concepts, including linguistic terms, latent concepts and explicit knowledge concepts. One of the drawbacks of this model is that the computational cost is significant and often intractable in modern test collections. Therefore, approaches utilising concept-based models for re-ranking initial retrieval results have attracted a considerable amount of study. This method enjoys the benefits of reduced document corpora for semantic space construction and improved ranking results. However, fitting such a model to a smaller collection is less meaningful than fitting it into the whole corpus. This paper proposes a dual-space model which incorporates external knowledge to enhance the space produced by the latent concept method. This model is intended to produce global consistency across the semantic space: similar entries are likely to have the same re-ranking scores with respect to the latent and manifest concepts. To illustrate the effectiveness of the proposed method, experiments were conducted using test collections across different languages. The results demonstrate that the method can comfortably achieve improvements in retrieval performance.

Avoidance of Ranking Capabilities in Retrieval of Queries on Hidden-Web Text Databases

Many online or local data sources provide powerful querying mechanisms but limited ranking capabilities. For instance, PubMed allows users to submit highly expressive Boolean keyword queries, but ranks the query results by date only. However, a user would typically prefer a ranking by relevance, measured by an information retrieval (IR) ranking function. A naive approach would be to submit a disjunctive query with all query keywords, retrieve all the returned matching documents, and then re-rank them. Unfortunately, such an operation would be very expensive due to the large number of results returned by disjunctive queries. In this paper, we present algorithms that return the top results for a query, ranked according to an IR-style ranking function, while operating on top of a source with a Boolean query interface with no ranking capabilities (or a ranking capability of no interest to the end user). The algorithms generate a series of conjunctive queries that return only documents that are candidates for being highly ranked according to the relevance metric. Our approach can also be applied to other settings where the ranking is monotonic on a set of factors (query keywords in IR) and the source query interface is a Boolean expression of these factors. Our comprehensive experimental evaluation on the PubMed database and a TREC data set shows that we achieve an order-of-magnitude improvement compared to the current baseline approaches.
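
The idea of replacing one large disjunctive query with a series of conjunctive ones can be sketched as follows. This is a deliberately simplified illustration, not the paper's algorithms: it assumes the relevance score is just the number of query keywords a document contains (a monotonic function of the keywords), and it uses a hypothetical in-memory `BooleanSource` class as a stand-in for a real source such as PubMed.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Set


class BooleanSource:
    """Hypothetical source that answers AND-queries and returns unranked sets."""

    def __init__(self, docs: Dict[str, Set[str]]):
        self.docs = docs
        self.queries_issued = 0

    def conjunctive(self, terms: Iterable[str]) -> Set[str]:
        self.queries_issued += 1
        wanted = set(terms)
        return {doc for doc, words in self.docs.items() if wanted <= words}


def top_k(source: BooleanSource, keywords: Sequence[str], k: int) -> List[str]:
    """Top-k documents by number of matched keywords (assumed scoring)."""
    results: List[str] = []
    seen: Set[str] = set()
    # Query the most specific keyword subsets first: their matches score highest.
    for size in range(len(keywords), 0, -1):
        level: Set[str] = set()
        for subset in combinations(keywords, size):
            level |= source.conjunctive(subset) - seen
        results.extend(sorted(level))  # same score within a level; sort for determinism
        seen |= level
        if len(results) >= k:
            break  # no need to issue the broader, more expensive queries
    return results[:k]


source = BooleanSource({
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"c"},
    "d4": {"x"},
})
best = top_k(source, ["a", "b", "c"], k=2)
```

In this toy run the search stops after the two-keyword level, so the single-keyword queries, which would return the largest result sets, are never sent. The number of subsets grows exponentially with the number of keywords, so a practical algorithm would need to prune which conjunctive queries it issues.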

A Novel Approach Integrating Ranking Functions Discovery, Optimization and Inference to Improve Retrieval Performance

International Journal of Soft Computing, 2010

The significant role played by ranking functions in the performance and success of Information Retrieval (IR) systems and search engines cannot be underestimated. Diverse ranking functions are available in the IR literature. However, empirical studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). In this study, a novel three-stage integrated ranking framework is proposed for the discovery, optimization, and inference of rankings used in IR systems. The first phase, the discovery process, is based on a Genetic Programming (GP) approach which combines structural and content features of the documents, while the second phase, the optimization process, is based on a Genetic Algorithm (GA) which combines the document retrieval scores of various well-known ranking functions. In the third phase, fuzzy inference serves as soft search constraints to be applied to documents. We demonstrate how these features are combined to bring new tasks and processes within the three conceptual stages of the integrated framework for effective IR.

Learning to rank at query-time using association rules

Proceedings of the 31st …, 2008

Some applications have to present their results in the form of ranked lists. This is the case for many information retrieval applications, in which documents must be sorted according to their relevance to a given query. This has drawn the interest of the information retrieval community to methods that automatically learn effective ranking functions. In this paper we propose a novel method which uncovers patterns (or rules) in the training data associating features of the document with its relevance to the query, and then uses the discovered rules to rank documents. To address typical problems that are inherent to the utilization of association rules (such as missing rules and rule explosion), the proposed method generates rules on a demand-driven basis, at query-time. The result is an extremely fast and effective ranking method. We conducted a systematic evaluation of the proposed method using the LETOR benchmark collections. We show that generating rules on a demand-driven basis can boost ranking performance, providing gains ranging from 12% to 123%, outperforming the state-of-the-art methods that learn to rank, with no need for time-consuming and laborious pre-processing. As a highlight, we also show that additional information, such as query terms, can make the generated rules more discriminative, further improving ranking performance.
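
A rough sketch of demand-driven rule generation follows. It assumes discretized features, rules of the form "feature subset implies relevance level", and a score equal to the confidence-weighted average relevance level. These details, the rule-size limit, and the toy feature names are assumptions for illustration and are not taken from the paper. The key point it shows is that rules are enumerated only from the features present in the document being ranked, so no global rule set is mined in advance.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Example = Tuple[Set[str], int]  # (discretized features, relevance level)


def score_on_demand(train: List[Example], features: Set[str],
                    max_size: int = 2, min_support: int = 1) -> float:
    """Estimate relevance from rules built only from this document's features."""
    conf_sum: Dict[int, float] = defaultdict(float)
    for size in range(1, max_size + 1):
        for antecedent in combinations(sorted(features), size):
            needed = set(antecedent)
            # Labels of training examples that satisfy the rule antecedent.
            labels = [label for feats, label in train if needed <= feats]
            if len(labels) < min_support:
                continue  # rule lacks support in the training data; skip it
            for label in set(labels):
                # Confidence of the rule "antecedent -> label".
                conf_sum[label] += labels.count(label) / len(labels)
    total = sum(conf_sum.values())
    if total == 0:
        return 0.0
    # Confidence-weighted average relevance level (assumed scoring rule).
    return sum(label * conf for label, conf in conf_sum.items()) / total


# Hypothetical training data with two discretized features per document.
train = [
    ({"bm25=high", "pr=high"}, 1),
    ({"bm25=high", "pr=low"}, 1),
    ({"bm25=low", "pr=low"}, 0),
    ({"bm25=low", "pr=high"}, 0),
]
likely_relevant = score_on_demand(train, {"bm25=high", "pr=low"})
likely_irrelevant = score_on_demand(train, {"bm25=low", "pr=low"})
```

Sorting candidate documents by this score yields the ranked list; restricting rule generation to each document's own features is what keeps the number of rules small.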