A Latent Variable Ranking Model for Content-Based Retrieval (original) (raw)

Learning to rank for content-based image retrieval

2010

In Content-based Image Retrieval (CBIR), accurately ranking the returned images is of paramount importance, since users consider mostly the topmost results. The typical ranking strategy used by many CBIR systems is to employ image content descriptors, so that returned images that are most similar to the query image are placed higher in the rank. While this strategy is well accepted and widely used, improved results may be obtained by combining multiple image descriptors. In this paper we explore this idea, and introduce algorithms that learn to combine information coming from different descriptors. The proposed learning to rank algorithms are based on three diverse learning techniques: Support Vector Machines (CBIR-SVM), Genetic Programming (CBIR-GP), and Association Rules (CBIR-AR). Eighteen image content descriptors (color, texture, and shape information) are used as input and provided as training to the learning algorithms. We performed a systematic evaluation involving two complex and heterogeneous image databases (Corel e Caltech) and two evaluation measures (Precision and MAP). The empirical results show that all learning algorithms provide significant gains when compared to the typical ranking strategy in which descriptors are used in isolation. We concluded that, in general, CBIR-AR and CBIR-GP outperforms CBIR-SVM. A fine-grained analysis revealed the lack of correlation between the results provided by CBIR-AR and the results provided by the other two algorithms, which indicates the opportunity of an advantageous hybrid approach.

LETOR: A benchmark collection for research on learning to rank for information retrieval

Information retrieval, 2010

LETOR is a benchmark collection for the research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of researches. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performances, and make discussions on the results. After that, we discuss possible new research topics that can be supported by LETOR, in addition to algorithm comparison. We hope that this paper can help people to gain deeper understanding of LETOR, and enable more interesting research projects on learning to rank and related topics.

Relevance Vector Ranking for Information Retrieval

2010

In recent years, learning ranking function for information retrieval has drawn the attentions of the researchers from information retrieval and machine learning community. In existing approaches of learning to rank, the sparse prediction model only can be learned by support vector learning approach. However, the number of support vectors grows steeply with the size of the training data set. In this paper, we propose a sparse Bayesian kernel approach to learn ranking function. By this approach accurate prediction models can be derived, which typically utilize fewer basis functions than the comparable SVM-based approaches while offering a number of additional advantages. Experimental results on document retrieval data set show that the generalization performance of this approach competitive with two state-of-the-art approaches and the prediction model learned by it is typically sparse.

Neural ranking models for document retrieval

Information Retrieval Journal

Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, so that they overcome the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the proposed models in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components, and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are s...

Direct Maximization of Rank-Based Metrics for Information Retrieval

Ranking is an essential component for a number of tasks, such as information retrieval and collaborative filtering. It is often the case that the underlying task attempts to maximize some evaluation metric, such as mean average precision, over rankings. Most past work on learning how to rank has focused on likelihood-or margin-based approaches. In this work we explore directly maximizing rank-based metrics, which are a family of metrics that only depend on the order of ranked items. This allows us to maximize different metrics for the same training data. We show how the parameter space of linear scoring functions can be reduced to a multinomial manifold. Parameter estimation is accomplished by optimizing the evaluation metric over the manifold. Results from ad hoc information retrieval are given that show our model yields significant improvements in effectiveness over other approaches.

A Deep Look into neural ranking models for information retrieval

Information Processing & Management, 2019

Ranking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, from traditional heuristic methods, probabilistic methods, to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growing body of work in applying shallow or deep neural networks to the ranking problem in IR, referred to as neural ranking models in this paper. The power of neural ranking models lies in the ability to learn from the raw text inputs for the ranking problem to avoid many limitations of hand-crafted features. Neural networks have sufficient capacity to model complicated tasks, which is needed to handle the complexity of relevance estimation in ranking. Since there have been a large variety of neural ranking models proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey, we will take a deep look into the neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies. We compare these models through benchmark tasks to obtain a comprehensive empirical understanding of the existing techniques. We will also discuss what is missing in the current literature and what are the promising and desired future directions.

Bartell, Brian T., Cottrell Garrison W., and Richard K. Belew (1994) Learning the optimal parameters in a ranked retrieval system using multi-query relevance feedback.(244K)

A method is proposed by which parameters in ranked-output text retrieval systems can be automatically optimized to improve retrieval performance. A ranked-output text retrieval system implements a ranking function which orders documents, placing documents estimated to be more relevant to the user's query before less relevant ones. The proposed method is to adjust system parameters to maximize the match between the system's document ordering and the user's desired ordering, given by relevance feedback. The utility of the approach is demonstrated by estimating the similarity measure in a vector space model of information retrieval. The approach automatically nds a similarity measure which performs equivalent to or better than all \classic" similarity measures studied. It also performs within 1% of an estimated theoretically optimal measure.

Insights from viewing ranked retrieval as rank aggregation

2005

We view a variety of established methods for ranked retrieval from a common angle, namely as a process of combining query-independent rankings that were precomputed for certain attributes. Apart from a general insight into what effectively distinguishes various schemes from each other, we obtain three specific results concerned with conceptbased retrieval. First, we prove that latent semantic indexing (LSI) can be implemented to answer queries in time proportional to the number of words in the query, which improves over the standard implementation by an order of magnitude; a similar result is established for LSI's probabilistic sibling PLSI. Second, we give a simple and precise characterization of the extent, to which latent semantic indexing (LSI) can deal with polysems, and when it fails to do so. Third, we demonstrate that the recombination of the intricate, yet relatively cheap mechanism of PLSI for mapping queries to attributes, with a simplistic, easy-to-compute set of document rankings gives a retrieval performance which is at least as good as that of the most sophisticated conceptbased retrieval schemes.

Learning to rank relational objects and its application to web search

Proceeding of the 17th …, 2008

Learning to rank is a new statistical learning technology on creating a ranking model for sorting objects. The technology has been successfully applied to web search, and is becoming one of the key machineries for building search engines. Existing approaches to learning to rank, however, did not consider the cases in which there exists relationship between the objects to be ranked, despite of the fact that such situations are very common in practice. For example, in web search, given a query certain relationships usually exist among the the retrieved documents, e.g., URL hierarchy, similarity, etc., and sometimes it is necessary to utilize the information in ranking of the documents. This paper addresses the issue and formulates it as a novel learning problem, referred to as, 'learning to rank relational objects'. In the new learning task, the ranking model is defined as a function of not only the contents (features) of objects but also the relations between objects. The paper further focuses on one setting of the learning problem in which the way of using relation information is predetermined. It formalizes the learning task as an optimization problem in the setting. The paper then proposes a new method to perform the optimization task, particularly an implementation based on SVM. Experimental results show that the proposed method outperforms the baseline methods for two ranking tasks (Pseudo Relevance Feedback and Topic Distillation) in web search, indicating that the proposed method can indeed make effective use of relation information and content information in ranking.

[Hang Li] Learning to Rank for Information Retriev(BookFi)

Technologies is edited by Graeme Hirst of the University of Toronto. The series consists of 50-to 150-page monographs on topics relating to natural language processing, computational linguistics, information retrieval, and spoken language understanding. Emphasis is on important new techniques, on new applications, and on topics that combine two or more HLT subfields.