Applying Completely-Arbitrary Passage for Pseudo-Relevance Feedback in Language Modeling Approach

Relevant query feedback in statistical language modeling

2003

Abstract In traditional relevance feedback, researchers have explored relevant document feedback, wherein the query representation is updated based on a set of documents judged relevant by the user. In this work, we investigate relevant query feedback, in which we update a document's representation based on a set of relevant queries. We propose four statistical models to incorporate relevant query feedback.
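
The four statistical models are not specified in this abstract. As a minimal sketch of the general idea — shifting a document's language model toward the queries judged relevant to it — one could interpolate the document's unigram model with a model pooled from those queries; the interpolation weight `alpha` below is an assumption, not a value from the paper:

```python
from collections import Counter

def unigram_lm(text):
    """Maximum-likelihood unigram model from a whitespace-tokenized string."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def query_feedback_update(doc_model, relevant_queries, alpha=0.7):
    """Shift a document model toward a model pooled from its relevant queries.
    alpha (weight kept on the original document model) is an assumed value."""
    pooled = unigram_lm(" ".join(relevant_queries))
    vocab = set(doc_model) | set(pooled)
    return {w: alpha * doc_model.get(w, 0.0) + (1 - alpha) * pooled.get(w, 0.0)
            for w in vocab}

doc = unigram_lm("statistical language modeling for document retrieval")
updated = query_feedback_update(doc, ["language model feedback", "query feedback"])
```

Because both components are proper distributions, the interpolated model still sums to one, so it can be plugged directly back into a query-likelihood retrieval function.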

Effective pseudo-relevance feedback for spoken document retrieval

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

With the exponential proliferation of multimedia associated with spoken documents, research on spoken document retrieval (SDR) has emerged and attracted much attention in the past two decades. Apart from the considerable effort devoted to developing robust indexing and modeling techniques for representing spoken documents, a recent line of thought aims at improving query modeling to better reflect the user's information need. Pseudo-relevance feedback is by far the most commonly used paradigm for query reformulation; it assumes that a small number of top-ranked feedback documents obtained from the initial round of retrieval are relevant and can be utilized for this purpose. Nevertheless, simply taking all of the top-ranked feedback documents from the initial retrieval for query modeling (reformulation) does not always work well, especially when the top-ranked documents contain much redundant or non-relevant information. In view of this, we explore the problem of how to effectively glean useful cues from the top-ranked documents so as to achieve more accurate query modeling. To this end, different kinds of information cues are considered and integrated into the process of feedback document selection so as to improve query effectiveness. Experiments conducted on the TDT (Topic Detection and Tracking) task show the advantages of our retrieval methods for SDR.
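
The specific information cues are not detailed in the abstract. A hedged sketch of redundancy-aware feedback document selection, using an MMR-style trade-off between the initial retrieval score and similarity to already-selected documents as a stand-in for those cues:

```python
import math
from collections import Counter

def term_vec(text):
    """Bag-of-words vector for a whitespace-tokenized string."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(c * b.get(w, 0) for w, c in a.items())
    den = (math.sqrt(sum(c * c for c in a.values())) *
           math.sqrt(sum(c * c for c in b.values())))
    return num / den if den else 0.0

def select_feedback_docs(ranked_docs, k=2, lam=0.7):
    """Greedily pick k feedback documents, trading the initial retrieval
    score against redundancy with documents already picked (an MMR-style
    stand-in for the paper's cues). ranked_docs: list of (text, score)."""
    vecs = [term_vec(t) for t, _ in ranked_docs]
    selected = []
    while len(selected) < min(k, len(ranked_docs)):
        best, best_val = None, float("-inf")
        for i, (_, score) in enumerate(ranked_docs):
            if i in selected:
                continue
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected),
                             default=0.0)
            val = lam * score - (1 - lam) * redundancy
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
    return selected

docs = [("speech retrieval with language models", 1.0),
        ("speech retrieval with language models", 0.9),  # near-duplicate of the top hit
        ("topic detection and tracking corpus", 0.8)]
```

With these toy documents the selection skips the duplicate second document in favor of the lower-ranked but novel third one, which is the behavior the abstract argues plain top-k selection lacks.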

Improving the robustness of relevance-based language models

2005

ABSTRACT We propose a new robust relevance model that can be applied to both pseudo feedback and true relevance feedback in the language-modeling framework for document retrieval. There are three main differences between our new relevance model and the Lavrenko-Croft relevance model.

Selecting good expansion terms for pseudo-relevance feedback

2008

Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in reality: many expansion terms identified by traditional approaches are in fact unrelated to the query and harmful to retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely by their distributions in the feedback documents and in the whole collection. We therefore propose to integrate a term classification process to predict the usefulness of expansion terms, into which multiple additional features can be incorporated. Our experiments on three TREC collections show that retrieval effectiveness can be much improved when term classification is used. In addition, we demonstrate that good terms should be identified directly according to their possible impact on retrieval effectiveness, i.e. using supervised learning rather than unsupervised learning.
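
As an illustration of what such a term classification step might look like — the features and weights below are invented for the sketch, not the paper's learned classifier:

```python
import math
from collections import Counter

def candidate_features(term, feedback_docs, collection, query_terms):
    """Per-term features (illustrative choices, not the paper's exact set):
    fb_freq - relative frequency in the feedback documents,
    idf     - inverse document frequency over the whole collection,
    co      - fraction of feedback docs where the term meets a query term."""
    fb_tokens = [d.lower().split() for d in feedback_docs]
    all_fb = [w for toks in fb_tokens for w in toks]
    fb_freq = all_fb.count(term) / len(all_fb)
    df = sum(1 for d in collection if term in d.lower().split())
    idf = math.log((len(collection) + 1) / (df + 1))
    co = sum(1 for toks in fb_tokens
             if term in toks and any(q in toks for q in query_terms)) / len(fb_tokens)
    return [fb_freq, idf, co]

def is_good_term(features, weights=(2.0, 0.5, 1.0), threshold=0.5):
    """Linear scorer standing in for the paper's trained classifier;
    these weights are made up for the sketch, not learned from data."""
    return sum(w * f for w, f in zip(weights, features)) >= threshold

fb = ["query expansion helps retrieval", "retrieval with expansion terms"]
coll = fb + ["an unrelated document", "yet another document"]
feats = candidate_features("expansion", fb, coll, ["retrieval"])
```

In the supervised setting the abstract advocates, the weights would instead be fit on terms labeled by their measured impact on retrieval effectiveness.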

Exploring Sentence level Query Expansion in the Language Model

We introduce two novel methods for query expansion in information retrieval (IR). The basis of these methods is to add the most similar sentences extracted from pseudo-relevant documents to the original query. The first method adds a fixed number of sentences to the original query, the second a progressively decreasing number of sentences. We evaluate these methods on the English and Bengali test collections from the FIRE workshops. The major findings of this study are that: i) performance is similar for both English and Bengali; ii) employing a smaller context (similar sentences) yields a considerably higher mean average precision (MAP) compared to extracting terms from full documents (up to 5.9% improvement in MAP for English and 10.7% for Bengali compared to standard Blind Relevance Feedback (BRF)); iii) using a variable number of sentences for query expansion performs better and shows less variance in the best MAP for different parameter settings; iv) query expansion based on sentences can improve performance even for topics with low initial retrieval precision where standard BRF fails.
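
A minimal sketch of the fixed-number variant, assuming cosine similarity between bag-of-words vectors (the abstract does not give the exact similarity measure or sentence counts used):

```python
import math
from collections import Counter

def cosine(a, b):
    num = sum(c * b.get(w, 0) for w, c in a.items())
    den = (math.sqrt(sum(c * c for c in a.values())) *
           math.sqrt(sum(c * c for c in b.values())))
    return num / den if den else 0.0

def expand_query(query, feedback_docs, fixed_k=2):
    """Rank every sentence of the pseudo-relevant documents by cosine
    similarity to the query and append the top fixed_k sentences (the
    fixed-number method; the decreasing-number variant would shrink
    fixed_k from one feedback document to the next)."""
    q = Counter(query.lower().split())
    sents = [s.strip() for d in feedback_docs for s in d.split(".") if s.strip()]
    ranked = sorted(sents, reverse=True,
                    key=lambda s: cosine(Counter(s.lower().split()), q))
    return query + " " + " ".join(ranked[:fixed_k])

expanded = expand_query(
    "language model",
    ["a language model scores text. weather is nice today.",
     "retrieval uses a model."])
```

The point of the sentence granularity is visible even in this toy run: the off-topic "weather" sentence is filtered out, whereas whole-document term extraction would have admitted its terms.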

A document frequency constraint for pseudo-relevance feedback models

ABSTRACT. We study in this paper the behavior of several PRF models, and display their main characteristics. This will lead us to introduce a new heuristic constraint for PRF models, referred to as the Document Frequency (DF) constraint. We then analyze, from a theoretical point of view, state-of-the-art PRF models according to their relation with this constraint. This analysis reveals that the standard mixture model for PRF in the language modeling family does not satisfy the DF constraint.
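
The standard mixture model referred to here estimates a feedback topic model theta_F by assuming the feedback text is drawn from a two-component mixture of theta_F and the collection model. A compact EM sketch (lam and the iteration count are assumed values):

```python
from collections import Counter

def mixture_feedback_model(feedback_docs, collection_model, lam=0.5, iters=30):
    """EM for the simple two-component mixture model of PRF: feedback text
    is assumed drawn from lam * p(w|theta_F) + (1 - lam) * p(w|C), so
    theta_F concentrates on terms frequent in feedback but rare in the
    collection. lam and iters are assumed values."""
    counts = Counter(w for d in feedback_docs for w in d.lower().split())
    total = sum(counts.values())
    theta = {w: c / total for w, c in counts.items()}  # MLE initialisation
    for _ in range(iters):
        # E-step: posterior that each occurrence of w was generated by theta_F
        t = {w: lam * theta[w] /
                (lam * theta[w] + (1 - lam) * collection_model.get(w, 1e-9))
             for w in counts}
        # M-step: re-estimate theta_F from the expected topical counts
        norm = sum(counts[w] * t[w] for w in counts)
        theta = {w: counts[w] * t[w] / norm for w in counts}
    return theta

background = {"the": 0.5, "topic": 0.01}
theta_f = mixture_feedback_model(["the the topic topic"], background)
```

Note that the model pools raw term counts across the feedback set, so a term that is very frequent in a single feedback document can dominate theta_F regardless of how many documents contain it — the behavior the Document Frequency constraint is designed to rule out.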

Exploring Accumulative Query Expansion for Relevance Feedback

For the participation of Dublin City University (DCU) in the Relevance Feedback (RF) track of INEX 2010, we investigated the relation between the length of relevant text passages and the number of RF terms. In our experiments, relevant passages are segmented into non-overlapping windows of fixed length, which are sorted by similarity with the query. In each retrieval iteration, we extend the current query with the most frequent terms extracted from these word windows. The number of feedback terms is either a constant, a number proportional to the length of relevant passages, or a number inversely proportional to the length of relevant passages. Retrieval experiments show a significant increase in MAP for the INEX 2008 training data and improved precision at early recall levels for the 2010 topics as compared to the baseline Rocchio feedback.
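
The window-based expansion step can be sketched as follows (the window length, number of windows kept, and number of terms are assumed values, not DCU's settings):

```python
import math
from collections import Counter

def window_expansion_terms(passage, query, win=5, keep=2, n_terms=3):
    """Segment a relevant passage into non-overlapping windows of win tokens,
    sort windows by cosine similarity to the query, and return the most
    frequent non-query terms from the keep best windows. win, keep and
    n_terms are assumed values."""
    q = Counter(query.lower().split())
    q_den = math.sqrt(sum(v * v for v in q.values()))
    tokens = passage.lower().split()
    windows = [tokens[i:i + win] for i in range(0, len(tokens), win)]

    def sim(wtoks):
        c = Counter(wtoks)
        num = sum(v * q.get(w, 0) for w, v in c.items())
        den = math.sqrt(sum(v * v for v in c.values())) * q_den
        return num / den if den else 0.0

    windows.sort(key=sim, reverse=True)
    pooled = Counter(w for toks in windows[:keep] for w in toks if w not in q)
    return [w for w, _ in pooled.most_common(n_terms)]

passage = ("relevance feedback expands the query query feedback terms "
           "feedback helps ranking a lot more")
terms = window_expansion_terms(passage, "query relevance")
```

The track's three strategies map onto `n_terms` here: hold it constant, scale it with passage length, or scale it inversely with passage length.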

Effective pseudo-relevance feedback for language modeling in extractive speech summarization

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014

Extractive speech summarization, which aims to automatically select an indicative set of sentences from a spoken document so as to concisely represent its most important aspects, has become an active area of research and experimentation. An emerging stream of work employs the language modeling (LM) framework along with the Kullback-Leibler divergence measure for extractive speech summarization, which can perform important sentence selection in an unsupervised manner and has shown preliminary success. This paper continues this general line of research, and its main contribution is twofold. First, by virtue of pseudo-relevance feedback, we explore several effective sentence modeling formulations to enhance the sentence models involved in the LM-based summarization framework. Second, the utility of our summarization methods and of several widely used methods is analyzed and compared extensively, demonstrating the effectiveness of our methods.
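
A hedged sketch of the underlying KL-divergence sentence selection; the PRF-enhanced sentence models the paper proposes are not reproduced here, and the smoothing constant is an assumption:

```python
import math
from collections import Counter

def lm(text, vocab, mu=0.1):
    """Unigram model with additive smoothing over a fixed vocabulary
    (mu is an assumed smoothing constant)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: (counts.get(w, 0) + mu) / (total + mu * len(vocab)) for w in vocab}

def kl(p, q):
    """KL divergence D(p || q) over a shared vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def summarize(document_sents, k=1):
    """Rank sentences by KL(document model || sentence model), ascending,
    and keep the top k as the summary. A PRF refinement in the spirit of
    the paper (not shown) would re-estimate each sentence model from the
    sentences it ranks closest to before this final ranking step."""
    doc_text = " ".join(document_sents)
    vocab = set(doc_text.lower().split())
    doc_model = lm(doc_text, vocab)
    return sorted(document_sents, key=lambda s: kl(doc_model, lm(s, vocab)))[:k]
```

Sentences whose smoothed models sit closest to the whole-document model are judged most indicative, which is why enriching the sentence models via pseudo-relevance feedback can change which sentences are selected.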