Framework for analysis and improvement of data-fusion algorithms
Related papers
An effective selection of retrieval schemes for data fusion
Kuwait Journal of Science, 2017
Merging the results of multiple retrieval systems or schemes may enhance the performance of an information retrieval system. The success of the fusion lies in the selection of the member schemes. This paper explores an effective selection algorithm, derived from the filter concept, that treats schemes returning low scores as noise. The proposed algorithm is tested on three benchmark test collections: American Documentation Institute (ADI), Centre for Inventions and Scientific Information (CISI), and Medlars (MED). The consistency of the computed results is tested with a paired Student's t-test. The presented algorithm yields a significant improvement over the existing combination functions. The improvement in performance of the proposed method is attributed to a reduction in the amplification of the chorus effect caused by the schemes returning low scores.
Statistical Score Calculation of Information Retrieval Systems using Data Fusion Technique
The effectiveness of information retrieval is defined by the number of relevant documents retrieved with respect to a user query. In this paper, we present a novel data fusion approach in IR to enhance the performance of the retrieval system. The technique unites the retrieval results of numerous systems using various data fusion algorithms. The study shows that our approach is more efficient than traditional approaches.
Effect of Weight Assignment in Data Fusion Based Information Retrieval
2008
Abstract: This paper studies the variation in performance of an information retrieval system that merges results from a number of retrieval schemes with equal and unequal weights. The weight of a retrieval scheme for a particular document is derived from the relevance scores of that document. Since the relevance scores vary from document to document and corpus to corpus, the proposed method is dynamic. A number of weight calculation methods that use the error value for computation are discussed. The effectiveness of the weight calculation is tested on three benchmark test collections, viz. ADI, CISI and MED. The methods discussed retrieve articles effectively and are independent of history or any training data.
On the Selection of the Best Retrieval Result Per Query –An Alternative Approach to Data Fusion–
Lecture Notes in Computer Science, 2009
Some recent works have shown that the "perfect" selection of the best IR system per query could lead to a significant improvement in retrieval performance. Motivated by this fact, in this paper we focus on the automatic selection of the best retrieval result from a given set of result lists generated by different IR systems. In particular, we propose five heuristic measures for evaluating the relative relevance of each result list, which take into account the redundancy and ranking of documents across the lists. Preliminary results on three different data sets, covering 216 queries, are encouraging. They show that the proposed approach could slightly outperform the best individual IR system in two out of three collections, and that it could significantly improve on the average results of individual systems in all data sets. In addition, the achieved results indicate that our approach is a competitive alternative to traditional data fusion methods.
Selecting the N-Top Retrieval Result Lists for an Effective Data Fusion
Lecture Notes in Computer Science, 2010
Although the application of data fusion in information retrieval has yielded good results in the majority of cases, it has been noticed that its success depends on the quality of the input result lists. To tackle this problem, in this paper we explore the combination of only the n-top result lists as an alternative to the fusion of all available data. In particular, we describe a heuristic measure based on redundancy and ranking information to evaluate the quality of each result list and, consequently, to select the presumably n-best lists per query. Preliminary results on four IR test collections, containing a total of 266 queries and employing three different DF methods, are encouraging. They indicate that the proposed approach could significantly outperform the results achieved by fusing all available lists, showing improvements in mean average precision of 10.7%, 3.7% and 18.8% when used with the Maximum RSV, CombMNZ and Fuzzy Borda methods, respectively.
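The improvements above are reported in mean average precision (MAP). As a reminder of what that figure measures, here is a minimal sketch of the standard MAP computation. The function names and the toy rankings are hypothetical and are not taken from the paper.

```python
def average_precision(ranking, relevant):
    """AP for one query: sum of the precision values at each rank where a
    relevant document appears, divided by the number of relevant documents."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranking, relevant_set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries with their ranked lists and relevant sets.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d4", "d5"], {"d5"}),              # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```

A relative improvement such as those quoted would then be the difference between the fused and baseline MAP values divided by the baseline MAP.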
AUTOMATIC PERFORMANCE EVALUATION OF INFORMATION RETRIEVAL SYSTEMS USING DATA FUSION
2003
The empirical investigation of the effectiveness of information retrieval systems (search engines) requires a test collection composed of a set of documents, a set of query topics and a set of relevance judgments indicating which documents are relevant to which topics. Human relevance judgments are expensive and subjective. In addition, databases and user interests change quickly. Hence there is a great need for an automatic way of evaluating the performance of search engines.
Exploration of a geometric model of data fusion
2002
(IR) are explored using a set of data from the Fifth International Conference on Text Retrieval, TREC-5. It has been observed from time to time that DF applied to a pair of systems or schemes for IR may yield results that are better than those of either participating scheme. It has been conjectured that this occurs only rarely, or only when poor schemes are being combined, or only for problems in which there are so few relevant documents that the results are probably due to statistical fluctuation. Based on a geometrical model of DF, we derive an equation for effective DF. This equation shows that in the ideal case the performance of a pair of IR schemes may be approximated by a quadratic polynomial. We statistically test this assumption for TREC-5 Routing data. Results of the regression analysis show that our equation for the effect of DF is generally valid.
Estimating probabilities for effective data fusion
2010
Abstract: Data fusion is the combination of a number of independent search results, relating to the same document collection, into a single result to be presented to the user. A number of probabilistic data fusion models have been shown to be effective in empirical studies. These typically attempt to estimate the probability that particular documents will be relevant, based on training data. However, little attempt has been made to gauge how the accuracy of these estimations affects fusion performance.
Toward a Robust data fusion for document retrieval
2008 International Conference on Natural Language Processing and Knowledge Engineering, 2008
This paper describes an investigation of signal boosting techniques for post-search data fusion, where the quality of the retrieval results involved in fusion may be low or diverse. The effectiveness of data fusion techniques in such situations depends on their ability to boost the signals from relevant documents and reduce the effect of noise that often comes from low-quality retrieval results. Our studies on the Malach spoken document collection and the HARD collection have demonstrated that CombMNZ, the most widely used data fusion method, does not have this ability. We therefore developed two versions of signal boosting mechanisms on top of CombMNZ, which result in two new fusion methods called WCombMNZ and WCombMWW. To examine the effectiveness of the two new methods, we conducted experiments on the Malach and HARD document collections. Our results show that the new methods can significantly outperform CombMNZ in combining retrieval results of low or diverse quality. When the task is to combine retrieval results of similar quality, the scenario in which CombMNZ is most often applied, the two new methods still often obtain better, and sometimes significantly better, fusion results.
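For readers unfamiliar with the CombMNZ baseline named above, the following is a minimal sketch of it as commonly defined: each document's normalized scores are summed across systems, and the sum is multiplied by the number of systems that retrieved the document. The min-max normalization, the function names and the toy scores are assumptions made for illustration. The abstract does not specify the WCombMNZ and WCombMWW weighting schemes, so they are not reproduced here.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one system's scores to [0, 1] (assumed normalization choice).
    If all scores are equal, every document gets 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_lists):
    """Fuse per-system {doc_id: score} dicts with CombMNZ.

    Fused score = (sum of normalized scores) * (number of systems
    that retrieved the document). Returns (doc_id, score) pairs,
    highest score first.
    """
    total = defaultdict(float)
    hits = defaultdict(int)
    for scores in result_lists:
        if not scores:  # skip systems that returned nothing
            continue
        for doc, s in min_max_normalize(scores).items():
            total[doc] += s
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores from two retrieval systems on different scales.
system_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
system_b = {"d2": 8.0, "d4": 2.0}
ranking = comb_mnz([system_a, system_b])
```

In this toy input, `d2` is retrieved by both systems and so receives the multiplier 2, which moves it ahead of `d1` even though `d1` has the highest score in `system_a`. This reward for overlap is the behaviour that becomes a liability when some of the input lists are of low quality.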
Information Retrieval with a Hybrid Automatic Query Expansion and Data Fusion Procedure
Information Retrieval, 2000
We propose a hybrid information retrieval (IR) procedure that builds on two well-known IR approaches: data fusion and query expansion via relevance feedback. This IR procedure is designed to exploit the strengths of data fusion and relevance feedback and to avoid some weaknesses of these approaches. We show that our IR procedure is built on postulates that can be justified analytically and empirically. Additionally, we offer an empirical investigation of the procedure, showing that it is superior to relevance feedback on some dimensions and comparable on other dimensions. The empirical investigation also verifies the conditions under which the use of our IR procedure could be beneficial.