Improving Information Retrieval Performance
Related papers
Rank aggregation methods for the web
Proceedings of the 10th …, 2001
We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search ...
Rank Aggregation for Metasearching
2010
Nowadays, mashup services, and especially metasearch engines, play an increasingly important role on the Web. Most users use them directly or indirectly to access and aggregate information from more than one data source. As with other search systems, the effectiveness of a metasearch engine is mainly determined by the quality of the results it returns in response to user queries. Since these services do not maintain their own document index, they exploit multiple search engines by using a rank aggregation method to classify the collected results. However, the rank aggregation methods proposed so far use a very limited set of parameters regarding these results, such as the total number of exploited resources and the rankings they receive from each individual resource. In this paper we present QuadRank, a new rank aggregation method, which takes into consideration additional information regarding the query terms, the collected resu...
A Novel Approach to Automatically Combine Search and Ranking Results
Ijca Proceedings on National Conference on Advancement of Technologies Information Systems Computer Networks, 2012
On the World Wide Web there are innumerable information sources containing very useful information that cannot be indexed by general-purpose search engines and hence cannot be visited by most common users. Of course, users can search a source through its query interface if they know where the source can be found. The idea of querying and collating results from multiple databases is not new. Internet metasearch engines, online catalogues, multi-databases and other kinds of information integration systems have attracted a lot of attention since the advent of the network. In the web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. Often the user is interested in items that are both visually and semantically similar. To support such functionality, the hybrid search engine provides a novel retrieval method in which both visual and ontology search are employed for the same query. This method automatically combines different types of search results, and complements content-based search with ontology-based search and vice versa. In this paper, we study the rank aggregation problem in the context of the web, i.e. the problem of combining ranking results from various sources. Various rank aggregation methods are available. We design an algorithm, based on which we propose a new rank aggregation method. We observe that our proposed method is more effective and efficient than other well-known methods.
The Analysis of Rank Fusion Techniques to Improve Query Relevance
TELKOMNIKA (Telecommunication Computing Electronics and Control), 2015
Rank fusion meta-search engine algorithms can be used to merge the web search results of multiple search engines. In this paper we introduce two variants of the Weighted Borda-Fuse algorithm. The first variant retrieves documents based on the popularity of the component engines. The second is based on a user-defined toplist of size k from each component engine. In this research, experiments were performed on toplists of size k={50,100,200} with AND/OR combinations implemented on the 'UNIB Meta Fusion' meta-search engine prototype, which employed 3 out of 5 popular search engines. Both of our algorithms outperformed other rank fusion algorithms (the relevance score is up to 0.76, compared with 0.27 for Google, at P@10). The pseudo-relevance automatic judgement techniques involved are Reciprocal Rank, Borda Count, and Condorcet. The optimal setting was reached for queries with the operator "AND" (degree 1) or "AND ... AND" (degree 2) with k=200. The 'UNIB Meta Fusion' meta-search engine system was built correctly.
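To make the idea of a weighted Borda count with a top-k cutoff concrete, here is a minimal sketch in Python. It is not the algorithm from the paper above: the function name `weighted_borda_fuse`, the engine weights, and the rule that a document missing from an engine's list receives zero points from that engine are all assumptions made for illustration.

```python
from collections import defaultdict

def weighted_borda_fuse(rankings, weights=None, k=None):
    """Fuse ranked lists with a weighted Borda count (illustrative sketch).

    rankings: dict mapping engine name -> list of doc ids, best first.
    weights:  dict mapping engine name -> weight (assumed; default 1.0).
    k:        optional toplist depth; only the first k results per engine count.
    """
    weights = weights or {}
    if k is not None:
        rankings = {engine: ranked[:k] for engine, ranked in rankings.items()}

    # n = number of distinct documents returned by any engine
    n = len({doc for ranked in rankings.values() for doc in ranked})

    scores = defaultdict(float)
    for engine, ranked in rankings.items():
        w = weights.get(engine, 1.0)
        for position, doc in enumerate(ranked):
            # Top document earns n points, the next n - 1, and so on.
            scores[doc] += w * (n - position)
        # Assumption: documents an engine did not return get 0 points from it.

    # Highest score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


lists = {
    "A": ["d1", "d2", "d3"],
    "B": ["d2", "d1", "d4"],
    "C": ["d2", "d3", "d1"],
}
fused_equal = weighted_borda_fuse(lists)
fused_weighted = weighted_borda_fuse(lists, weights={"A": 4.0})
fused_top1 = weighted_borda_fuse(lists, k=1)
```

With equal weights, `d2` wins because two of the three engines rank it first; raising the weight of engine `A` moves `d1` to the top.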
Effective rank aggregation for metasearching
Journal of Systems and Software, 2011
Nowadays, mashup services, and especially metasearch engines, play an increasingly important role on the Web. Most users use them directly or indirectly to access and aggregate information from more than one data source. As with other search systems, the effectiveness of a metasearch engine is mainly determined by the quality of the results it returns in response to user queries. Since these services do not maintain their own document index, they exploit multiple search engines using a rank aggregation method to classify the collected results. However, the rank aggregation methods proposed so far use a very limited set of parameters regarding these results, such as the total number of exploited resources and the rankings they receive from each individual resource. In this paper we present QuadRank, a new rank aggregation method, which takes into consideration additional information regarding the query terms, the collected results and the data correlated to each of these results (title, textual snippet, URL, individual ranking and others). We have implemented and tested QuadRank in a real-world metasearch engine, QuadSearch, a system developed as a testbed for algorithms related to the broader problem of metasearching. The name QuadSearch refers to the current number of exploited engines (four). We have exhaustively tested QuadRank for both effectiveness and efficiency in the real-world search environment of QuadSearch and also using a task from the recent TREC-2009 conference. The results of our experiments reveal that in most cases QuadRank outperformed all component engines, another metasearch engine (Dogpile) and two successful rank aggregation methods, Borda Count and the Outranking Approach.
Analysis of Rank Aggregation techniques for Metasearch: A Case study
Many users rely on search engines to surf the internet, but the results are not fully effective. This led to the invention of meta-search engines (MSEs), which merge and aggregate results from multiple search engines to derive effective results that match user preferences. An MSE takes the query from the user and supplies it to different search engines, which in turn return their own decisions and rankings for the query. Hence, the processes used by an MSE depend, directly or indirectly, on ranking-merging techniques that use rank aggregation methods. Rank aggregation focuses on combining differing rank orderings of the same set of candidates to refine the rank order. Rank aggregation techniques are applied in numerous areas, such as voting, social networks, and metasearch, as well as search engine performance evaluation and selection. This paper focuses on various rank aggregation methods, with implementations on a real-world dataset.
Effective Ranking Fusion Methods for Personalized Metasearch Engines
2008 Panhellenic Conference on Informatics, 2008
Metasearch engines are a significant part of the information retrieval process. Most Web users use them directly or indirectly to access information from more than one data source. The cornerstone of their technology is their rank aggregation method, which is the algorithm they use to classify the collected results. In this paper we present three new rank aggregation methods. First, we propose a method that takes into consideration regional data for the user and the pages, and assigns scores according to a variety of user-defined parameters. In the second extension, not all component engines are treated equally: the user is free to define the importance of each engine by setting appropriate weights. The third algorithm is designed to classify pages whose URLs contain subdomains. The three presented methods are combined into a single, personalized scoring formula, the Global KE. All algorithms have been implemented in QuadSearch, an experimental metasearch engine available at
Insights from viewing ranked retrieval as rank aggregation
2005
We view a variety of established methods for ranked retrieval from a common angle, namely as a process of combining query-independent rankings that were precomputed for certain attributes. Apart from a general insight into what effectively distinguishes various schemes from each other, we obtain three specific results concerned with concept-based retrieval. First, we prove that latent semantic indexing (LSI) can be implemented to answer queries in time proportional to the number of words in the query, which improves over the standard implementation by an order of magnitude; a similar result is established for LSI's probabilistic sibling PLSI. Second, we give a simple and precise characterization of the extent to which LSI can deal with polysems, and when it fails to do so. Third, we demonstrate that the recombination of the intricate, yet relatively cheap mechanism of PLSI for mapping queries to attributes, with a simplistic, easy-to-compute set of document rankings gives a retrieval performance which is at least as good as that of the most sophisticated concept-based retrieval schemes.
A Novel Approach for Organizing Web Search Results using Ranking and Clustering
International Journal of Computer …, 2010
The World Wide Web is considered the most valuable place for Information Retrieval and Knowledge Discovery. When retrieving information for a user query, a search engine returns a large and unmanageable collection of documents. Web mining tools are used to classify, cluster and order the documents so that users can easily navigate through the search results and find the desired information content. A more efficient way to organize the documents can be a combination of clustering and ranking, where clustering groups the documents and ranking orders the pages within each cluster. Based on this approach, this paper proposes a mechanism that provides ordered results in the form of clusters in accordance with the user's query. An efficient page ranking method is also proposed that orders the results according to both the relevancy and the importance of documents. This approach helps users restrict their search to the top documents in the particular clusters that interest them.
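The cluster-then-rank organization described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's mechanism: the field names (`url`, `cluster`, `relevance`, `importance`), the precomputed cluster labels, and the linear mixing parameter `alpha` are all hypothetical.

```python
from collections import defaultdict

def organize_results(results, alpha=0.5):
    """Group results by cluster and order each cluster by a combined score.

    results: list of dicts with the (hypothetical) keys
             'url', 'cluster', 'relevance', 'importance'.
    alpha:   assumed mixing weight; 1.0 ranks purely by relevance,
             0.0 purely by importance.
    """
    clusters = defaultdict(list)
    for r in results:
        score = alpha * r["relevance"] + (1 - alpha) * r["importance"]
        clusters[r["cluster"]].append((score, r["url"]))

    ordered = {}
    for label, items in clusters.items():
        # Best combined score first; ties broken by URL for determinism.
        items.sort(key=lambda item: (-item[0], item[1]))
        ordered[label] = [url for _, url in items]
    return ordered


hits = [
    {"url": "u1", "cluster": "animal", "relevance": 0.9, "importance": 0.1},
    {"url": "u2", "cluster": "animal", "relevance": 0.6, "importance": 0.8},
    {"url": "u3", "cluster": "car", "relevance": 0.5, "importance": 0.5},
]
balanced = organize_results(hits)
relevance_only = organize_results(hits, alpha=1.0)
```

In this toy input, `u2` leads the "animal" cluster under the balanced setting because its importance outweighs the relevance advantage of `u1`; with `alpha=1.0` the order flips.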
Web Metasearch: Rank vs. Score Based Rank Aggregation Methods
2003
Given a set of rankings, the task of ranking fusion is the problem of combining these lists in such a way as to optimize the performance of the combination. The ranking fusion problem is encountered in many situations, and metasearch is a prominent example. It deals with the problem of combining the result lists returned by multiple search engines in response to a given query, where each item in a result list is ordered with respect to a search engine and a relevance score. Several ranking fusion methods have been proposed in the literature. They can be classified based on whether: (i) they rely on the rank; (ii) they rely on the score; and (iii) they require training data or not. Our paper will make the following contributions: (i) we will report experimental results for the Markov chain rank based methods, for which no large experimental tests have yet been made; (ii) while it is believed that the rank based method, named Borda Count, is competitive with score based methods, we will show that this is not true for metasearch; and (iii) we will show that Markov chain based methods compete with score based methods. This is especially important in the context of metasearch as scores are usually not available from the search engines.
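As a rough illustration of a Markov chain rank-based method, the sketch below builds a chain over documents using a majority-vote transition rule and ranks documents by their stationary probability. It needs only ranks, not scores. The specific transition rule, the `damping` teleport term (added here so that power iteration converges to a unique distribution), and the function name are assumptions for this sketch; the paper above may evaluate different chains.

```python
def markov_chain_fuse(rankings, damping=0.95, iters=200):
    """Rank-only fusion via a majority-vote Markov chain (illustrative sketch).

    rankings: list of ranked lists of doc ids, best first (may be partial).
    """
    docs = sorted({d for ranked in rankings for d in ranked})
    n = len(docs)
    if n == 0:
        return []
    positions = [{d: i for i, d in enumerate(ranked)} for ranked in rankings]

    def majority_prefers(q, p):
        # True if most lists that rank both q and p place q above p.
        both = [m for m in positions if p in m and q in m]
        wins = sum(1 for m in both if m[q] < m[p])
        return bool(both) and 2 * wins > len(both)

    # From state p, pick q uniformly at random; move to q only if a
    # majority of lists prefers q to p, otherwise stay at p.
    P = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(docs):
        for j, q in enumerate(docs):
            if i != j and majority_prefers(q, p):
                P[i][j] = 1.0 / n
        P[i][i] = 1.0 - sum(P[i])

    # Power iteration with a small teleport probability (an assumption).
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += damping * pi[i] * P[i][j]
        pi = nxt

    index = {d: i for i, d in enumerate(docs)}
    # Higher stationary probability means better fused rank.
    return sorted(docs, key=lambda d: (-pi[index[d]], d))


fused = markov_chain_fuse([
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "c"],
])
```

Here `a` beats both other documents in pairwise majority votes and `b` beats `c`, so probability mass flows toward `a`, then `b`.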