Estimating PageRank deviations in crawled graphs

ApproxRank: Estimating Rank for a Subgraph

2009 IEEE 25th International Conference on Data Engineering, 2009

Customized semantic query answering, personalized search, focused crawlers and localized search engines frequently focus on ranking the pages contained within a subgraph of the global Web graph. The challenge for these applications is to compute PageRank-style scores efficiently on the subgraph, i.e., the ranking must reflect the global link structure of the Web graph but it must do so without paying the high overhead associated with a global computation. We propose a framework of an exact solution and an approximate solution for computing ranking on a subgraph. The IdealRank algorithm is an exact solution with the assumption that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node Λ and extend the subgraph with links to Λ. They also modify the PageRank-style transition matrix with respect to Λ. We analyze the L1 distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the L1 distance of the external pages (e.g., the true PageRank scores and uniform scores assumed by ApproxRank). We compare ApproxRank and a stochastic complementation approach (SC) [1], a current best solution for this problem, on different types of subgraphs. ApproxRank has similar or superior performance to SC and typically improves on the runtime performance of SC by an order of magnitude or better. We demonstrate that ApproxRank provides a good approximation to PageRank for a variety of subgraphs.

Lumping algorithms for computing Google’s PageRank and its derivative, with attention to unreferenced nodes

Information Retrieval, 2012

In this paper, we introduce five types of nodes for lumping the Web matrix, and give a unified presentation of some popular lumping methods for PageRank. We show that the PageRank problem can be reduced to solving the PageRank problem for the strongly non-dangling and referenced nodes, and that the full PageRank vector can then be easily derived by recursive formulas. Our new lumping strategy can reduce the original PageRank problem to a much smaller one, and it is much cheaper than the recursive reordering scheme. Furthermore, we discuss the sensitivity of the PageRank vector, and present a lumping algorithm for computing its first-order derivative. Numerical experiments show that the new algorithms are favorable when the matrix is large and the damping factor is high.

A Study of PageRank in Undirected Graphs

2019

The PageRank (PR) algorithm is the basis of the Google search engine. In this paper, we study the PageRank sequence, given by the PR vector, for undirected graphs of order six. Then, we provide an ordering of graphs by the variance of the PR vector, whose variation is proportional to the variance of the degree sequence. Finally, we introduce a relation between the domination number and the PR-variance of graphs.

Generalizing PageRank

Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, 2006

This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family.
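
The idea of a damping function over path length can be illustrated with a minimal Python sketch. It assumes that a rank in this family is the sum over path lengths t of damping(t) times a uniform start vector pushed along t links, and that PageRank corresponds to damping(t) = (1 - alpha) * alpha^t. The names `functional_rank`, `pagerank_damping` and the toy graph are hypothetical, and every node is assumed to have at least one out-link.

```python
def functional_rank(out_links, damping, max_len=200):
    """Sum damping(t) * (uniform vector pushed along t links) for t = 0..max_len.

    out_links[i] lists the nodes that node i links to (assumed non-empty).
    """
    n = len(out_links)
    walk = [1.0 / n] * n      # distribution after t steps from a uniform start
    rank = [0.0] * n
    for t in range(max_len + 1):
        weight = damping(t)
        for i in range(n):
            rank[i] += weight * walk[i]
        # Advance the walk by one link, splitting mass evenly over out-links.
        nxt = [0.0] * n
        for src, targets in enumerate(out_links):
            share = walk[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        walk = nxt
    return rank


alpha = 0.85  # assumed damping factor


def pagerank_damping(t):
    """Exponentially decaying damping function (assumed PageRank member)."""
    return (1 - alpha) * alpha ** t


graph = [[1, 2], [2], [0]]  # hypothetical 3-node graph without dangling nodes
scores = functional_rank(graph, pagerank_damping)
```

Swapping `pagerank_damping` for another decreasing function of t gives a different member of the family without changing the rest of the sketch.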

Local Computation of PageRank Contributions

Internet Mathematics, 2008

Motivated by the problem of detecting link-spam, we consider the following graph-theoretic primitive: Given a webgraph G, a vertex v in G, and a parameter δ ∈ (0, 1), compute the set of all vertices that contribute to v at least a δ fraction of v's PageRank. We call this set the δ-contributing set of v. To this end, we define the contribution vector of v to be the vector whose entries measure the contributions of every vertex to the PageRank of v. A local algorithm is one that produces a solution by adaptively examining only a small portion of the input graph near a specified vertex. We give an efficient local algorithm that computes an ε-approximation of the contribution vector for a given vertex by adaptively examining O(1/ε) vertices. Using this algorithm, we give a local approximation algorithm for the primitive defined above. Specifically, we give an algorithm that returns a set containing the δ-contributing set of v and at most O(1/δ) vertices from the δ/2-contributing set of v, and which does so by examining at most O(1/δ) vertices. We also give a local algorithm for solving the following problem: If there exist k vertices that contribute a ρ-fraction to the PageRank of v, find a set of k vertices that contribute at least a (ρ − ε)-fraction to the PageRank of v. In this case, we prove that our algorithm examines at most O(k/ε) vertices.
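
The definition of the δ-contributing set can be made concrete with a brute-force Python sketch. This is not the local algorithm of the paper: it examines the whole graph. It assumes that the contribution of u to v is the personalized PageRank of u evaluated at v, scaled by 1/n, so that the contributions sum to the PageRank of v under uniform teleportation. Here `alpha` is the probability of following a link (conventions differ between papers), and `ppr`, `contributing_set` and the toy graph are hypothetical names.

```python
def ppr(out_links, start, alpha=0.85, steps=200):
    """Personalized PageRank for a start distribution, computed as the
    truncated series (1 - alpha) * sum_t alpha^t * start * P^t."""
    n = len(out_links)
    walk = list(start)
    scores = [0.0] * n
    for t in range(steps + 1):
        weight = (1 - alpha) * alpha ** t
        for i in range(n):
            scores[i] += weight * walk[i]
        nxt = [0.0] * n
        for src, targets in enumerate(out_links):
            for dst in targets:
                nxt[dst] += walk[src] / len(targets)
        walk = nxt
    return scores


def contributing_set(out_links, v, delta, alpha=0.85):
    """Return (set of u contributing at least delta * PageRank(v), contributions)."""
    n = len(out_links)
    contrib = []
    for u in range(n):
        start = [0.0] * n
        start[u] = 1.0
        contrib.append(ppr(out_links, start, alpha)[v] / n)
    total = sum(contrib)  # PageRank of v under uniform teleportation
    members = {u for u in range(n) if contrib[u] >= delta * total}
    return members, contrib


graph = [[1, 2], [2], [0]]  # hypothetical graph; every node has an out-link
members, contrib = contributing_set(graph, v=2, delta=0.1)
```

The cost here grows with the size of the whole graph, which is exactly what a local algorithm avoids.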

Using PageRank to Characterize Web Structure

Internet Mathematics, 2006

Recent work on modeling the Web graph has dwelt on capturing the degree distributions observed on the Web. Pointing out that this represents a heavy reliance on "local" properties of the Web graph, we study the distribution of PageRank values (used in the Google search engine) on the Web. This distribution is of independent interest in optimizing search indices and storage. We show that PageRank values on the Web follow a power law. We then develop detailed models for the Web graph that explain this observation, and moreover remain faithful to previously studied degree distributions. We analyze these models, and compare the analyses to both snapshots from the Web and to graphs generated by simulations on the new models. To our knowledge this represents the first modeling of the Web that goes beyond fitting degree distributions on the Web.

On Local Estimations of PageRank: A Mean Field Approach

Internet Mathematics, 2007

PageRank is a key element in the success of search engines, allowing them to rank the most important hits in the top screen of results. One key aspect that distinguishes PageRank from other prestige measures such as in-degree is its global nature. From the information provider's perspective, this makes it difficult or impossible to predict how their pages will be ranked. Consequently, a market has emerged for the optimization of search engine results. Here we study the accuracy with which PageRank can be approximated by in-degree, a local measure made freely available by search engines. Theoretical and empirical analyses lead us to conclude that, given the weak degree correlations in the Web link graph, the approximation can be relatively accurate, giving service and information providers an effective new marketing tool.
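
An in-degree approximation of this kind can be sketched in a few lines of Python. The abstract does not state a formula, so the sketch assumes a mean-field form in which a page's estimated score is (1 - d)/N + (d/N) * k_in / <k_in>, where d is the damping factor, N the number of pages and <k_in> the average in-degree. The function name `indegree_estimate` and the sample in-degrees are hypothetical.

```python
def indegree_estimate(in_degrees, damping=0.85):
    """Estimate PageRank from in-degree alone (assumed mean-field form)."""
    n = len(in_degrees)
    mean_in = sum(in_degrees) / n  # average in-degree, assumed > 0
    return [(1 - damping) / n + damping / n * k / mean_in for k in in_degrees]


in_degrees = [0, 1, 1, 4]  # hypothetical in-degrees for four pages
estimates = indegree_estimate(in_degrees)
```

Under this assumed form the estimates sum to 1, and a page with no in-links receives only the teleportation share (1 - d)/N.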

Approximating PageRank from In-Degree

Lecture Notes in Computer Science, 2008

PageRank has become a key element in the success of search engines, allowing them to rank the most important hits in the top screen of results. One key aspect that distinguishes PageRank from other prestige measures such as in-degree is its global nature. From the information provider's perspective, this makes it difficult or impossible to predict how their pages will be ranked. Consequently, a market has emerged for the optimization of search engine results. Here we study the accuracy with which PageRank can be approximated by in-degree, a local measure made freely available by search engines. Theoretical and empirical analyses lead us to conclude that, given the weak degree correlations in the Web link graph, the approximation can be relatively accurate, giving service and information providers an effective new marketing tool.

PageRank, Connecting a Line of Nodes with a Complete Graph

Springer Proceedings in Mathematics & Statistics, 2016

The focus of this article is the PageRank algorithm, originally defined by S. Brin and L. Page as the stationary distribution of a certain random walk on a graph and used to rank homepages on the Internet. We will attempt to get a better understanding of how PageRank changes after changes are made to the graph, such as adding or removing an edge between otherwise disjoint subgraphs. In particular, we will take a look at link structures consisting of a line of nodes or a complete graph where every node links to all others, and at different ways to combine the two. Both the ordinary normalized version of PageRank and a non-normalized version of PageRank, found by solving the corresponding linear system, will be considered. We will see that it is possible to find explicit formulas for the PageRank in some simple link structures and, using these formulas, take a more in-depth look at the behavior of the ranking as the system changes.
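
As an illustration of an explicit formula for a simple link structure, the following Python sketch treats a line of nodes 1 -> 2 -> ... -> k. It assumes that the non-normalized PageRank R solves (I - c A^T) R = u with u a vector of ones and no correction for the dangling last node. This is one possible reading of "non-normalized", not necessarily the article's exact definition. On a line the system reduces to R_1 = 1 and R_i = 1 + c * R_(i-1). The name `line_pagerank` is hypothetical.

```python
def line_pagerank(k, c=0.85):
    """Non-normalized PageRank on a line of k nodes (assumed definition),
    solved by forward substitution: R_1 = 1, R_i = 1 + c * R_(i-1)."""
    ranks = []
    prev = 0.0
    for _ in range(k):
        prev = 1.0 + c * prev
        ranks.append(prev)
    return ranks


ranks = line_pagerank(5)
```

Under this assumption the i-th node has rank (1 - c^i) / (1 - c), so ranks increase along the line and approach 1 / (1 - c).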