A web-site-based partitioning technique for reducing preprocessing overhead of parallel pagerank computation

Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation

2008

The efficiency of the PageRank computation is important because the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation consists of repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph partitioning techniques are widely used for efficient parallelization of matrix-vector multiplications, but they incur a high preprocessing overhead for the PageRank algorithm. In this work, we propose Web-site-based partitioning techniques to reduce the preprocessing overhead of parallel PageRank computation.
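
The kernel repeated in every PageRank iteration is a sparse matrix-vector product. Partitioning aims to minimize the data that processors exchange during that product. The following minimal Python sketch rests on one assumption: a rowwise partition in which the processor owning row i also owns x[i], which is the usual setting of the column-net hypergraph model. The function names `csr_matvec` and `communication_volume` are hypothetical and do not come from the papers.

```python
from typing import Dict, List, Set


def csr_matvec(indptr: List[int], indices: List[int], data: List[float],
               x: List[float]) -> List[float]:
    """Compute y = A x for a sparse matrix A stored in CSR format."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def communication_volume(indptr: List[int], indices: List[int],
                         part: List[int]) -> int:
    """Count x-vector words sent before a row-parallel y = A x.

    Assumption: x[j] lives on part[j]; every other part that holds a nonzero
    in column j must receive one copy of x[j].
    """
    needers: Dict[int, Set[int]] = {}  # column j -> parts touching column j
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            needers.setdefault(indices[k], set()).add(part[i])
    return sum(len(parts - {part[j]}) for j, parts in needers.items())
```

Take a 4x4 matrix whose row i has nonzeros in columns i and (i + 1) mod 4. Assigning its rows to parts [0, 0, 1, 1] costs 2 words per multiplication, while [0, 1, 0, 1] costs 4. This is the quantity a partitioner tries to keep small.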

Web-Site-Based Partitioning Techniques for Reducing the Preprocessing Overhead before the Parallel PageRank Computations

… State of the Art in Scientific …, 2006

The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. Due to the enormous size of the Web's hyperlink structure, PageRank computations are usually carried out on parallel computers. Recently, a hypergraph-partitioning-based formulation for parallel sparse matrix-vector multiplication was proposed as a preprocessing step that minimizes the communication overhead of parallel PageRank computations. Building on this work, we propose Web-site-based partitioning approaches to reduce the overhead of this preprocessing step. The experiments show that the proposed approach yields comparable PageRank computation performance while incurring lower preprocessing overhead.
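
A site-based partition is cheap to compute, and a small sketch shows why. Every page of a site goes to the same processor, and whole sites are assigned greedily to the least-loaded processor. This is a hypothetical illustration, not the papers' exact algorithm, and it makes two assumptions:

- a page's site is its URL host;
- a site's load is the number of links leaving its pages.

`partition_by_site` is an illustrative name. The whole procedure is one pass over the links plus a sort of the sites, which is typically far cheaper than running a general hypergraph partitioner on the page-level matrix.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set, Tuple
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Assumption: a page's site is the host part of its URL."""
    return urlsplit(url).hostname or ""


def partition_by_site(links: List[Tuple[str, str]], num_parts: int) -> Dict[str, int]:
    """Map every page URL to a part so that no site is split across parts."""
    pages_of_site: Dict[str, Set[str]] = defaultdict(set)
    load_of_site: Dict[str, int] = defaultdict(int)
    for src, dst in links:
        for page in (src, dst):
            pages_of_site[site_of(page)].add(page)
        load_of_site[site_of(src)] += 1  # each link is charged to the linking site

    # Min-heap of (current load, part id): the lightest part receives the next site.
    heap = [(0, p) for p in range(num_parts)]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    for site in sorted(pages_of_site, key=lambda s: -load_of_site[s]):  # heaviest first
        load, p = heapq.heappop(heap)
        for page in pages_of_site[site]:
            assignment[page] = p
        heapq.heappush(heap, (load + load_of_site[site], p))
    return assignment
```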

Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation

IEEE Transactions on …, 2010

The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications. The web matrices involved grow along with the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high efficiency and low preprocessing overhead, while taking the initially distributed nature of the web matrices into account. Our contributions in this work are twofold. First, we investigate the application of state-of-the-art sparse matrix partitioning models to attain high efficiency in parallel PageRank computations, with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links. Second, we consider the more realistic scenario of starting with initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead equivalent to several iterations (for some instances, even less than a single iteration) of the underlying sequential PageRank algorithm.
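
One plausible way to use the site information in links to compress the web matrix is to collapse each site into a single weighted vertex and drop intra-site links. A partitioner then works on a much smaller graph. The sketch below assumes page-to-site labels are given. It illustrates the general idea only and does not reproduce either of the two compression schemes evaluated in the paper. `compress_by_site` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compress_by_site(links: List[Tuple[int, int]],
                     site: Dict[int, str]) -> Tuple[Counter, Counter]:
    """Collapse a page-level link list into a weighted site-level graph.

    Returns (vertex_weight, site_edges):
    - vertex_weight[s] counts links whose source page is on site s (a load estimate);
    - site_edges[(s, t)] counts links from site s to a different site t.
    Intra-site links are dropped because they never cross processors when
    every site is kept whole.
    """
    vertex_weight: Counter = Counter()
    site_edges: Counter = Counter()
    for src, dst in links:
        s, t = site[src], site[dst]
        vertex_weight[s] += 1
        if s != t:
            site_edges[(s, t)] += 1
    return vertex_weight, site_edges
```

Consider six links among four pages on two sites, where only two links cross sites. They compress to two weighted vertices and a single site-to-site edge of weight 2.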

Divide and conquer approach for efficient pagerank computation

Proceedings of the 6th international conference on Web engineering - ICWE '06, 2006

PageRank is a popular ranking metric for large graphs such as the World Wide Web. Current research on improving the computational efficiency of PageRank has focused on reducing I/O cost, accelerating convergence, and parallelizing the computation. In this paper, we propose a "divide and conquer" strategy for efficient computation of PageRank. Unlike contemporary improvements, this strategy can be combined with any existing enhancement to PageRank, giving rise to an entire class of more efficient algorithms. We present a novel graph-partitioning technique that divides the graph into subgraphs on which computation can be performed independently. This approach has two significant benefits. First, since it focuses on work reduction, it can be combined with any existing enhancement to PageRank. Second, it leads naturally to an incremental approach for computing such ranking metrics as these large graphs evolve over time. The partitioning technique is both lossless and independent of the PageRank variant used. The experimental results, both for a static graph (a graph at a single time instance) and for incremental computation on evolving graphs, illustrate the utility of the partitioning approach. The approach can also be applied to the computation of any other metric based on a first-order Markov chain model.
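
One simple, lossless way to split the computation into independently solvable pieces uses the weakly connected components of the linear-system formulation y = alpha * P^T y + (1 - alpha) * 1. In that formulation, dangling pages keep zero rows in P. Because no link crosses a component, each block can be solved on its own. Normalizing the concatenated solution gives the standard PageRank vector in which dangling pages jump to a uniformly random page. This is an illustrative assumption, not necessarily the partitioning technique proposed in the paper. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def components(n: int, out_links: Dict[int, List[int]]) -> List[List[int]]:
    """Weakly connected components of the link graph (union-find)."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for u, targets in out_links.items():
        for w in targets:
            parent[find(u)] = find(w)
    groups: Dict[int, List[int]] = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


def solve_block(nodes: Iterable[int], out_links: Dict[int, List[int]],
                alpha: float = 0.85, iters: int = 200) -> Dict[int, float]:
    """Jacobi iterations for y = alpha * P^T y + (1 - alpha) * 1 on one block.

    Dangling pages keep a zero row of P, so only linked pages are coupled.
    """
    node_list = list(nodes)
    y = {u: 1.0 for u in node_list}
    for _ in range(iters):
        nxt = {u: 1.0 - alpha for u in node_list}
        for u in node_list:
            targets = out_links.get(u, [])
            for w in targets:
                nxt[w] += alpha * y[u] / len(targets)
        y = nxt
    return y


def divide_and_conquer_pagerank(n: int, out_links: Dict[int, List[int]],
                                alpha: float = 0.85) -> List[float]:
    """Solve each component independently, then normalize globally."""
    y: Dict[int, float] = {}
    for comp in components(n, out_links):  # blocks could run on different workers
        y.update(solve_block(comp, out_links, alpha))
    total = sum(y.values())
    return [y[u] / total for u in range(n)]
```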

An Efficient Algorithm and Its Parallelization for Computing PageRank

Lecture Notes in Computer Science, 2007

In this paper, we propose an efficient algorithm for computing PageRank, together with its parallelization. Existing algorithms can perform this task, but some of them exclude dangling nodes, which form an important part of the web graph and carry important information. In this work, we treat dangling nodes as regular web pages without changing the web graph structure, and therefore fully preserve the information they carry. This differs from other algorithms that include dangling nodes but treat them differently from regular pages for the sake of efficiency. We then give an efficient algorithm whose dangling-node treatment has negligible overhead. Moreover, this treatment poses little difficulty for parallelizing the algorithm.
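
A standard way to keep dangling pages as ordinary pages, without changing the graph and with negligible overhead, is to treat each one as linking uniformly to every page while never storing that dense row. Instead, the dangling pages' rank mass is summed once per iteration and spread as a single scalar. This is the common rank-one treatment from the PageRank literature, shown here as a sketch. It is not necessarily the algorithm of the paper above, and `pagerank_power` is a hypothetical name.

```python
from typing import Dict, List


def pagerank_power(n: int, out_links: Dict[int, List[int]], alpha: float = 0.85,
                   tol: float = 1e-12, max_iter: int = 1000) -> List[float]:
    """Power method in which dangling pages stay in the graph unchanged.

    A dangling page (no out-links) is treated as linking to every page
    uniformly, but that dense row is never built: its mass is summed once per
    iteration and added to all pages as one scalar term.
    """
    x = [1.0 / n] * n
    for _ in range(max_iter):
        dangling_mass = sum(x[u] for u in range(n) if not out_links.get(u))
        base = (1.0 - alpha) / n + alpha * dangling_mass / n
        nxt = [base] * n
        for u, targets in out_links.items():  # sparse part: only real links
            for w in targets:
                nxt[w] += alpha * x[u] / len(targets)
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x
```

Each iteration preserves the total rank mass of 1, so no renormalization step is needed.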

Exploiting Web matrix permutations to speedup PageRank computation

2004

Recently, the research community has devoted increasing attention to reducing the computational time of Web ranking algorithms. In particular, there have been many proposals to speed up the well-known PageRank algorithm used by Google. This interest is motivated by two dominant factors: (1) the Web graph is huge and subject to dramatic updates in terms of nodes and links, so PageRank assignments quickly become obsolete; (2) many PageRank vectors need to be computed for different choices of personalization vector. In the present paper, we address this problem from a numerical point of view. First, we show how to treat dangling nodes in a way that naturally adapts to the random surfer model and preserves the sparsity of the Web graph. This result allows us to treat the PageRank computation as a sparse linear system, as an alternative to the commonly adopted eigenpair interpretation. Second, we exploit the reducibility of the Web matrix and suitably compose several Web matrix permutations to speed up the PageRank computation. We tested our approaches on Web graphs crawled from the net. The largest one has about 24 million nodes and more than 100 million links. On this Web graph, the cost of computing PageRank is reduced by 58% in terms of Mflops and by 89% in terms of time with respect to the commonly used Power method.
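
The linear-system view combines naturally with a permutation that places dangling pages last. Their rows of P are zero, so the system (I - alpha * P^T) y = (1 - alpha) * 1 becomes block lower triangular. Only the non-dangling block needs an iterative solve, and the dangling entries follow from one extra sparse pass. The sketch below shows only this single permutation under that assumption. The paper composes further permutations that are not reproduced here, and `pagerank_dangling_last` is a hypothetical name.

```python
from typing import Dict, List


def pagerank_dangling_last(n: int, out_links: Dict[int, List[int]],
                           alpha: float = 0.85, iters: int = 300) -> List[float]:
    """Solve (I - alpha P^T) y = (1 - alpha) 1, then normalize y to sum to 1.

    Dangling pages (no out-links) are filled in after the iteration, which
    amounts to permuting them to the end of the block lower-triangular system.
    """
    non_dangling = [u for u in range(n) if out_links.get(u)]
    dangling = [u for u in range(n) if not out_links.get(u)]

    def sweep(y: List[float]) -> List[float]:
        # One Jacobi sweep: (1 - alpha) + alpha * P^T y; only non-dangling pages have links.
        nxt = [1.0 - alpha] * n
        for u in non_dangling:
            share = alpha * y[u] / len(out_links[u])
            for w in out_links[u]:
                nxt[w] += share
        return nxt

    y = [1.0] * n
    for _ in range(iters):
        nxt = sweep(y)
        for u in non_dangling:  # iterate on the non-dangling block only
            y[u] = nxt[u]
    final = sweep(y)
    for u in dangling:  # forward substitution for the dangling block
        y[u] = final[u]
    total = sum(y)
    return [value / total for value in y]
```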

Parallel two-stage algorithms for solving the PageRank problem

Advances in Engineering Software, 2018

In this work, we present parallel algorithms based on two-stage methods for solving the PageRank problem as a linear system. Different parallel versions of these methods are explored and their convergence properties are analyzed. The parallel implementation uses a mixed MPI/OpenMP model to exploit more than one level of parallelism. To investigate and analyze the proposed parallel algorithms, we used several realistic large datasets. The numerical results show that the proposed algorithms reduce the time to convergence compared with the parallel Power algorithm and behave better than other well-known techniques.
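
A two-stage method pairs an outer splitting of the linear system with an inner iteration that solves each outer step only approximately. The sketch below is sequential and makes three assumptions:

- it uses one classic inner-outer splitting, (I - beta * P^T) x_new = (alpha - beta) * P^T x + (1 - alpha) * v;
- the inner system is approximated by a fixed number of Jacobi steps;
- v is uniform.

It is not necessarily the authors' variant. The names `beta`, `inner`, and `two_stage_pagerank` are hypothetical, and the MPI/OpenMP parallelism is omitted. With beta = 0.5 and three inner steps, each outer step contracts the error by a factor of at most about 0.74 in the 1-norm.

```python
from typing import Dict, List


def apply_pt(x: List[float], out_links: Dict[int, List[int]]) -> List[float]:
    """Return P^T x, where P is the row-normalized link matrix (dangling rows zero)."""
    y = [0.0] * len(x)
    for u, targets in out_links.items():
        for w in targets:
            y[w] += x[u] / len(targets)
    return y


def two_stage_pagerank(n: int, out_links: Dict[int, List[int]], alpha: float = 0.85,
                       beta: float = 0.5, inner: int = 3, tol: float = 1e-13,
                       max_outer: int = 1000) -> List[float]:
    """Two-stage iteration for (I - alpha P^T) x = (1 - alpha) v, v uniform.

    Outer splitting: (I - beta P^T) x_new = (alpha - beta) P^T x + (1 - alpha) v.
    The inner system is not solved exactly; `inner` Jacobi steps approximate it.
    """
    v = [1.0 / n] * n
    x = v[:]
    for _ in range(max_outer):
        px = apply_pt(x, out_links)
        f = [(alpha - beta) * a + (1.0 - alpha) * b for a, b in zip(px, v)]
        z = x[:]  # warm-start the inner iteration from the current outer iterate
        for _ in range(inner):
            z = [beta * a + b for a, b in zip(apply_pt(z, out_links), f)]
        done = sum(abs(a - b) for a, b in zip(z, x)) < tol
        x = z
        if done:
            break
    total = sum(x)
    return [a / total for a in x]
```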

Parallel PageRank Algorithms: A Survey

International Journal on Recent and Innovation Trends in Computing and Communication

The PageRank method is an important and basic component of effective web search; it computes the rank score of each page. The exponential growth of the Internet poses a crucial challenge for search engines, which must provide up-to-date and relevant results for user queries within a short time. Because PageRank is computed over a huge number of web pages, it is a computation-intensive task. In this paper, we present the basic concept of the PageRank method and discuss several parallel PageRank methods. We also compare these algorithms with respect to parallel algorithmic concepts such as load balance, distributed versus shared memory, and data layout.
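
Load balance and data layout interact even in the simplest one-dimensional layout, where each processor owns a contiguous block of matrix rows. The sketch below cuts the rows so that each block holds roughly the same number of nonzeros, rather than the same number of rows. It is an illustrative assumption, not a method taken from the survey, and `balanced_row_blocks` and `block_loads` are hypothetical names.

```python
from typing import List


def balanced_row_blocks(row_nnz: List[int], num_procs: int) -> List[int]:
    """Cut rows into contiguous blocks with roughly equal nonzero counts.

    Returns boundaries: block p covers rows bounds[p] .. bounds[p + 1] - 1.
    """
    total = sum(row_nnz)
    bounds = [0]
    acc = 0
    for i, nnz in enumerate(row_nnz):
        acc += nnz
        # Close the current block once the running total reaches its share.
        if len(bounds) < num_procs and acc >= total * len(bounds) / num_procs:
            bounds.append(i + 1)
    while len(bounds) < num_procs + 1:  # pad with empty trailing blocks if needed
        bounds.append(len(row_nnz))
    return bounds


def block_loads(row_nnz: List[int], bounds: List[int]) -> List[int]:
    """Number of nonzeros (multiply-adds per SpMV) assigned to each block."""
    return [sum(row_nnz[bounds[p]:bounds[p + 1]]) for p in range(len(bounds) - 1)]
```

For rows with nonzero counts [8, 1, 1, 1, 1, 1, 1, 2] on two processors, the nonzero-balanced cut [0, 1, 8] gives loads of 8 and 8. An equal-row split [0, 4, 8] would give 11 and 5.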

Accelerating PageRank computations

Control and Cybernetics, 2011

Different methods for computing PageRank vectors are analysed. In particular, we note the opposite behavior of the power method and the Monte Carlo method. Further, a method of reducing the number of iterations of the power method is suggested.
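
The Monte Carlo alternative to the power method can be sketched as a walk-counting estimator:

1. Start a fixed number of random walks from every page.
2. At each step, continue the walk with probability alpha along a uniformly chosen out-link; otherwise stop. A walk also stops at a dangling page.
3. Normalize the visit counts to obtain the estimate.

In expectation, the normalized counts equal the PageRank vector with uniform teleportation. This estimator is assumed here for illustration and is not the iteration-reducing method suggested in the paper. The parameter names are hypothetical. Unlike the power method, its accuracy is not controlled by a convergence tolerance. Its statistical error typically shrinks only like one over the square root of the number of walks.

```python
import random
from typing import Dict, List


def monte_carlo_pagerank(n: int, out_links: Dict[int, List[int]], alpha: float = 0.85,
                         walks_per_node: int = 2000, seed: int = 0) -> List[float]:
    """Estimate PageRank from the visit counts of random walks.

    Each walk starts at a page, moves along a uniformly chosen out-link with
    probability alpha, and stops otherwise or when it reaches a dangling page.
    """
    rng = random.Random(seed)
    visits = [0] * n
    for start in range(n):
        for _ in range(walks_per_node):
            u = start
            visits[u] += 1
            while out_links.get(u) and rng.random() < alpha:
                u = rng.choice(out_links[u])
                visits[u] += 1
    total = sum(visits)
    return [count / total for count in visits]
```

Take the graph {0: [1], 1: [0], 2: [0], 3: [0]} with alpha = 0.85. Its exact PageRank values are about 0.480, 0.445, 0.0375, and 0.0375, and the estimate should land close to them.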