A Parallel Algorithm to Calculate the CostRank of a Network

Network

2014

We developed analogous parallel algorithms to implement CostRank for distributed-memory parallel computers using multiple processors. Our intent is to perform CostRank calculations for the growing number of hosts in a fast and scalable way. We likewise intend to secure large-scale networks that require fast and reliable computing to calculate the ranking of enormous graphs with thousands of vertices (states) and millions of arcs (links). In our proposed approach we focus on a parallel CostRank computational architecture on a cluster of PCs networked via Gigabit Ethernet LAN to evaluate the performance and scalability of our implementation. In particular, partitioning the input data, graph files, and ranking vectors with a load-balancing technique can improve the runtime and scalability of large-scale parallel computations. An application case study of analogous CostRank computation is presented. Applying parallel environment models for one-dimensional sparse matrix partitioning o...
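
The paper's CostRank internals are not given here, so the following is only a minimal sketch of the one-dimensional (row-wise) sparse matrix partitioning with load balancing that the abstract describes: rows of the link matrix are assigned to processors so that nonzero counts (a proxy for per-row multiply work) stay balanced. The function name `partition_rows` and the random test matrix are illustrative, not the authors' code.

```python
# Sketch: 1-D row-wise partitioning of a sparse link matrix, balancing
# per-processor work by nonzero count (illustrative only).
import numpy as np
from scipy.sparse import random as sparse_random

def partition_rows(A_csr, num_procs):
    """Greedily place contiguous row-block boundaries so each processor
    receives roughly the same number of nonzeros (multiply work)."""
    nnz_per_row = np.diff(A_csr.indptr)        # nonzeros in each CSR row
    target = A_csr.nnz / num_procs             # ideal load per processor
    boundaries, acc = [0], 0
    for row, nnz in enumerate(nnz_per_row):
        acc += nnz
        if (acc >= target * len(boundaries)
                and len(boundaries) < num_procs
                and row + 1 < A_csr.shape[0]):
            boundaries.append(row + 1)
    boundaries.append(A_csr.shape[0])
    return list(zip(boundaries[:-1], boundaries[1:]))

A = sparse_random(10_000, 10_000, density=1e-3, format="csr")
for p, (lo, hi) in enumerate(partition_rows(A, 4)):
    print(f"proc {p}: rows [{lo}, {hi}), nnz = {A[lo:hi].nnz}")
```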

Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation

2008

The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation involves repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph partitioning techniques are widely used for efficient parallelization of matrix-vector multiplications, but they suffer from high preprocessing overhead for the PageRank algorithm. In this work, we propose Web-site-based partitioning techniques to reduce the preprocessing overhead of parallel PageRank computation.
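
As a rough illustration of the site-based idea (not the paper's code): pages from the same Web site are kept together, so the partitioner works on sites rather than individual pages, which shrinks the problem handed to the (hyper)graph partitioner. The URL list and weights below are made up for the example.

```python
# Illustrative sketch: group pages by site so a partitioner assigns
# whole sites, not individual pages, to processors.
from urllib.parse import urlparse
from collections import defaultdict

pages = [
    "http://a.example.com/index.html",
    "http://a.example.com/news.html",
    "http://b.example.org/",
    "http://b.example.org/about.html",
]

site_of = {url: urlparse(url).netloc for url in pages}

# Each site becomes one unit of the partitioning problem; its weight is
# its page count (a simple proxy for its share of the multiply work).
site_weights = defaultdict(int)
for url in pages:
    site_weights[site_of[url]] += 1

print(dict(site_weights))   # {'a.example.com': 2, 'b.example.org': 2}
```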

Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation

IEEE Transactions on …, 2010

The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications where the involved web matrices grow in parallel with the growth of the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high efficiency and low preprocessing overhead while considering the initially distributed nature of the web matrices. Our contributions in this work are twofold. We first investigate the application of state-of-the-art sparse matrix partitioning models in order to attain high efficiency in parallel PageRank computations, with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links. Second, we consider the more realistic scenario of starting with initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead of several iterations (for some instances even less than a single iteration) of the underlying sequential PageRank algorithm.
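
A hedged sketch of what "compressing the web matrix using site information" can look like: collapse each site's pages into one supernode and aggregate inter-site links, so the partitioner sees a much smaller matrix. This is an assumption about the general technique, not the authors' exact compression schemes.

```python
# Sketch: compress a page-level link graph into a site-level graph by
# merging all pages of a site into one supernode (illustrative only).
from collections import defaultdict

# page -> site mapping and page-level links (toy data)
site = {0: "s0", 1: "s0", 2: "s1", 3: "s1", 4: "s2"}
links = [(0, 2), (1, 2), (2, 4), (3, 4), (0, 1)]   # (src page, dst page)

compressed = defaultdict(int)
for src, dst in links:
    if site[src] != site[dst]:                 # intra-site links vanish
        compressed[(site[src], site[dst])] += 1

print(dict(compressed))   # {('s0', 's1'): 2, ('s1', 's2'): 2}
```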

A web-site-based partitioning technique for reducing preprocessing overhead of parallel pagerank computation

… State of the Art in Scientific …, 2007

A power method formulation, which efficiently handles the problem of dangling pages, is investigated for parallelization of PageRank computation. Hypergraph-partitioning-based sparse matrix partitioning methods can be successfully used for efficient parallelization. However, the preprocessing overhead due to hypergraph partitioning, which must be repeated often due to the evolving nature of the Web, is quite significant compared to the duration of the PageRank computation. To alleviate this problem, we utilize the information that sites form a natural clustering on pages to propose a site-based hypergraph-partitioning technique, which does not degrade the quality of the parallelization. We also propose an efficient parallelization scheme for matrix-vector multiplies in order to avoid possible communication due to the pages without in-links. Experimental results on realistic datasets validate the effectiveness of the proposed models.
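
For reference, the standard power-method formulation that folds dangling pages into a rank-one correction (consistent with the abstract's description, though the paper's exact notation may differ) is:

```latex
% Power iteration with dangling pages folded into a rank-one term.
% P: row-substochastic link matrix, d: dangling-page indicator vector,
% v: uniform teleportation vector, alpha: damping factor.
\[
  x^{(k+1)} \;=\; \alpha P^{\top} x^{(k)}
    \;+\; \bigl(\alpha\, d^{\top} x^{(k)} + (1-\alpha)\bigr)\, v ,
\]
% so handling dangling pages costs only one inner product per iteration
% on top of the sparse matrix-vector multiply.
```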

Large-Scale Network Analysis on Distributed Architectures

2010

This dissertation examines the challenges and limits that graph-analysis algorithms face on distributed architectures built from personal computers. In particular, it analyzes the behavior of the PageRank algorithm as implemented in a popular C++ distributed graph-analysis library, the Parallel Boost Graph Library (Parallel BGL). The results presented here show that the Bulk Synchronous Parallel programming model is unsuited to an efficient implementation of PageRank on clusters of personal computers: the implementation under analysis exhibited negative scalability, with the algorithm's execution time increasing linearly with the number of processors. These results were obtained by running the Parallel BGL PageRank algorithm on a cluster of 43 dual-core PCs with 2 GB of RAM each, using several graphs chosen to ease the identification of the variables that influence scalability. Graphs representing different models gave different results, showing a relationship between the clustering coefficient and the slope of the line representing time as a function of the number of processors. For example, Erdős–Rényi graphs, which have a low clustering coefficient, were the worst case in the PageRank tests, while small-world graphs, which have a high clustering coefficient, were the best case. Graph size also had a particularly interesting influence on execution time: the relationship between the number of nodes and the number of edges was shown to determine the total runtime.

INTRODUCTION: Networks model many real-world systems, e.g., the Internet (a set of computers linked by data connections) and social networks (collections of people linked by their relationships). In fact, network analysis has many applications in fields like sociology, biology, physics, computer science, economics and operations research, and often leads to interesting and useful insights. This work looks at the challenges and the limits that network-analysis algorithms face in distributed architectures constituted of commodity computers. In particular, it analyzes the behavior of the PageRank algorithm as it is implemented in a popular C++ distributed graph library, the Parallel Boost Graph Library. This dissertation is divided into four chapters. Chapter 1 introduces distributed computing. Chapter 2 presents an overview of networks and describes the algorithms to efficiently analyze them. Chapter 3 describes the PageRank algorithm as it is implemented in the Parallel Boost Graph Library. Finally, Chapter 4 presents an analysis of its limits and scalability challenges.
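
To make the graph-model comparison concrete, here is a small, hypothetical reproduction of the setup using networkx (the dissertation used the Parallel BGL, not networkx; sizes and parameters are illustrative):

```python
# Illustrative: compare the clustering coefficients of the two graph
# models that bounded the PageRank scalability results (Erdős–Rényi
# low-clustering worst case, small-world high-clustering best case).
import networkx as nx

n = 2000
er = nx.erdos_renyi_graph(n, p=0.005, seed=1)          # low clustering
sw = nx.watts_strogatz_graph(n, k=10, p=0.1, seed=1)   # high clustering

print("Erdos-Renyi avg clustering:", nx.average_clustering(er))
print("Small-world  avg clustering:", nx.average_clustering(sw))
```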

Parallel PageRank Algorithms: A Survey

International Journal on Recent and Innovation Trends in Computing and Communication

The PageRank method is an important and basic component of effective web search, computing a rank score for each page. The exponential growth of the Internet poses a crucial challenge for search engines: providing up-to-date and relevant results for users' queries in a timely manner. The PageRank method is computed over a huge number of web pages, which makes it a computation-intensive task. In this paper, we present the basic concept of the PageRank method and discuss several parallel PageRank methods. We also compare these algorithms with respect to parallel algorithmic concerns such as load balancing, distributed vs. shared memory, and data layout.
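
The "basic concept" the survey refers to is the familiar PageRank recurrence; in standard notation (the survey may typeset it differently):

```latex
% Basic PageRank score of page i: alpha is the damping factor, n the
% number of pages, In(i) the set of pages linking to i, and out(j) the
% out-degree of page j.
\[
  PR(i) \;=\; \frac{1-\alpha}{n}
    \;+\; \alpha \sum_{j \in \mathrm{In}(i)} \frac{PR(j)}{\mathrm{out}(j)} .
\]
```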

Web-Site-Based Partitioning Techniques for Reducing the Preprocessing Overhead before the Parallel PageRank Computations

… State of the Art in Scientific …, 2006

The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. Due to the enormous size of the Web's hyperlink structure, PageRank computations are usually carried out on parallel computers. Recently, a hypergraph-partitioning-based formulation for parallel sparse matrix-vector multiplication was proposed as a preprocessing step to minimize the communication overhead of parallel PageRank computations. Building on this work, we propose Web-site-based partitioning approaches to reduce the overhead of this preprocessing step. The conducted experiments show that the proposed approach produces comparable performance results for the PageRank computation while achieving lower preprocessing overheads.

An Efficient Algorithm and Its Parallelization for Computing PageRank

Lecture Notes in Computer Science, 2007

In this paper, an efficient algorithm and its parallelization to compute PageRank are proposed. There are existing algorithms to perform such tasks. However, some algorithms exclude dangling nodes which are an important part and carry important information of the web graph. In this work, we consider dangling nodes as regular web pages without changing the web graph structure and therefore fully preserve the information carried by them. This differs from some other algorithms which include dangling nodes but treat them differently from regular pages for the purpose of efficiency. We then give an efficient algorithm with negligible overhead associated with dangling node treatment. Moreover, the treatment poses little difficulty in the parallelization of the algorithm.
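
A minimal sketch of the kind of dangling-node treatment the abstract describes, in which dangling pages stay in the graph and their rank mass is redistributed uniformly at negligible extra cost (one dot product per iteration). This is the standard formulation under assumed notation, not necessarily the paper's exact algorithm.

```python
# Sketch: PageRank power iteration keeping dangling nodes as regular
# pages; their mass is redistributed uniformly via one dot product.
import numpy as np
from scipy.sparse import csr_matrix

def pagerank(A, alpha=0.85, tol=1e-10, max_iter=200):
    """A is the column-normalized link matrix, shape (n, n); column j
    is all-zero exactly when page j is dangling."""
    n = A.shape[0]
    dangling = np.asarray(A.sum(axis=0)).ravel() == 0
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        dangling_mass = x[dangling].sum()          # negligible overhead
        x_new = alpha * (A @ x) + (alpha * dangling_mass + 1 - alpha) / n
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Toy graph: page 0 links to 1 and 2, page 1 links to 0, page 2 dangles.
rows, cols, vals = [1, 2, 0], [0, 0, 1], [0.5, 0.5, 1.0]
A = csr_matrix((vals, (rows, cols)), shape=(3, 3))
print(pagerank(A))
```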

Towards efficient SimRank computation on large networks

2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013

SimRank has been a powerful model for assessing the similarity of pairs of vertices in a graph. It is based on the concept that two vertices are similar if they are referenced by similar vertices. Due to its self-referentiality, fast SimRank computation on large graphs poses significant challenges. The state-of-the-art work [16] exploits partial sums memoization for computing SimRank in O(Kmn) time on a graph with n vertices and m edges, where K is the number of iterations. Memoizing partial sums can reduce repeated calculations by caching part of the similarity summations for later reuse. However, we observe that computations among different partial sums may have duplicate redundancy. Besides, for a desired accuracy ϵ, the existing SimRank model requires K = ⌈log_C ϵ⌉ iterations [16], where C is a damping factor. Nevertheless, such a geometric rate of convergence is slow in practice if high accuracy is desired. In this paper, we address these gaps. (1) We propose an adaptive clustering strategy to eliminate partial sums redundancy (i.e., duplicate computations occurring in partial sums), and devise an efficient algorithm for speeding up the computation of SimRank to O(Kd′n²) time, where d′ is typically much smaller than the average in-degree of a graph. (2) We also present a new notion of SimRank that is based on a differential equation and can be represented as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) Using real and synthetic data, we empirically verify that our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude, and that our revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of the original SimRank scores.
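
To ground the partial-sums idea from [16] that this paper builds on: for a fixed vertex a, the sum of similarities over a's in-neighbors is computed once and reused for every partner b, instead of being recomputed per pair. Below is a hedged toy sketch (dense matrices for clarity; the cited algorithms target large sparse graphs, and the function name and data are illustrative).

```python
# Toy sketch of SimRank with partial sums memoization (in the style of
# the abstract's reference [16]): partial[a][v] = sum of s(i, v) over
# in-neighbors i of a, cached once per a and reused for all partners b.
import numpy as np

def simrank_partial_sums(in_nbrs, n, C=0.6, K=10):
    s = np.eye(n)
    for _ in range(K):
        # One pass builds every partial sum; each is reused n times.
        partial = [s[in_nbrs[a]].sum(axis=0) for a in range(n)]
        s_new = np.eye(n)
        for a in range(n):
            for b in range(a + 1, n):
                if in_nbrs[a] and in_nbrs[b]:
                    val = (C * partial[a][in_nbrs[b]].sum()
                           / (len(in_nbrs[a]) * len(in_nbrs[b])))
                    s_new[a, b] = s_new[b, a] = val
        s = s_new
    return s

# Toy graph: in-neighbor lists for 4 vertices.
in_nbrs = [[1, 2], [0], [0, 3], [2]]
print(simrank_partial_sums(in_nbrs, 4).round(3))
```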