Multi-Scale Spectral Decomposition of Massive Graphs

An SSD-based eigensolver for spectral analysis on billion-node graphs

arXiv (Cornell University), 2016

Many eigensolvers, such as ARPACK and Anasazi, have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by RAM capacity: they run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger ones. In contrast, we develop an SSD-based eigensolver framework called FlashEigen, which extends the Anasazi eigensolvers to SSDs, to compute eigenvalues of a graph with hundreds of millions or even billions of vertices on a single machine. FlashEigen performs sparse matrix multiplication in a semi-external memory fashion, i.e., we keep the sparse matrix on SSDs and the dense matrix in memory. We store the entire vector subspace on SSDs and reduce I/O by caching the most recent dense matrix. Our results show that FlashEigen achieves 40%-60% of the performance of its in-memory implementation and is comparable to the Anasazi eigensolvers on a machine with 48 CPU cores. Furthermore, it scales to a graph with 3.4 billion vertices and 129 billion edges, taking about four hours to compute eight eigenvalues of the billion-node graph using 120 GB of memory.
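The semi-external layout described above (sparse matrix on SSD, dense vectors in RAM) can be illustrated with a small sketch. This is our toy stand-in, not FlashEigen's implementation: memory-mapped files play the role of the SSD, and the multiplication proceeds row-block by row-block so only a slice of the sparse matrix is touched at a time.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

def to_memmap(arr, path):
    # Persist an array to disk and hand back a memory-mapped view of it.
    m = np.memmap(path, dtype=arr.dtype, mode="w+", shape=arr.shape)
    m[:] = arr
    m.flush()
    return m

rng = np.random.default_rng(0)
A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
X = rng.standard_normal((1000, 8))  # dense block of 8 vectors, kept in RAM

# Spill the three CSR arrays to disk (standing in for the SSD).
tmp = tempfile.mkdtemp()
data = to_memmap(A.data, os.path.join(tmp, "data"))
indices = to_memmap(A.indices, os.path.join(tmp, "indices"))
indptr = to_memmap(A.indptr, os.path.join(tmp, "indptr"))

# Rebuild a CSR view over the memmapped buffers and multiply block-wise.
A_ssd = sp.csr_matrix((data, indices, indptr), shape=A.shape)
Y = np.empty_like(X)
for start in range(0, A.shape[0], 256):
    stop = min(start + 256, A.shape[0])
    Y[start:stop] = A_ssd[start:stop] @ X

assert np.allclose(Y, A @ X)
```

The block size (256 rows here) is the knob that trades I/O volume against the working-set size, which is the same trade-off the abstract describes for caching the most recent dense matrix.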

Incremental eigenpair computation for graph Laplacian matrices: theory and applications

Social Network Analysis and Mining, 2017

The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used in spectral clustering and community detection. However, in real-life applications the number of clusters or communities (say, K) is generally unknown a priori. Consequently, most existing methods either choose K heuristically or repeat the clustering method with different choices of K and accept the best result. The first option often yields a suboptimal result, while the second is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of the graph Laplacian to obtain the K-th smallest eigenpair given the previously computed K − 1 smallest eigenpairs. Our proposed method adapts the Laplacian matrix such that the batch eigenvalue decomposition problem transforms into an efficient sequential leading eigenpair computation problem. As a practical application, we consider user-guided spectral clustering. Specifically, we demonstrate that users can utilize the proposed incremental method for effective eigenpair computation and for determining the desired number of clusters based on multiple clustering metrics.
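One standard way to realize the sequential idea above is deflation: once the K − 1 smallest eigenpairs are known, shift those directions upward so the K-th smallest eigenpair becomes the new target of a smallest-eigenpair solve. The sketch below is our generic illustration of that idea (dense linear algebra on a toy path graph, with names of our own choosing), not the paper's exact adaptation.

```python
import numpy as np
import scipy.sparse as sp

n = 30  # path graph on n vertices
A = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1]).toarray()
L = np.diag(A.sum(axis=1)) - A

shift = 5.0  # any value above lambda_max (<= 2 * max degree = 4 here)
found_vals, found_vecs = [], []
for k in range(4):
    V = np.column_stack(found_vecs) if found_vecs else np.zeros((n, 0))
    L_defl = L + shift * V @ V.T   # push already-found eigenpairs up by `shift`
    w, U = np.linalg.eigh(L_defl)
    found_vals.append(w[0])        # smallest of L_defl == K-th smallest of L
    found_vecs.append(U[:, 0])

true_vals = np.linalg.eigh(L)[0][:4]
assert np.allclose(found_vals, true_vals, atol=1e-8)
```

Because the known eigenvectors are orthonormal, adding `shift * V @ V.T` moves exactly those eigenvalues up by `shift` and leaves every other eigenpair untouched, which is what makes the sequential computation exact rather than approximate.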

Spectral sparsification of graphs

Communications of the ACM, 2013

We introduce a new notion of graph sparsification based on spectral similarity of graph Laplacians: spectral sparsification requires that the Laplacian quadratic form of the sparsifier approximate that of the original. This is equivalent to saying that the Laplacian of the sparsifier is a good preconditioner for the Laplacian of the original.
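The Laplacian quadratic form at the heart of this definition has a concrete combinatorial reading: x^T L x equals the weighted sum of squared differences across edges, and (1 ± ε)-spectral similarity asks that the sparsifier's quadratic form match the original's within that factor for every x. A small numerical check of the identity (our toy example, not from the paper):

```python
import numpy as np

def laplacian(n, edges):
    # Build L = D - A for a weighted undirected graph given as (u, v, w) triples.
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 0.5)]
L = laplacian(4, edges)

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
quad = x @ L @ x
edge_sum = sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)
assert np.isclose(quad, edge_sum)
```

Checking spectral similarity of a candidate sparsifier H against G then amounts to comparing these two quadratic forms over all x, or equivalently bounding the eigenvalues of the relative matrix pencil (L_H, L_G).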

iSIRA: Integrated shift–invert residual Arnoldi method for graph Laplacian matrices from big data

Journal of Computational and Applied Mathematics, 2019

The eigenvalue problem of a graph Laplacian matrix L arising from a simple, connected, undirected graph has received increasing attention due to its extensive applications, such as spectral clustering, community detection, complex network analysis, image processing, and so on. The associated graph Laplacian matrix is symmetric, positive semi-definite, and usually large and sparse. Computing a few of the smallest positive eigenvalues and the corresponding eigenvectors is often of interest. However, the singularity of L makes classical eigensolvers inefficient, since they require factorizing L to solve large sparse linear systems exactly. A further difficulty is that factorizing a large sparse matrix arising from real networks in big data, such as social media, transactional databases, and sensor systems, is usually time-consuming or even infeasible, because the connections are in general not only local. In this paper, we propose an eigensolver based on the inexact residual Arnoldi method [18,19], together with an implicit remedy of the singularity and an effective deflation for converged eigenvalues. Numerical experiments reveal that the integrated eigensolver outperforms the classical Arnoldi/Lanczos method for computing the smallest positive eigenpairs when an LU factorization is unavailable.
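One simple way to remedy the singularity implicitly, in the spirit of (but not identical to) the paper's approach, is rank-one deflation of the known null space: for a connected graph, adding c · (1 1^T)/n moves the zero eigenvalue up to c and leaves every other eigenpair unchanged, so the smallest positive eigenpairs of L become the smallest eigenpairs of a nonsingular operator, and no factorization of L is ever formed. A hedged sketch on a cycle graph:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

n = 40  # cycle graph on n vertices
A = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1], format="csr")
A = A + sp.csr_matrix(([1.0, 1.0], ([0, n - 1], [n - 1, 0])), shape=(n, n))
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

c = 5.0                        # any value above lambda_max (<= 2 * max degree = 4)
u = np.ones(n) / np.sqrt(n)    # normalized null vector of L

def deflated_matvec(x):
    # Apply L plus the rank-one term that lifts the zero eigenvalue to c.
    return L @ x + c * u * (u @ x)

M = LinearOperator((n, n), matvec=deflated_matvec, dtype=np.float64)
vals = np.sort(eigsh(M, k=3, which="SA", return_eigenvectors=False))

true = np.sort(np.linalg.eigh(L.toarray())[0])[1:4]  # smallest positive eigenvalues
assert np.allclose(vals, true, atol=1e-8)
```

The operator is only ever applied, never factorized, which is exactly the regime the abstract targets; iSIRA additionally combines such a remedy with inexact shift-invert residual Arnoldi iterations for scale.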

Partitioning networks by eigenvectors

Proceedings of the International Conference on …, 1995

A survey of published methods for partitioning sparse arrays is presented. These include early attempts to describe the partitioning properties of eigenvectors of the adjacency matrix. More direct methods of partitioning are developed by introducing the Laplacian of the adjacency matrix via the directed (signed) edge-vertex incidence matrix. It is shown that the Laplacian solves the minimization of total length of connections between adjacent nodes, which induces clustering of connected nodes by partitioning the underlying graph. Another matrix derived from the adjacency matrix is also introduced via the unsigned edge-vertex matrix. This (the Normal) matrix is not symmetric, and it also is shown to solve the minimization of total length in its own non-Euclidean metric. In this case partitions are induced by clustering the connected nodes. The Normal matrix is closely related to Correspondence Analysis.
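The Laplacian-based partitioning described above is commonly realized through the Fiedler vector: the eigenvector for the second-smallest eigenvalue of L = D − A places tightly connected nodes close together on the real line, and splitting by sign yields the two parts. A minimal sketch (our toy graph, two triangles joined by one bridge edge):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A

w, V = np.linalg.eigh(L)
fiedler = V[:, 1]        # eigenvector of the second-smallest eigenvalue
part = fiedler > 0       # sign split induces the partition

# The two triangles land on opposite sides of the cut.
assert set(np.flatnonzero(part)) in ({0, 1, 2}, {3, 4, 5})
```

The sign split minimizes the total squared length of connections subject to the balance constraint imposed by orthogonality to the constant vector, which is the minimization the survey attributes to the Laplacian.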

Highly Scalable X10-Based Spectral Clustering

2012

Large graph analysis has become a widely studied area in recent years. Clustering is one of the most important types of analysis, with versatile applications such as community detection in social networks, image segmentation, graph partitioning, etc. However, existing clustering algorithms are not designed for large-scale graphs. To address this, we implemented spectral clustering in X10, a programming language aimed at developing highly scalable applications on post-petascale supercomputers. Our spectral clustering is based on the algorithm proposed by Shi and Malik. We evaluated scalability and precision, and found that our implementations scale well in execution time and are accurate on real data.
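For reference, the Shi-Malik normalized-cut pipeline the implementation is based on can be sketched in a few lines (this is our generic single-machine version, not the X10 code): embed nodes with the bottom eigenvectors of the symmetric normalized Laplacian, then cluster the embedding with k-means.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Toy data: two dense blocks weakly connected by a few cross edges.
rng = np.random.default_rng(0)
n = 20
A = np.zeros((2 * n, 2 * n))
A[:n, :n] = rng.random((n, n)) < 0.8     # dense block 1
A[n:, n:] = rng.random((n, n)) < 0.8     # dense block 2
A[:n, n:] = rng.random((n, n)) < 0.02    # sparse cross edges
A = np.triu(A, 1)
A = A + A.T                               # symmetric, no self-loops

d = A.sum(axis=1)
# Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
Lsym = np.eye(2 * n) - A / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
w, U = np.linalg.eigh(Lsym)
emb = U[:, :2] / np.sqrt(d)[:, None]      # back-transform to the (L, D) problem

_, labels = kmeans2(emb, 2, minit="++", seed=1)
truth = np.array([0] * n + [1] * n)
agreement = max((labels == truth).mean(), (labels != truth).mean())
assert agreement >= 0.95
```

The eigendecomposition is the step that dominates at scale, which is why the paper's contribution is a distributed implementation rather than a new algorithm.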

Large Scale Spectral Clustering Using Resistance Distance and Spielman-Teng Solvers

Lecture Notes in Computer Science, 2012

Spectral clustering is a clustering method that can detect complex shapes of data clusters. However, it requires the eigendecomposition of the graph Laplacian matrix, which costs O(n^3) and is thus not suitable for large-scale problems. Recently, many methods have been proposed to accelerate spectral clustering. These approximate methods usually involve sampling techniques, through which much of the information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither sampling nor computing any eigenvector at all. Instead, it uses random projection and a linear-time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach has better clustering quality and is faster than the state-of-the-art approximate spectral clustering methods.
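The quantity being embedded is the commute-time distance, vol(G) · R_eff(u, v), where R_eff is the effective resistance computed from the Laplacian pseudoinverse. The paper approximates this embedding with random projections and a fast Laplacian solver; the toy below uses the exact pseudoinverse purely to show what is being approximated (graph and names are ours):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A
Lp = np.linalg.pinv(L)   # exact pseudoinverse; the paper avoids this at scale
vol = A.sum()            # total volume (sum of degrees)

def commute_time(u, v):
    # Commute time = vol(G) * effective resistance between u and v.
    e = np.zeros(6)
    e[u], e[v] = 1.0, -1.0
    return vol * e @ Lp @ e

# Nodes inside one triangle are closer than nodes across the bridge.
assert commute_time(0, 1) < commute_time(0, 5)
```

Replacing `pinv` with a near-linear-time Laplacian solve per random-projection column is exactly what makes the full method scale without sampling.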

Computing the smallest eigenpairs of the graph Laplacian

The graph Laplacian, a typical representation of a network, is an important matrix that can tell us much about the network structure. In particular, its eigenpairs (eigenvalues and eigenvectors) encode valuable topological information about the network at hand, including connectivity, partitioning, node distance, and centrality. Real networks might be very large in number of nodes (actors); luckily, most real networks are sparse, meaning that the number of edges (binary connections among actors) is small relative to the maximum possible number of edges. In this paper we experimentally compare three state-of-the-art algorithms for computing a few of the smallest eigenpairs of large sparse matrices: the Implicitly Restarted Lanczos Method, which is the current implementation in the most popular scientific computing environments (Matlab / R); the Jacobi-Davidson method; and the Deflation Accelerated Conjugate Gradient method. We implemented the algorithms in a uniform programming setting and tested them on diverse real-world networks, including biological, technological, information, and social networks. It turns out that the Jacobi-Davidson method displays the best performance in terms of number of matrix-vector products and CPU time.
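The comparison metric used in this study, matrix-vector products, is easy to instrument: wrap the Laplacian in a counting operator and run an eigensolver against it. The sketch below uses scipy's `eigsh` (ARPACK's Implicitly Restarted Lanczos Method, one of the three contenders); the count it produces is illustrative only, not the paper's measurements.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

n = 60  # path graph
A = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1], format="csr")
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

n_matvecs = [0]
def matvec(x):
    # Count every application of L requested by the eigensolver.
    n_matvecs[0] += 1
    return L @ x

op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
vals = eigsh(op, k=4, which="SA", maxiter=10000, return_eigenvectors=False)

true = np.sort(np.linalg.eigh(L.toarray())[0])[:4]
assert np.allclose(np.sort(vals), true, atol=1e-7)
assert n_matvecs[0] > 0
```

The same wrapper works unchanged for any matrix-free solver, which is what makes matvec counts a fair cross-algorithm metric.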

Spectral sparsification of graphs: theory and algorithms

We introduce a new notion of graph sparsification based on spectral similarity of graph Laplacians: spectral sparsification requires that the Laplacian quadratic form of the sparsifier approximate that of the original. This is equivalent to saying that the Laplacian of the sparsifier is a good preconditioner for the Laplacian of the original.

A split-and-merge approach for singular value decomposition of large-scale matrices

Statistics and Its Interface

We propose a new SVD algorithm based on the split-and-merge strategy, which possesses an embarrassingly parallel structure and thus can be efficiently implemented on a distributed or multicore machine. The new algorithm can also be implemented serially for online eigen-analysis. The new algorithm is particularly suitable for big data problems: its embarrassingly parallel structure renders it usable for feature screening, which has been beyond the reach of existing parallel SVD algorithms.
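The split-and-merge pattern can be sketched with a generic block scheme (ours, not necessarily the paper's exact recursion): split a tall matrix into row blocks, take each block's SVD independently (the embarrassingly parallel step), then merge the small factors with one more SVD. Because X^T X = Σ_i V_i S_i^2 V_i^T, stacking the S_i V_i^T factors preserves the singular values of X exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X1, X2 = X[:100], X[100:]

# Split: per-block SVDs can run on different workers.
U1, s1, Vt1 = np.linalg.svd(X1, full_matrices=False)
U2, s2, Vt2 = np.linalg.svd(X2, full_matrices=False)

# Merge: stack S_i V_i^T and take one more (small) SVD.
M = np.vstack([s1[:, None] * Vt1, s2[:, None] * Vt2])
merged = np.linalg.svd(M, compute_uv=False)

exact = np.linalg.svd(X, compute_uv=False)
assert np.allclose(np.sort(merged), np.sort(exact))
```

The merge step operates on a matrix whose row count is only the sum of the block ranks, which is what keeps the serial bottleneck small relative to the parallel split phase.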