Large Scale Spectral Clustering Using Resistance Distance and Spielman-Teng Solvers

Spectral Clustering

Studies in Big Data, 2017

This chapter discusses clustering methods based on similarities between pairs of objects. Such knowledge does not require that the objects be embedded in a metric space; instead, the local information supports a graphical representation displaying relationships among the objects of a given data set. The problem of data clustering then transforms into a problem of graph partitioning, and the partitioning is obtained by analysing eigenvectors of the graph Laplacian, a basic tool of spectral graph theory. We explain how various forms of the graph Laplacian are used in various graph partitioning criteria, and how these translate into particular algorithms. There is a strong and fascinating relationship between the graph Laplacian and random walks on a graph. In particular, it allows a number of other clustering criteria to be formulated, leading to further data clustering algorithms. We briefly review these problems. It should be noted that the eigenvectors deliver a so-called spectral representation of the data items. Unfortunately, this representation is fixed for a given data set, and adding or deleting items invalidates it. We therefore discuss recently developed methods of out-of-sample spectral clustering that overcome this disadvantage. Although spectral methods are successful in extracting non-convex groups in data, forming the graph Laplacian is memory consuming and computing its eigenvectors is time consuming. We therefore discuss various local methods in which only the relevant parts of the graph are considered. Moreover, we mention a number of methods that allow fast, approximate computation of the eigenvectors.
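As a concrete illustration of the pipeline the chapter describes (similarity graph, graph Laplacian, eigenvectors, k-means on the spectral representation), here is a minimal sketch. The Gaussian similarity, the symmetric normalized Laplacian, and the parameter sigma are illustrative choices rather than the chapter's prescriptions; numpy, scipy, and scikit-learn are assumed available.

```python
# Minimal sketch of the Laplacian-eigenvector clustering pipeline.
# The Gaussian similarity graph and sigma are illustrative choices.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Pairwise Gaussian similarities W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    d = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Spectral representation: eigenvectors of the k smallest eigenvalues,
    # row-normalized before k-means (the usual normalized-cut recipe)
    _, U = eigh(L, subset_by_index=[0, k - 1])
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```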

A tutorial on spectral clustering

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all or what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
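For reference, the three graph Laplacians the tutorial analyses can be written down in a few lines. This hedged sketch assumes only a symmetric affinity matrix W with zero diagonal; the small epsilon guards against isolated vertices and is not part of the standard definitions.

```python
# The three graph Laplacians commonly distinguished in the tutorial,
# given any symmetric affinity matrix W with zero diagonal.
import numpy as np

def graph_laplacians(W):
    d = np.maximum(W.sum(axis=1), 1e-12)        # vertex degrees
    L = np.diag(d) - W                          # unnormalized: L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    L_rw = np.eye(len(W)) - W / d[:, None]      # random-walk: I - D^{-1} W
    return L, L_sym, L_rw
```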

Fast Spectral Clustering via the Nyström Method

Lecture Notes in Computer Science, 2013

We propose and analyze a fast spectral clustering algorithm with computational complexity linear in the number of data points that is directly applicable to large-scale datasets. The algorithm combines two powerful techniques in machine learning: spectral clustering algorithms and Nyström methods, which are commonly used to obtain good-quality low-rank approximations of large matrices. The proposed algorithm applies the Nyström approximation to the graph Laplacian to perform clustering. We provide a theoretical analysis of the algorithm's performance, establish the error bound it achieves, and discuss the conditions under which its performance is comparable to spectral clustering with the original graph Laplacian. We also present empirical results.
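The core Nyström step can be sketched as follows. This is a generic Nyström approximation of a Gaussian affinity matrix, not the paper's exact algorithm; the uniform landmark sampling, the landmark count m, and sigma are illustrative assumptions.

```python
# Hedged sketch of a Nystrom low-rank approximation of a Gaussian affinity
# matrix. Uniform landmark sampling and sigma are illustrative choices.
import numpy as np

def nystrom_affinity(X, m, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # landmark points

    def gauss(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * sigma**2))

    C = gauss(X, X[idx])          # n x m block of the full affinity matrix
    M = C[idx]                    # m x m block among the landmarks
    # K is approximated by C M^+ C^T; return the factors so that the
    # full n x n matrix never has to be formed explicitly.
    return C, np.linalg.pinv(M)
```

The full n-by-n affinity is then approximated as C M^+ C^T without being materialized, and its leading eigenvectors can be recovered from an eigendecomposition of the small m-by-m landmark matrix, which is what makes the overall cost linear in n.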

Dimensionality Reduction for Spectral Clustering

2011

Spectral clustering is a flexible clustering methodology that is applicable to a variety of data types and has the particular virtue that it makes few assumptions about cluster shapes. It has become popular in a variety of application areas, particularly in computational vision and bioinformatics. The approach appears, however, to be particularly sensitive to irrelevant and noisy dimensions in the data. We thus introduce an approach that learns the relevant dimensions and the spectral clustering simultaneously. We pursue an augmented form of spectral clustering in which an explicit projection operator is incorporated into the relaxed optimization functional. We optimize this functional over both the projection and the spectral embedding. Experiments on simulated and real data show that this approach yields significant improvements in the performance of spectral clustering.
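A very loose sketch of the alternating idea follows: refine a linear projection P and a spectral embedding Y in turn. The least-squares update for P and the random orthonormal start are illustrative stand-ins, not the paper's relaxed optimization functional, and the sketch assumes more features than clusters.

```python
# Loose sketch of jointly refining a projection and a spectral embedding.
# The update rules are illustrative placeholders, not the paper's method.
import numpy as np
from scipy.linalg import eigh

def spectral_embed(Z, k, sigma=1.0):
    # Normalized-Laplacian embedding of the projected data Z.
    sq = np.sum(Z**2, axis=1)
    W = np.exp(-(sq[:, None] + sq[None, :] - 2 * Z @ Z.T) / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d = np.maximum(W.sum(axis=1), 1e-12)
    L = np.eye(len(Z)) - W / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
    return eigh(L, subset_by_index=[0, k - 1])[1]

def project_and_embed(X, k, n_iter=5, seed=0):
    rng = np.random.default_rng(seed)
    P = np.linalg.qr(rng.standard_normal((X.shape[1], k)))[0]  # orthonormal start
    for _ in range(n_iter):
        Y = spectral_embed(X @ P, k)                 # embed the projected data
        B = np.linalg.lstsq(X, Y, rcond=None)[0]     # refit projection to Y
        P = np.linalg.qr(B)[0]                       # keep columns orthonormal
    return Y, P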

Spectral Clustering Based on Analysis of Eigenvector Properties

Lecture Notes in Computer Science, 2014

In this paper we propose a new method for choosing the number of clusters and the most appropriate eigenvectors, so as to obtain the optimal clustering. To accomplish this task we suggest carefully examining the properties of the adjacency-matrix eigenvectors: their weak localization as well as the signs of their values. The algorithm has only one parameter: the number of mutual neighbors. We compare our method with several clustering solutions on different types of datasets. The experiments demonstrate that in most cases our method outperforms many other clustering algorithms.
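The two eigenvector diagnostics the paper mentions can be sketched as below: localization, measured here by the inverse participation ratio, and sign structure. The mutual-kNN construction follows the single-parameter idea in the abstract, but the specific IPR measure and the number of inspected eigenvectors are illustrative assumptions.

```python
# Hedged sketch of eigenvector diagnostics: localization (inverse
# participation ratio) and sign patterns of adjacency eigenvectors.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def eigenvector_diagnostics(X, n_neighbors, n_vecs=10):
    A = kneighbors_graph(X, n_neighbors, mode='connectivity').toarray()
    A = A * A.T                         # keep mutual neighbors only
    vals, vecs = np.linalg.eigh(A)      # adjacency-matrix eigenvectors
    top = vecs[:, -n_vecs:]             # eigh sorts ascending, take largest
    # High IPR => the eigenvector is localized on few vertices.
    ipr = np.sum(top**4, axis=0) / np.sum(top**2, axis=0)**2
    signs = np.sign(top)                # sign pattern of each point per vector
    return ipr, signs
```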

Low-Rank Approximation-Based Spectral Clustering for Large Datasets

Rachana Jakkula & Jyothi P.

Abstract: Spectral clustering is a well-known graph-theoretic approach to finding natural groupings in a given dataset. Nowadays, digital data are accumulated at an ever-faster speed in various fields, such as the Web, science, engineering, biomedicine, and real-world sensing, and it is not uncommon for a dataset to contain tens of thousands of samples and/or features. Spectral clustering generally becomes infeasible for analysing such big data. A particular class of graph clustering algorithms is known as spectral clustering algorithms; these algorithms are mostly based on the eigendecomposition of Laplacian matrices of either weighted or unweighted graphs. This survey presents different graph clustering formulations, most of which are based on graph cut and partitioning problems, and describes the main spectral clustering algorithms found in the literature that solve these problems. In this paper, we propose Low-rank Approximation-based Spectral (LAS) clustering for big data analytics. By integrating low-rank matrix approximations, i.e., approximations to the affinity matrix and its subspace as well as to the Laplacian matrix and the Laplacian subspace, LAS gains great computational and spatial efficiency for processing big data. In addition, we propose various fast sampling strategies to efficiently select data samples. From a theoretical perspective, we mathematically prove the correctness of LAS and provide an analysis of its approximation error and computational complexity.
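A generic version of the low-rank route can be sketched as follows: sample data points, build affinities only against the sample, and recover an approximate subspace with a truncated randomized SVD. The uniform sampling and the crude row normalization are illustrative stand-ins for the paper's sampling strategies and Laplacian approximations.

```python
# Hedged sketch of low-rank spectral embedding via column sampling and
# randomized SVD; sampling and normalization are illustrative choices.
import numpy as np
from sklearn.utils.extmath import randomized_svd

def low_rank_spectral_embedding(X, k, m=500, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    d2 = (np.sum(X**2, 1)[:, None] + np.sum(X[idx]**2, 1)[None, :]
          - 2 * X @ X[idx].T)
    C = np.exp(-d2 / (2 * sigma**2))        # n x m affinity block
    C /= np.maximum(C.sum(1, keepdims=True), 1e-12)  # crude row normalization
    U, S, Vt = randomized_svd(C, n_components=k)     # approximate subspace
    return U                                          # n x k embedding
```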

Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering

IEEE Transactions on Neural Networks, 2011

Spectral clustering (SC) methods have been successfully applied to many real-world applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in the high-density region of a low-dimensional data manifold have the same cluster label. However, such an assumption might not always hold for high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC is degraded and can become even worse than that of K-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the spectral embedded clustering (SEC) framework, in which a linearity regularization is explicitly added into the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and K-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström algorithm on unseen data.
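The out-of-sample idea can be sketched in a few lines: if the embedding is (approximately) linear in the data, a linear map fitted from training points to their spectral embedding can embed unseen points. The ridge penalty here is an illustrative stand-in for the paper's linearity regularization, not the SEC objective itself.

```python
# Hedged sketch of out-of-sample extension via a learned linear map;
# the ridge regression stands in for SEC's linearity regularization.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.cluster import KMeans

def out_of_sample_labels(X_train, Y_embed, X_new, k, alpha=1.0):
    reg = Ridge(alpha=alpha).fit(X_train, Y_embed)   # linear map X -> embedding
    Y_new = reg.predict(X_new)                       # embed unseen points
    km = KMeans(n_clusters=k, n_init=10).fit(Y_embed)
    return km.predict(Y_new)                         # labels for unseen data
```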

Fast approximate spectral clustering

Proceedings of the 15th ACM SIGKDD …, 2009

Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n³), with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes.
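The KASP recipe is simple enough to sketch directly: compress the data with a local k-means, spectral cluster only the m centroids, then push each centroid's cluster label back to the points it represents. The choice m = 1000 and scikit-learn's off-the-shelf SpectralClustering are illustrative assumptions standing in for the paper's own spectral step.

```python
# Hedged sketch of KASP: local k-means as a distortion-minimizing
# preprocessing step, then spectral clustering on m << n centroids.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def kasp(X, k, m=1000):
    # Step 1: compress n points into m representative centroids (assumes n >> m).
    coarse = KMeans(n_clusters=m, n_init=3).fit(X)
    # Step 2: spectral clustering on the centroids only.
    centroid_labels = SpectralClustering(n_clusters=k, affinity='rbf').fit_predict(
        coarse.cluster_centers_)
    # Step 3: each point inherits the cluster label of its centroid.
    return centroid_labels[coarse.labels_]
```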

Spectral Graph Clustering

2007

Spectral clustering is a powerful technique in data analysis that has found increasing support and application in many areas. This report is geared to give an introduction to its methods, presenting the most common algorithms and discussing the advantages and disadvantages of each, rather than endorsing one of them as the best, because, arguably, there is no black-box algorithm that performs equally well on every dataset.

Automatic Graph Building Approach for Spectral Clustering

Lecture Notes in Computer Science, 2013

Spectral clustering techniques have shown their capability to identify data relationships using graph analysis, achieving better accuracy than traditional algorithms such as k-means. Here, we propose a methodology to automatically build a graph representation over the input data for spectral clustering based approaches, taking into account both the local and the global sample structure. To this end, both the Euclidean and the geodesic distances are used to identify the main relationships between a given point and the samples neighboring it. Then, given the information about the local data structure, we estimate an affinity matrix by means of a Gaussian kernel. Synthetic and real-world datasets are tested. The results show that our approach outperforms benchmark methods in most cases.
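The graph-building idea can be sketched as follows: Euclidean distances define a kNN graph, shortest-path distances over that graph approximate the geodesic (global) structure, and a Gaussian kernel turns them into an affinity matrix. The neighborhood size and the median-based scale heuristic are illustrative choices, not the paper's automatic procedure.

```python
# Hedged sketch of combining Euclidean (local) and geodesic (global)
# distances into a Gaussian-kernel affinity matrix.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def build_affinity(X, n_neighbors=10):
    D_knn = kneighbors_graph(X, n_neighbors, mode='distance')  # local structure
    G = shortest_path(D_knn, method='D', directed=False)       # geodesic distances
    finite = G[np.isfinite(G)]
    sigma = np.median(finite[finite > 0])                      # simple global scale
    W = np.exp(-G**2 / (2 * sigma**2))
    W[~np.isfinite(G)] = 0.0                                   # disconnected pairs
    np.fill_diagonal(W, 0.0)
    return W
```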