A Convex Formulation for Spectral Shrunk Clustering

Dimensionality Reduction for Spectral Clustering

2011

Spectral clustering is a flexible clustering methodology that is applicable to a variety of data types and has the particular virtue that it makes few assumptions about cluster shapes. It has become popular in a variety of application areas, particularly in computational vision and bioinformatics. The approach appears, however, to be particularly sensitive to irrelevant and noisy dimensions in the data. We thus introduce an approach that automatically learns the relevant dimensions while performing spectral clustering. We pursue an augmented form of spectral clustering in which an explicit projection operator is incorporated in the relaxed optimization functional. We optimize this functional over both the projection and the spectral embedding. Experiments on simulated and real data show that this approach yields significant improvements in the performance of spectral clustering.
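A minimal sketch of the joint idea, assuming an RBF affinity on the projected data and a k-dimensional orthonormal projection W: alternate between spectrally embedding the projected data and refitting W to the embedding. The function name and the least-squares update for W are illustrative stand-ins, not the authors' optimization of the relaxed functional.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

def projected_spectral(X, k, n_iter=5, sigma=1.0, seed=0):
    # Alternate between a spectral embedding of the projected data and a
    # refit of the projection; a heuristic stand-in for the joint optimization.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.linalg.qr(rng.standard_normal((d, k)))[0]    # orthonormal d x k projection
    for _ in range(n_iter):
        Z = X @ W                                       # projected data
        A = np.exp(-euclidean_distances(Z, squared=True) / (2 * sigma**2))
        np.fill_diagonal(A, 0.0)
        L = np.diag(A.sum(1)) - A                       # graph Laplacian
        _, Y = eigh(L, subset_by_index=[0, k - 1])      # spectral embedding
        W = np.linalg.qr(np.linalg.pinv(X) @ Y)[0]      # refit projection to Y
    return KMeans(k, n_init=10).fit_predict(Y), W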

Grassmannian Manifold Optimization Assisted Sparse Spectral Clustering

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Spectral clustering is one of the pioneering clustering methods. It relies on a spectral decomposition criterion to learn a low-dimensional embedding of the data for a basic clustering algorithm. Sparse spectral clustering (SSC) introduces sparsity into the similarity in the low-dimensional space by enforcing a sparsity-inducing penalty, resulting in a non-convex optimization problem, which is typically solved through a relaxed convex problem via the standard ADMM (Alternating Direction Method of Multipliers) rather than by inferring the latent representation from the eigen-structure. This paper provides a direct solution by posing a new Grassmann optimization problem. In this way, calculating the latent embedding becomes part of an optimization on manifolds, and recently developed manifold optimization methods can be applied. It turns out that the learned features are not only very informative for clustering, but also more intuitive and effective in visualization after dimensionality reduction. We conduct empirical studies on simulated datasets and several real-world benchmark datasets to validate the proposed methods. Experimental results exhibit the effectiveness of this new manifold-based clustering and dimensionality reduction method.
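A toy Riemannian (sub)gradient sketch of the manifold view, assuming the SSC-style objective min_U tr(U'LU) + lam*||UU'||_1 over orthonormal U. The tangent-space projection and QR retraction are the standard Grassmann ingredients, but the fixed step size and the subgradient handling of the l1 term are simplifications, not the paper's solver.

import numpy as np

def grassmann_ssc(L, k, lam=0.1, lr=0.01, n_iter=200, seed=0):
    # (sub)gradient descent on the Grassmann manifold for
    # min_U  tr(U.T @ L @ U) + lam * ||U @ U.T||_1,  subject to U.T @ U = I
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    U = np.linalg.qr(rng.standard_normal((n, k)))[0]
    for _ in range(n_iter):
        G = 2 * L @ U + 2 * lam * np.sign(U @ U.T) @ U  # Euclidean subgradient
        G = G - U @ (U.T @ G)                           # project onto tangent space
        U = np.linalg.qr(U - lr * G)[0]                 # retract back via QR
    return U                                            # latent embedding for k-means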

Weighted semi-supervised manifold clustering via sparse representation

2016 6th International Conference on Computer and Knowledge Engineering (ICCKE), 2016

Over the last few years, manifold clustering has attracted considerable interest in high-dimensional data clustering. However, achieving accurate clustering results that match user intent and data structure is still an open problem. One way to do so is to incorporate additional information that indicates relations between data objects. In this paper we propose a method for constrained clustering that takes advantage of pairwise constraints. It first solves an optimization program to construct an affinity matrix according to the pairwise constraints and the manifold structure of the data, then applies spectral clustering to find the data clusters. Experiments demonstrate that our algorithm outperforms other related algorithms on face image datasets and has comparable results on handwritten digit datasets.
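The sparse-representation step can be sketched generically: regress each sample on all the others with an l1 penalty and symmetrize the coefficients into an affinity matrix. This omits the paper's constraint weighting and exact program; the alpha value and the Lasso solver choice are illustrative.

import numpy as np
from sklearn.linear_model import Lasso

def sparse_affinity(X, alpha=0.01):
    # Each sample (row of X) is expressed as a sparse combination of the
    # others; |coefficients| then serve as pairwise similarities.
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        model = Lasso(alpha=alpha, max_iter=5000)
        model.fit(X[idx].T, X[i])      # features = the other samples
        C[i, idx] = model.coef_
    return (np.abs(C) + np.abs(C).T) / 2  # symmetric affinity for spectral clustering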

Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering

IEEE Transactions on Neural Networks, 2011

Spectral clustering (SC) methods have been successfully applied to many real-world applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in a high-density region of a low-dimensional data manifold have the same cluster label. However, such an assumption might not always hold for high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC will be degraded and may become even worse than that of K-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the spectral embedded clustering (SEC) framework, in which a linearity regularization is explicitly added into the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and K-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström algorithm on unseen data.
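Working the stated objective through, the linearity regularizer has a closed form: minimizing ||XW + 1b' - F||^2 + gamma*||W||^2 over (W, b) leaves tr(F' M F) with M = H - HX(X'HX + gamma*I)^{-1}X'H, where H is the centering matrix, so SEC reduces to an eigen-problem on L + mu*M. A minimal sketch under that reading, not the authors' code:

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_embedded_clustering(X, L, k, mu=1.0, gamma=1.0):
    # Add a linearity penalty so the relaxed cluster indicators F stay close
    # to a linear function of the input features.
    n, d = X.shape
    H = np.eye(n) - np.ones((n, n)) / n                        # centering matrix
    P = np.linalg.solve(X.T @ H @ X + gamma * np.eye(d), X.T @ H)
    M = H - H @ X @ P                                          # linearity penalty
    _, F = eigh(L + mu * M, subset_by_index=[0, k - 1])        # relaxed indicators
    W = P @ F                                                  # linear map for out-of-sample data
    return KMeans(k, n_init=10).fit_predict(F), W

Out-of-sample points can then be labeled through the learned linear map W, which is what lets the framework handle unseen data.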

Spectral Clustering

Studies in big data, 2017

This chapter discusses clustering methods based on similarities between pairs of objects. Such knowledge does not require that the objects be embedded in a metric space; instead, this local knowledge supports a graphical representation displaying relationships among the objects of a given data set. The problem of data clustering then transforms into the problem of graph partitioning, and this partitioning is obtained by analysing eigenvectors of the graph Laplacian, a basic tool used in spectral graph theory. We explain how various forms of the graph Laplacian are used in various graph partitioning criteria, and how these translate into particular algorithms. There is a strong and fascinating relationship between the graph Laplacian and random walks on a graph. In particular, it allows one to formulate a number of other clustering criteria and further data clustering algorithms. We briefly review these problems. It should be noted that the eigenvectors deliver the so-called spectral representation of the data items. Unfortunately, this representation is fixed for a given data set, and adding or deleting items destroys it. Thus we discuss recently devised methods of out-of-sample spectral clustering that overcome this disadvantage. Although spectral methods are successful in extracting non-convex groups in data, forming the graph Laplacian is memory-consuming and computing its eigenvectors is time-consuming. Thus we discuss various local methods in which only the relevant parts of the graph are considered. Moreover, we mention a number of methods allowing fast and approximate computation of the eigenvectors.
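For reference, the pipeline the chapter builds on fits in a few lines, assuming an RBF similarity graph and the symmetric normalized Laplacian with Ng-Jordan-Weiss style row normalization:

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

def spectral_clustering(X, k, sigma=1.0):
    # 1. similarity graph
    A = np.exp(-euclidean_distances(X, squared=True) / (2 * sigma**2))
    np.fill_diagonal(A, 0.0)
    # 2. symmetric normalized Laplacian  I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(1) + 1e-12)
    L_sym = np.eye(len(X)) - (A * d_inv_sqrt) * d_inv_sqrt[:, None]
    # 3. spectral representation: k smallest eigenvectors, row-normalized
    _, V = eigh(L_sym, subset_by_index=[0, k - 1])
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    # 4. cluster the embedded points
    return KMeans(k, n_init=10).fit_predict(V)

As the chapter notes, the O(n^2) affinity matrix and the eigen-decomposition are the memory and time bottlenecks that motivate the local and approximate variants.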

Spectral clustering based on learning similarity matrix

Bioinformatics (Oxford, England), 2018

Single-cell RNA-sequencing (scRNA-seq) technology can generate genome-wide expression data at the single-cell level. One important objective in scRNA-seq analysis is to cluster cells such that each cluster consists of cells belonging to the same cell type based on gene expression patterns. We introduce a novel spectral clustering framework that imposes sparse structures on a target matrix. Specifically, we utilize multiple doubly stochastic similarity matrices to learn a similarity matrix, motivated by the observation that each similarity matrix can be a different informative representation of the data. We impose a sparse structure on the target matrix and then shrink pairwise differences of the rows in the target matrix, motivated by the fact that the target matrix should have these structures in the ideal case. We solve the proposed non-convex problem iteratively using the ADMM algorithm and show the convergence of the algorithm. We evaluate the performance of the proposed clus...
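One building block mentioned above is easy to make concrete: a nonnegative similarity matrix can be driven toward double stochasticity by Sinkhorn-Knopp alternation. A sketch of that step only; the paper's construction of the multiple matrices and the ADMM fusion are omitted.

import numpy as np

def sinkhorn_doubly_stochastic(A, n_iter=100, tol=1e-8):
    # Alternately normalize rows and columns until both sums are ~1.
    S = A.astype(float).copy()
    for _ in range(n_iter):
        S = S / S.sum(axis=1, keepdims=True)  # rows sum to 1
        S = S / S.sum(axis=0, keepdims=True)  # columns sum to 1
        if np.abs(S.sum(axis=1) - 1).max() < tol:
            break
    return S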

Transformed Locally Linear Manifold Clustering

2018 26th European Signal Processing Conference (EUSIPCO), 2018

Transform learning is a relatively new analysis formulation for learning a basis to represent signals. This work incorporates the simplest subspace clustering formulation, Locally Linear Manifold Clustering, into the transform learning formulation. The core idea is to perform the clustering task in a transformed domain instead of directly processing the raw samples. The transform analysis step and the clustering are not done piecemeal but are performed jointly through the formulation of a coupled minimization problem. Comparison with state-of-the-art deep learning-based clustering methods and popular subspace clustering techniques shows that our formulation improves upon them.
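A toy alternating scheme conveys the transform-learning half, assuming the usual objective ||TX - Z||_F^2 + lam*(||T||_F^2 - log|det T|) with column-wise hard thresholding for the sparse codes Z; the coupling to the Locally Linear Manifold Clustering term, and the paper's actual updates, are omitted.

import numpy as np

def transform_learning(X, lam=0.1, sparsity=10, lr=1e-3, n_iter=100):
    # X: m x n signal matrix (columns are samples); T: m x m learned transform.
    m, n = X.shape
    T = np.eye(m)
    for _ in range(n_iter):
        Z = T @ X
        # keep the 'sparsity' largest-magnitude entries per column
        thresh = -np.sort(-np.abs(Z), axis=0)[sparsity - 1]
        Z = Z * (np.abs(Z) >= thresh)
        # gradient of the smooth objective in T (log-det term keeps T well-conditioned)
        G = 2 * (T @ X - Z) @ X.T + 2 * lam * T - lam * np.linalg.inv(T).T
        T = T - lr * G
    return T, Z   # clustering would then operate on the transformed codes Z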

Scalable semi-supervised clustering by spectral kernel learning

Pattern Recognition Letters, 2014

Kernel learning is one of the most important recent approaches to constrained clustering. Many kernel learning methods have been introduced for clustering when side information in the form of pairwise constraints is available. However, almost all of the existing methods either learn a whole kernel matrix or learn only a limited number of parameters. Although the non-parametric methods that learn a whole kernel matrix can find clusters of arbitrary structure, they are computationally expensive and feasible only on small data sets. In this paper, we propose a kernel learning method whose number of free variables lies between these two extremes. The proposed method uses a spectral embedding to learn a square matrix whose number of rows equals the dimension of the embedded space. Therefore, the proposed method is much more scalable than methods that learn a whole kernel matrix. Experimental results on synthetic and real-world data sets show that the performance of the proposed method is generally close to that of learning a whole kernel matrix, while its time cost is far lower.
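The scalability idea can be sketched as follows, assuming must-link/cannot-link pairs and a hypothetical hinge-free loss: parameterize the kernel as K = U A U', where U holds the p smoothest eigenvectors of the graph Laplacian, and learn only the small p x p matrix A. The loss and updates below are illustrative; the paper's actual formulation differs in detail.

import numpy as np
from scipy.linalg import eigh

def learn_spectral_kernel(W, must, cannot, p=20, lr=0.1, n_iter=200):
    # W: n x n similarity graph; must/cannot: lists of (i, j) index pairs.
    n = W.shape[0]
    L = np.diag(W.sum(1)) - W
    _, U = eigh(L, subset_by_index=[0, p - 1])   # smooth eigenbasis, n x p
    A = np.eye(p)                                # only p*p free variables
    for _ in range(n_iter):
        G = np.zeros((p, p))
        for i, j in must:                        # encourage K_ij to be large
            G -= np.outer(U[i], U[j])
        for i, j in cannot:                      # encourage K_ij to be small
            G += np.outer(U[i], U[j])
        A -= lr * (G + G.T) / 2
        w, V = np.linalg.eigh(A)                 # project A back to the PSD cone
        A = (V * np.clip(w, 0, None)) @ V.T
    return U @ A @ U.T                           # learned kernel matrix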

Semidefinite spectral clustering

Pattern Recognition, 2006

Multi-way partitioning of an undirected weighted graph, where pairwise similarities are assigned as edge weights, provides an important tool for data clustering but is an NP-hard problem. Spectral relaxation is a popular relaxation, leading to spectral clustering, where the clustering is performed via the eigen-decomposition of the (normalized) graph Laplacian. Semidefinite relaxation is an alternative way of relaxing a combinatorial optimization, leading to a convex optimization problem. In this paper we employ a semidefinite programming (SDP) approach to graph equipartitioning for clustering, where sufficient conditions for strong duality hold. The method is referred to as semidefinite spectral clustering, where the clustering is based on the eigen-decomposition of the optimal feasible matrix computed by SDP. Numerical experiments with several data sets demonstrate the useful behavior of our semidefinite spectral clustering compared to existing spectral clustering methods.
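The flavor of the relaxation is easy to show with cvxpy: relax the discrete balanced k-way assignment Y = HH' to a PSD matrix with linear constraints. This is a generic equipartition relaxation for illustration, not necessarily the paper's exact program.

import numpy as np
import cvxpy as cp

def sdp_equipartition(L, k):
    # L: n x n graph Laplacian; minimize the cut tr(L Y) over the relaxed
    # assignment matrix Y, which would be H @ H.T in the discrete problem.
    n = L.shape[0]
    Y = cp.Variable((n, n), PSD=True)
    constraints = [cp.diag(Y) == 1,             # unit self-assignment
                   cp.sum(Y, axis=1) == n / k,  # balanced clusters of size n/k
                   Y >= 0]                      # entrywise nonnegative
    prob = cp.Problem(cp.Minimize(cp.trace(L @ Y)), constraints)
    prob.solve()
    return Y.value  # cluster via eigen-decomposition of the optimal Y, as in the paper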

Unified Spectral Clustering With Optimal Graph

Proceedings of the AAAI Conference on Artificial Intelligence

Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous label learning; and discretizing the learned labels by k-means clustering. This common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, a predefined similarity graph might not be optimal for subsequent clustering; it is well accepted that the similarity graph strongly affects the clustering results. To this end, we propose to automatically learn similarity information from the data while simultaneously enforcing the constraint that the similarity matrix have exactly c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution, since the k-means method is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the disc...
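The connectivity constraint mentioned above has a crisp spectral form: a graph with similarity matrix S has exactly c connected components if and only if its Laplacian has eigenvalue zero with multiplicity c. A small checker makes this concrete:

import numpy as np

def num_connected_components(S, tol=1e-10):
    # Count (near-)zero eigenvalues of the Laplacian of the symmetrized graph.
    L = np.diag(S.sum(1)) - (S + S.T) / 2
    return int((np.linalg.eigvalsh(L) < tol).sum())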