Spectral Clustering (original) (raw)

Last Updated : 3 Apr, 2026

Spectral Clustering is an unsupervised learning technique used to group data points based on their similarity. It uses concepts from graph theory and linear algebra to transform data into a space where clustering becomes easier.

Types of Spectral Clustering

Key Concepts

It explains how data is represented and transformed so that clusters can be identified more easily. The main concepts involved are:

1. Similarity Matrix (W)

The matrix which represents how similar two data points are. It measures the relationship between every pair of points based on distance.

W_{ij} = e^{-\frac{||x_i - x_j||^2}{2\sigma^2}}

2. Degree Matrix (D)

Degree matrix represents the total connection strength of each data point. It is formed by summing all similarity values for a given point.

D_{ii} = \sum_{j} W_{ij}

3. Graph Laplacian (L)

This matrix identifies the structure of the graph. It is obtained by subtracting the similarity matrix from the degree matrix.

L = D - W

4. Eigenvalues & Eigenvectors

These are mathematical tools used to extract important patterns from the data. They help transform the data into a new space where clusters become clearer.

Working

The steps involved in Spectral Clustering are as follows:

spectral

Working of Spectral Clustering

**Step 1: Converting data into a graph structure where

**Step 2: Now,Instead of looking at raw data, we study this network structure.

**Step 3: The Laplacian matrix combines similarity and connection information to reveal the structure of the data.

**Step 4: Eigenvectors are then used to rearrange the data into a new form.

**Step 5: Here, data is transformed into a new lower-dimensional space where points from same cluster move closer

**Step 6: We apply K - Means algorithm where clusters become easy to detect.

Implementation

Step 1: Importing Libraries

Importing libraries like

import numpy as np import matplotlib.pyplot as plt from sklearn.datasets import make_moons from sklearn.cluster import SpectralClustering

`

Step 2: Creating a Dataset

In this step, we create a sample non-linear dataset to apply Spectral Clustering.

Python `

X, _ = make_moons(n_samples=300, noise=0.05)

`

Step 3: Visualizing the Original Dataset

Now, we plot the dataset to understand its distribution. It helps understand actual structure of data.

Python `

plt.scatter(X[:, 0], X[:, 1])
plt.title("Original Data")
plt.show()

`

Screenshot-from-2026-03-26-11-40-36

Raw Dataset before Clustering

Step 4: Building a Spectral Clustering Model

Here, we build the Spectral Clustering model by setting parameters such as the number of clusters and the method used to compute similarity between data points.

model = SpectralClustering( n_clusters=2, affinity='nearest_neighbors', n_neighbors=15 )

`

Step 5: Training the model

This step involves fitting the model to the data and grouping points into clusters. This is done using fit_predict() which fits the model and returns the cluster labels.

Python `

labels = model.fit_predict(X)

`

Step 6: Visualizing the Clustered Output

The clustered data is visualized to show the final grouping of data points. Different colors are used to represent different clusters****.**

Python `

plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title("Spectral Clustering")
plt.show()

`

Screenshot-from-2026-03-26-14-25-09

Spectral Clustering Result

Step 7: Applying K-Means algorithm for comparison

Here, we apply K-Means algorithm to the same dataset for comparison. This helps in understanding how Spectral Clustering performs better than K-Means algorithm on non-linear data.

Python `

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2, random_state=42) k_labels = kmeans.fit_predict(X)

plt.figure(figsize=(10,4))

plt.subplot(1,2,1) plt.scatter(X[:, 0], X[:, 1], c=labels) plt.title("Spectral Clustering")

plt.subplot(1,2,2) plt.scatter(X[:, 0], X[:, 1], c=k_labels) plt.title("K-Means Clustering")

plt.show()

`

Screenshot-from-2026-03-26-14-25-56

Spectral Clustering vs K-Means Clustering

The complete source code can be accessed here.

Applications

Limitations