An example of K-Means++ initialization (original) (raw)

Note

Go to the endto download the full example code. or to run this example in your browser via JupyterLite or Binder

An example to show the output of the sklearn.cluster.kmeans_plusplusfunction for generating initial seeds for clustering.

K-Means++ is used as the default initialization for K-means.

K-Means++ Initialization

Authors: The scikit-learn developers

SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt

from sklearn.cluster import kmeans_plusplus from sklearn.datasets import make_blobs

Generate sample data

n_samples = 4000 n_components = 4

X, y_true = make_blobs( n_samples=n_samples, centers=n_components, cluster_std=0.60, random_state=0 ) X = X[:, ::-1]

Calculate seeds from k-means++

centers_init, indices = kmeans_plusplus(X, n_clusters=4, random_state=0)

Plot init seeds along side sample data

plt.figure(1) colors = ["#4EACC5", "#FF9C34", "#4E9A06", "m"]

for k, col in enumerate(colors): cluster_data = y_true == k plt.scatter(X[cluster_data, 0], X[cluster_data, 1], c=col, marker=".", s=10)

plt.scatter(centers_init[:, 0], centers_init[:, 1], c="b", s=50) plt.title("K-Means++ Initialization") plt.xticks([]) plt.yticks([]) plt.show()

Total running time of the script: (0 minutes 0.061 seconds)

Related examples

Gallery generated by Sphinx-Gallery