Gaussian Mixture Model

Last Updated : 2 May, 2026

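A Gaussian Mixture Model (GMM) is a probabilistic clustering technique that models data as a weighted combination of multiple Gaussian distributions. Instead of assigning each point to exactly one cluster, it gives every point a probability of belonging to each cluster, which allows more flexible grouping of data points.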


Visualization of three distinct one-dimensional Gaussian distributions

The graph above shows three one-dimensional Gaussian distributions with distinct means and variances. Each curve is the theoretical probability density function (PDF) of a normal distribution, highlighting differences in location and spread.

Working of GMM


Working of Gaussian Mixture Model

A Gaussian Mixture Model assumes that the data is generated from a mixture of K Gaussian distributions, each representing a cluster. Every Gaussian has its own mean \mu_k, covariance \Sigma_k and mixing weight \pi_k. The mixing weights are non-negative and sum to 1.

1. Posterior Probability (Cluster Responsibility)

For a given data point x_n, the probability that it belongs to cluster k is:

P(z_n = k \mid x_n) = \frac{\pi_k \cdot \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \cdot \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

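where z_n is the latent variable indicating which cluster generated x_n, \pi_k is the mixing weight of cluster k, and \mathcal{N}(x_n \mid \mu_k, \Sigma_k) is the Gaussian density with mean \mu_k and covariance \Sigma_k evaluated at x_n.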
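The responsibility formula can be tried out on a single point. The following is a minimal sketch for the one-dimensional case using only NumPy; the weights, means and variances are made-up illustrative values, and `gaussian_pdf` is a helper defined here, not a library function.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Illustrative (made-up) parameters for K = 3 one-dimensional components.
weights = np.array([0.5, 0.3, 0.2])    # pi_k, must sum to 1
means = np.array([0.0, 4.0, 8.0])      # mu_k
variances = np.array([1.0, 1.5, 0.5])  # sigma_k^2 (the 1-D covariance)

x_n = 1.0  # a single data point

# Numerator of the formula: pi_k * N(x_n | mu_k, sigma_k^2) for every k.
weighted = weights * gaussian_pdf(x_n, means, variances)

# Dividing by the sum over all components gives P(z_n = k | x_n).
responsibilities = weighted / weighted.sum()

print(responsibilities)        # one probability per component
print(responsibilities.sum())  # sums to 1 up to floating-point rounding
```

Because x_n = 1.0 lies closest to the first component, that component receives most of the responsibility.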

2. Likelihood of a Data Point

The total likelihood of observing x_n under all Gaussians is:

P(x_n) = \sum_{k=1}^{K} \pi_k \cdot \mathcal{N}(x_n \mid \mu_k, \Sigma_k)

This represents how well the mixture as a whole explains the data point.
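As a rough illustration, the mixture density can be evaluated directly from its definition. The sketch below assumes one-dimensional data and made-up parameters; `mixture_density` and `gaussian_pdf` are helper names introduced here. It also checks numerically that the mixture is a valid density, meaning it integrates to approximately 1.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Illustrative (made-up) mixture parameters.
weights = np.array([0.5, 0.3, 0.2])    # pi_k
means = np.array([0.0, 4.0, 8.0])      # mu_k
variances = np.array([1.0, 1.5, 0.5])  # sigma_k^2

def mixture_density(x):
    """P(x) = sum_k pi_k * N(x | mu_k, sigma_k^2), evaluated for each x."""
    x = np.atleast_1d(x)[:, None]  # shape (n_points, 1) so it broadcasts over k
    return (weights * gaussian_pdf(x, means, variances)).sum(axis=1)

# Likelihood of a single point under the whole mixture.
p_single = mixture_density(1.0)[0]
print(p_single)

# Riemann-sum check that the mixture integrates to about 1.
grid = np.linspace(-10.0, 20.0, 30001)
area = mixture_density(grid).sum() * (grid[1] - grid[0])
print(area)
```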

3. Expectation-Maximization (EM) Algorithm

GMM parameters are estimated using the EM algorithm:

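**E-step (Expectation):** Compute the responsibility of each cluster for every data point using the current parameter values.

**M-step (Maximization):** Update the parameters \pi_k, \mu_k and \Sigma_k using the responsibilities computed in the E-step.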
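The two steps can be sketched from scratch for one-dimensional data. This is a simplified illustration, not how any particular library implements EM: the synthetic data, the initial guesses, the fixed count of 50 iterations and the helper `component_densities` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data drawn from two Gaussians (illustrative values only).
x = np.concatenate([rng.normal(-2.0, 0.7, 200), rng.normal(3.0, 1.2, 300)])
K = 2

# Simple initial guesses for pi_k, mu_k and sigma_k^2.
weights = np.full(K, 1.0 / K)
means = np.array([x.min(), x.max()])
variances = np.full(K, x.var())

def component_densities(x, means, variances):
    """Return an (N, K) array holding N(x_n | mu_k, sigma_k^2)."""
    diff = x[:, None] - means
    return np.exp(-0.5 * diff ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)

log_likelihoods = []
for _ in range(50):
    # E-step: responsibilities r[n, k] = P(z_n = k | x_n) under current parameters.
    weighted = weights * component_densities(x, means, variances)
    totals = weighted.sum(axis=1, keepdims=True)
    r = weighted / totals
    log_likelihoods.append(np.log(totals).sum())

    # M-step: re-estimate the parameters from responsibility-weighted data.
    n_k = r.sum(axis=0)                         # effective points per cluster
    weights = n_k / len(x)                      # new pi_k
    means = (r * x[:, None]).sum(axis=0) / n_k  # new mu_k
    variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / n_k  # new sigma_k^2

print("weights:", weights)
print("means:", means)
print("std devs:", np.sqrt(variances))
```

The recorded `log_likelihoods` list makes it possible to check that the log-likelihood does not decrease from one iteration to the next.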

4. Log-Likelihood of the Mixture Model

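The objective optimized by EM is the log-likelihood of the data:

\ln L(\mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \cdot \mathcal{N}(x_n \mid \mu_k, \Sigma_k)

Each EM iteration is guaranteed not to decrease this value, so the algorithm typically converges to a local maximum, which need not be the global one.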
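To connect this objective to a fitted model, the sketch below computes the log of the mixture likelihood by hand and compares it with the per-sample log-likelihoods returned by scikit-learn's `score_samples`. The dataset and the choice of 3 components are assumptions made for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Small synthetic dataset (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

# Mixture density of every point: sum_k pi_k * N(x_n | mu_k, Sigma_k).
density = np.zeros(len(X))
for pi_k, mu_k, sigma_k in zip(gmm.weights_, gmm.means_, gmm.covariances_):
    density += pi_k * multivariate_normal.pdf(X, mean=mu_k, cov=sigma_k)

# Log-likelihood of the whole dataset: sum over points of log P(x_n).
manual_log_likelihood = np.log(density).sum()

# score_samples returns log P(x_n) for each point; summing gives the same quantity.
sklearn_log_likelihood = gmm.score_samples(X).sum()

print(manual_log_likelihood, sklearn_log_likelihood)
```

Working with the logarithm turns the product over data points into a sum, which avoids numerical underflow when N is large.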

Cluster Shapes in GMM

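In GMM, each cluster is a Gaussian defined by its mean \mu_k, which sets the center of the cluster, and its covariance \Sigma_k, which sets its shape and orientation.

Because covariance matrices allow elliptical shapes, GMM can model clusters that are elongated, tilted, or of different sizes.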

This makes GMM more flexible than methods like K-Means, which implicitly assumes roughly spherical clusters of similar spread.
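In scikit-learn, how much freedom each cluster shape has is controlled by the `covariance_type` parameter of `GaussianMixture`. The sketch below fits the same illustrative dataset with each of the four supported options and prints the shape of the learned `covariances_` array, which shows how many covariance parameters each option keeps.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Illustrative 2-D dataset with 3 clusters.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

shapes = {}
for cov_type in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type, random_state=0)
    gmm.fit(X)
    shapes[cov_type] = gmm.covariances_.shape
    print(cov_type, shapes[cov_type])

# full:      one 2x2 matrix per component (arbitrary ellipses)
# tied:      a single 2x2 matrix shared by all components
# diag:      one variance per feature per component (axis-aligned ellipses)
# spherical: one variance per component (circles)
```

The `spherical` option is the closest to the K-Means assumption, while `full` allows each cluster its own tilted ellipse.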

**Visualizing GMM often involves:**

These illustrate how GMM adapts to complex, real-world data distributions.

Implementing Gaussian Mixture Model (GMM)

Import the required libraries. make_blobs creates a simple synthetic dataset for the demo.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs
```

Step 1: Generate synthetic data

make_blobs creates 500 points in 2D grouped around 3 centers. cluster_std controls how tight or spread out each cluster is. y holds the true labels (used only for reference).

```python
X, y = make_blobs(
    n_samples=500,
    centers=3,
    random_state=42,
    cluster_std=[1.0, 1.5, 0.8],  # spread for each cluster
)
```

Step 2: Fit the Gaussian Mixture Model

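```python
gmm = GaussianMixture(
    n_components=3,          # number of Gaussian components
    covariance_type='full',  # each component gets its own full covariance matrix
    random_state=42,
)

gmm.fit(X)                # estimate the parameters with EM
labels = gmm.predict(X)   # hard label: the most probable component for each point
```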
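Besides hard labels, a fitted model also exposes the soft assignments (the responsibilities) through `predict_proba`. The sketch below repeats the data generation and fitting so that it runs on its own; the 0.9 confidence threshold is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(
    n_samples=500, centers=3, random_state=42, cluster_std=[1.0, 1.5, 0.8]
)
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42).fit(X)

# One row per point, one column per component; each row sums to 1.
probs = gmm.predict_proba(X)
print(probs[:5].round(3))

# Points whose most likely component has probability below the threshold
# lie in regions where clusters overlap.
confidence = probs.max(axis=1)
uncertain = np.where(confidence < 0.9)[0]  # 0.9 is an arbitrary threshold
print(len(uncertain), "points have no component above 0.9")
```

These probabilities are what distinguish GMM from a hard-assignment method: a point between two clusters can be reported as belonging partly to both.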

Step 3: Plot clusters and component centers

Points are colored by their assigned cluster, and red X markers show the learned Gaussian centers.

```python
plt.figure(figsize=(8, 6))

# scatter points colored by hard labels
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50, edgecolor='k')

# plot Gaussian centers
plt.scatter(
    gmm.means_[:, 0],
    gmm.means_[:, 1],
    s=300,
    c='red',
    marker='X',
    label='Centers',
)

plt.title("Gaussian Mixture Model Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.legend()
plt.show()
```

**Output:**


Plot clusters and component centers
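The plot shows only the centers, but the learned covariances can also be drawn as ellipses. The following is a sketch of one common way to do this, assuming 2-D data and a 2x2 covariance matrix: the eigenvectors give the orientation and the eigenvalues give the axis lengths. The helper name `covariance_ellipse` and the choice of 2 standard deviations are assumptions made for the example.

```python
import numpy as np

def covariance_ellipse(cov, n_std=2.0):
    """Return (width, height, angle_degrees) of the n_std ellipse of a 2x2 covariance."""
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    major = eigvecs[:, -1]                      # direction of largest variance
    angle = np.degrees(np.arctan2(major[1], major[0]))
    width = 2.0 * n_std * np.sqrt(eigvals[-1])  # full length along the major axis
    height = 2.0 * n_std * np.sqrt(eigvals[0])  # full length along the minor axis
    return width, height, angle

# Example: variance 4 along the x-axis and variance 1 along the y-axis.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
width, height, angle = covariance_ellipse(cov)
print(width, height, angle)
```

With a model fitted using covariance_type='full', the values for component k can be computed from `gmm.covariances_[k]` and passed to `matplotlib.patches.Ellipse(xy=gmm.means_[k], width=width, height=height, angle=angle)`, which is then added to the axes with `add_patch`.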


Use-Cases

Advantages

Limitations