Expectation-Maximization Algorithm - ML

Last Updated : 2 May, 2026

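The Expectation-Maximization (EM) algorithm is an iterative optimization technique for estimating unknown parameters in probabilistic models, particularly when the data is incomplete, noisy or contains hidden (latent) variables. It works in two steps:

- E-step (Expectation): estimate the hidden variables using the current parameter values.
- M-step (Maximization): update the parameters to best fit the data, given those estimates.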

Figure: Expectation and Maximization in EM Algorithm

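These two steps are repeated until convergence, which typically means that the parameter estimates or the log-likelihood change by less than a small threshold from one iteration to the next.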

By iteratively repeating these steps the EM algorithm seeks to maximize the likelihood of the observed data.
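In standard notation, which this article does not otherwise define, the two steps can be written as follows. Here $X$ is the observed data, $Z$ the latent variables, $\theta$ the parameters and $\theta^{(t)}$ the estimate after iteration $t$; all of these symbols are assumptions introduced for illustration.

```latex
% E-step: expected complete-data log-likelihood under the current posterior of Z
Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \sim p(Z \mid X, \theta^{(t)})}\left[ \log p(X, Z \mid \theta) \right]

% M-step: choose the parameters that maximize this expectation
\theta^{(t+1)} = \arg\max_{\theta} \; Q(\theta \mid \theta^{(t)})
```

Each iteration cannot decrease the observed-data log-likelihood $\log p(X \mid \theta)$, which is why tracking it is a natural convergence check.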

Key Concepts

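Let's look at some of the key terms used with the Expectation-Maximization (EM) algorithm:

- Latent variables: unobserved variables that influence the observed data, such as which group a data point belongs to.
- Likelihood: the probability of the observed data under a given set of parameter values.
- Log-likelihood: the logarithm of the likelihood, which is easier to work with numerically and is the quantity EM tracks.
- Maximum likelihood estimation: choosing the parameter values that make the observed data most probable.
- Posterior probability: the probability of a latent variable's value given the observed data and the current parameters.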

Working

Here's a step-by-step breakdown of the process:

Figure: EM Algorithm Flowchart

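**1. Initialization:** The algorithm starts with initial parameter values and assumes the observed data comes from a specific model.

**2. E-Step (Expectation Step):** Using the current parameters, compute the posterior probability of each latent variable given the observed data. In a mixture model these are the responsibilities: the probability that each point belongs to each component.

**3. M-Step (Maximization Step):** Update the parameters to maximize the expected log-likelihood computed in the E-step.

**4. Convergence:** Check whether the log-likelihood or the parameters have stopped changing by more than a small threshold. If not, return to the E-step.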
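The four steps above can be sketched as a single function for a two-component, one-dimensional Gaussian mixture. This is a minimal illustration using NumPy only, not the implementation developed later in the article; the names `normal_pdf` and `em_two_gaussians`, the tolerance `tol` and the starting values are all assumptions chosen for the example.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def em_two_gaussians(x, mu, sigma, pi, tol=1e-6, max_iter=200):
    """EM for a two-component 1-D Gaussian mixture.

    mu, sigma, pi hold the two initial guesses for each parameter.
    Returns the fitted (mu, sigma, pi) and the log-likelihood history.
    """
    # 1. Initialization: copy the starting values into float arrays
    mu = np.array(mu, dtype=float)
    sigma = np.array(sigma, dtype=float)
    pi = np.array(pi, dtype=float)
    history = []

    for _ in range(max_iter):
        # Weighted component densities, shape (2, n)
        weighted = pi[:, None] * normal_pdf(x[None, :], mu[:, None], sigma[:, None])

        # 4. Convergence: stop once the log-likelihood barely changes
        history.append(float(np.sum(np.log(weighted.sum(axis=0)))))
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break

        # 2. E-step: responsibilities, each column sums to 1
        gamma = weighted / weighted.sum(axis=0)

        # 3. M-step: responsibility-weighted parameter updates
        nk = gamma.sum(axis=1)
        mu = (gamma @ x) / nk
        sigma = np.sqrt((gamma * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk)
        pi = nk / x.size

    return mu, sigma, pi, history


# Example run on synthetic data (seed and starting values are arbitrary)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(2, 1, 200), rng.normal(-1, 0.8, 600)])
mu, sigma, pi, history = em_two_gaussians(
    x, mu=[1.0, -2.0], sigma=[1.0, 1.0], pi=[0.5, 0.5]
)
```

Checking the log-likelihood before the update means the returned parameters are the ones the last recorded log-likelihood was computed from.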

Implementation of Expectation-Maximization Algorithm

Step 1: Import the necessary libraries

First we will import the necessary Python libraries like NumPy, Seaborn, Matplotlib and SciPy.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm, gaussian_kde
```

Step 2: Generate a dataset with two Gaussian components

We generate two sets of data values from two different normal distributions:

- 200 values with mean 2 and standard deviation 1
- 600 values with mean -1 and standard deviation 0.8

These two sets are then combined to form a single dataset. We plot this dataset to visualize how the values are distributed.

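```python
# True parameters of the two Gaussian components
mu1, sigma1 = 2, 1
mu2, sigma2 = -1, 0.8

# Draw 200 and 600 samples and pool them into one dataset
X1 = np.random.normal(mu1, sigma1, size=200)
X2 = np.random.normal(mu2, sigma2, size=600)
X = np.concatenate([X1, X2])

# Kernel density estimate of the pooled data
sns.kdeplot(X)
plt.xlabel('X')
plt.ylabel('Density')
plt.title('Density Estimation of X')
plt.show()
```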

**Output:**

Figure: Density Plot
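Because `np.random.normal` draws from NumPy's global random state, each run produces a different dataset and slightly different estimates. One way to make the experiment repeatable, sketched here as an optional variation rather than part of the original walkthrough, is to draw from a seeded generator. The seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for illustration

mu1, sigma1 = 2, 1
mu2, sigma2 = -1, 0.8

# Same sampling as before, but from the seeded generator
X1 = rng.normal(mu1, sigma1, size=200)
X2 = rng.normal(mu2, sigma2, size=600)
X = np.concatenate([X1, X2])

print(X.shape)  # (800,)
```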

Step 3: Initialize parameters

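We make initial guesses for each group's:

- mean (`mu1_hat`, `mu2_hat`)
- standard deviation (`sigma1_hat`, `sigma2_hat`)
- mixing proportion (`pi1_hat`, `pi2_hat`), the fraction of the data that belongs to the group

```python
mu1_hat, sigma1_hat = np.mean(X1), np.std(X1)
mu2_hat, sigma2_hat = np.mean(X2), np.std(X2)
pi1_hat, pi2_hat = len(X1) / len(X), len(X2) / len(X)
```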
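Computing the starting values from `X1` and `X2` relies on knowing which group each point came from, which real data does not provide. A minimal sketch of a label-free starting point follows, assuming a simple quartile-based heuristic. This is one of many possible choices and not the article's method; the variable names match the ones used in this step so the result could be used in its place.

```python
import numpy as np

# Pooled data only; the group labels are treated as unknown
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(2, 1, size=200), rng.normal(-1, 0.8, size=600)])

# Start the two means at the upper and lower quartiles of the pooled data
mu1_hat, mu2_hat = np.percentile(X, [75, 25])

# Start both spreads at the overall standard deviation
sigma1_hat = sigma2_hat = np.std(X)

# Start with equal mixing proportions
pi1_hat = pi2_hat = 0.5
```

EM is sensitive to its starting values, so it is common to try several initializations and keep the run with the highest final log-likelihood.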

Step 4: Perform EM algorithm

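We run a loop for 20 rounds, called epochs. In each round:

- The E-step computes the responsibilities `gamma1` and `gamma2`, the probability that each data point belongs to each group under the current parameters.
- The M-step updates the means, standard deviations and mixing proportions, using those responsibilities as weights.

We also calculate the log-likelihood in each round, a measure of how well the model explains the data, to check that the model is improving.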
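Written out for a mixture of $K$ Gaussians (here $K = 2$) with data points $x_1, \dots, x_N$, the quantities computed in each round are as follows. The symbols $\gamma_{ik}$, $\mu_k$, $\sigma_k$ and $\pi_k$ mirror the code variables `gamma`, `mu_hat`, `sigma_hat` and `pi_hat`; the index notation is introduced here for illustration.

```latex
% E-step: responsibility of component k for point x_i
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}

% M-step: responsibility-weighted mean, standard deviation and mixing proportion
\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik} \, x_i}{\sum_{i=1}^{N} \gamma_{ik}}, \qquad
\sigma_k = \sqrt{\frac{\sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)^2}{\sum_{i=1}^{N} \gamma_{ik}}}, \qquad
\pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}

% Log-likelihood tracked after each round
\log L = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)
```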

```python
num_epochs = 20
log_likelihoods = []

for epoch in range(num_epochs):
    # E-step: responsibility of each component for every data point
    gamma1 = pi1_hat * norm.pdf(X, mu1_hat, sigma1_hat)
    gamma2 = pi2_hat * norm.pdf(X, mu2_hat, sigma2_hat)
    total = gamma1 + gamma2
    gamma1 /= total
    gamma2 /= total

    # M-step: responsibility-weighted means, standard deviations and proportions
    mu1_hat = np.sum(gamma1 * X) / np.sum(gamma1)
    mu2_hat = np.sum(gamma2 * X) / np.sum(gamma2)
    sigma1_hat = np.sqrt(np.sum(gamma1 * (X - mu1_hat)**2) / np.sum(gamma1))
    sigma2_hat = np.sqrt(np.sum(gamma2 * (X - mu2_hat)**2) / np.sum(gamma2))
    pi1_hat = np.mean(gamma1)
    pi2_hat = np.mean(gamma2)

    # Log-likelihood of the data under the updated parameters
    log_likelihood = np.sum(np.log(pi1_hat * norm.pdf(X, mu1_hat, sigma1_hat)
                                   + pi2_hat * norm.pdf(X, mu2_hat, sigma2_hat)))
    log_likelihoods.append(log_likelihood)

plt.plot(range(1, num_epochs + 1), log_likelihoods)
plt.xlabel('Epoch')
plt.ylabel('Log-Likelihood')
plt.title('Log-Likelihood vs. Epoch')
plt.show()
```

**Output:**

Figure: Epoch vs. Log-Likelihood Plot
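The log-likelihood in the loop multiplies densities before taking the logarithm. When a point lies very far from both components, the density can underflow to zero and the logarithm becomes `-inf`. A common remedy, sketched here with NumPy only, is to stay in log space and combine the components with `np.logaddexp`. The helper names `log_normal_pdf` and `mixture_log_likelihood` are hypothetical and not part of the original code.

```python
import numpy as np


def log_normal_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def mixture_log_likelihood(x, mu1, sigma1, pi1, mu2, sigma2, pi2):
    """Log-likelihood of a two-component Gaussian mixture, computed in log space."""
    log_p1 = np.log(pi1) + log_normal_pdf(x, mu1, sigma1)
    log_p2 = np.log(pi2) + log_normal_pdf(x, mu2, sigma2)
    # logaddexp(a, b) = log(exp(a) + exp(b)), evaluated without underflow
    return np.sum(np.logaddexp(log_p1, log_p2))


# A point 60 standard deviations from the mean stays finite in log space
x = np.array([0.0, 60.0])
ll = mixture_log_likelihood(x, 0.0, 1.0, 0.5, 1.0, 1.0, 0.5)
print(np.isfinite(ll))  # True
```

For the dataset in this article the direct computation is fine; the log-space form matters mainly with outliers or very small standard deviations.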

Step 5: Visualize the Final Result

Finally, we plot the fitted mixture density (in red) against a kernel density estimate of the original data (in green) to compare the two.

```python
# Evaluate the fitted mixture density on the sorted data
X_sorted = np.sort(X)
density_estimation = (pi1_hat * norm.pdf(X_sorted, mu1_hat, sigma1_hat)
                      + pi2_hat * norm.pdf(X_sorted, mu2_hat, sigma2_hat))

# Kernel density estimate (green) against the fitted mixture (red)
plt.plot(X_sorted, gaussian_kde(X_sorted)(X_sorted), color='green', linewidth=2)
plt.plot(X_sorted, density_estimation, color='red', linewidth=2)
plt.xlabel('X')
plt.ylabel('Density')
plt.title('Final Density Estimation')
plt.legend(['Kernel Density Estimation', 'Mixture Density'])
plt.show()
```

**Output:**

Figure: Estimated Density

The above image compares the Kernel Density Estimation (green) and the Mixture Density (red) for variable X. Both show a similar pattern, with a main peak near -1 and a smaller bump around 2, which indicates two data clusters and matches the two distributions used to generate the data. The red curve is slightly smoother and sharper than the green one.
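Once the mixture is fitted, the same responsibilities used in the E-step can assign each point to its more likely cluster. The sketch below uses placeholder parameter values equal to the generating parameters, standing in for `mu1_hat`, `sigma1_hat`, `pi1_hat` and so on; they are assumptions, not results reported by the article. The helper `normal_pdf` is a hypothetical NumPy stand-in for `norm.pdf`.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Placeholder parameters standing in for the fitted estimates (assumed values)
mu1_hat, sigma1_hat, pi1_hat = 2.0, 1.0, 0.25
mu2_hat, sigma2_hat, pi2_hat = -1.0, 0.8, 0.75

points = np.array([-2.0, -1.0, 0.5, 2.0, 3.5])

# Responsibility of component 1 for each point (component 2 gets 1 - gamma1)
w1 = pi1_hat * normal_pdf(points, mu1_hat, sigma1_hat)
w2 = pi2_hat * normal_pdf(points, mu2_hat, sigma2_hat)
gamma1 = w1 / (w1 + w2)

# Hard assignment: label 1 where component 1 is more likely, otherwise 2
labels = np.where(gamma1 > 0.5, 1, 2)
print(labels)  # [2 2 2 1 1]
```

The point at 0.5 lies between the two means but is assigned to the second component, because that component carries three times the mixing weight.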

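Applications

- Clustering with Gaussian mixture models, as in the example above.
- Training hidden Markov models, where the Baum-Welch algorithm is an instance of EM.
- Estimating model parameters when some data values are missing.

Advantages

- An iteration never decreases the log-likelihood of the observed data.
- It handles hidden variables and incomplete data naturally.
- For many models, including Gaussian mixtures, both steps have simple closed-form updates.

Limitations

- It can converge to a local maximum rather than the global one, so the result depends on the initial parameter values.
- Convergence can be slow, especially when the components overlap heavily.
- The model family and the number of components must be chosen in advance.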