Types of Linkages in Hierarchical Clustering

Last Updated : 12 Jul, 2025

**Hierarchical clustering** is used to group similar data points and organise data in a tree-like structure. A key part of this process is **linkage**, which defines how the distance between clusters is calculated before they are merged or divided. Different linkage types measure this distance differently. In this article, we’ll look at different linkage methods and see how they affect cluster formation.

**1. Single Linkage**

For two clusters **R** and **S**, **single linkage** returns the minimum distance between two points, one in R and one in S. This method tends to create long, chain-like clusters because it can **connect clusters based on a very small number of close points**, which also makes it **sensitive to outliers** that happen to lie between clusters.

L(R, S) = \min(D(i, j)), \, i \in R, \, j \in S

where D(i, j) is the distance between point i in cluster R and point j in cluster S.

[Figure: Single Linkage]
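The single linkage formula can be sketched in a few lines of plain Python. This is only an illustration, assuming Euclidean distance for D; the function name `single_linkage` and the toy clusters are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_linkage(R, S):
    """Smallest distance between any point in R and any point in S."""
    return min(dist(i, j) for i in R for j in S)


# Hypothetical toy clusters on a line in 2-D
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (9.0, 0.0)]

# The closest pair is (1, 0) and (4, 0)
print(single_linkage(R, S))  # 3.0
```

Only the closest pair matters here: moving the point (9, 0) even further away would not change the result.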

**2. Complete Linkage**

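For two clusters **R** and **S**, **complete linkage** returns the maximum distance between two points, one in R and one in S. It tends to create compact, roughly spherical clusters because two clusters are merged only when even their farthest points are close, so clusters do not grow too wide. It is also **sensitive to outliers**, since a single distant point can determine the distance between two clusters.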

L(R, S) = \max(D(i, j)), \, i \in R, \, j \in S

where D(i, j) is the distance between point i in cluster R and point j in cluster S.

[Figure: Complete Linkage]
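A small sketch of complete linkage, again assuming Euclidean distance, shows how a single distant point can dominate the result. The function name `complete_linkage` and the data are made up for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def complete_linkage(R, S):
    """Largest distance between any point in R and any point in S."""
    return max(dist(i, j) for i in R for j in S)


# Hypothetical toy clusters on a line in 2-D
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (5.0, 0.0)]

# The farthest pair is (0, 0) and (5, 0)
print(complete_linkage(R, S))  # 5.0

# Adding one outlier to S changes the cluster distance completely
S_with_outlier = S + [(20.0, 0.0)]
print(complete_linkage(R, S_with_outlier))  # 20.0
```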

**3. Average Linkage**

It returns the **average distance between all pairs of points** from the two clusters, with one point taken from each cluster. This method maintains a **balance between single and complete linkage** by considering all pairs of points, not just the closest or farthest pair. It usually **results in clusters that are moderately compact**.

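L(R, S) = \frac{1}{n_R \times n_S} \sum_{i \in R} \sum_{j \in S} D(i, j)

where n_R and n_S are the numbers of points in clusters R and S, and D(i, j) is the distance between point i in R and point j in S.

[Figure: Average Linkage]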
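The averaging over all cross-cluster pairs can be written out directly. The sketch below assumes Euclidean distance; `average_linkage` and the toy points are hypothetical names and values.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def average_linkage(R, S):
    """Mean distance over all pairs (i, j) with i in R and j in S."""
    total = sum(dist(i, j) for i in R for j in S)
    return total / (len(R) * len(S))


# Hypothetical toy clusters on a line in 2-D
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (5.0, 0.0)]

# Pairwise distances are 4, 5, 3 and 4, so the mean is 16 / 4
print(average_linkage(R, S))  # 4.0
```

For these points single linkage would give 3 and complete linkage 5, so the average value sits between the two.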

**4. Ward's Linkage**

It measures the distance between two clusters as the increase in total within-cluster variance (the sum of squared distances to the cluster centroid) that results from merging them. At each step the pair of clusters with the smallest increase is merged, which tends to **create compact, well-separated clusters** because the data within each cluster stays as similar as possible.

L(R, S) = \frac{n_R \times n_S}{n_R + n_S} \, \lVert \bar{R} - \bar{S} \rVert^2

where n_R and n_S are the numbers of points in clusters R and S, and \bar{R} and \bar{S} are their centroids.

[Figure: Ward Linkage]
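Ward's criterion is commonly stated as the increase in the within-cluster sum of squared errors (SSE) caused by a merge. With Euclidean distance this increase has the closed form n_R n_S / (n_R + n_S) times the squared distance between the two centroids. The sketch below computes both and checks that they agree; all function names and the data are hypothetical.

```python
from math import dist, isclose


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def ward_increase(R, S):
    """Increase in total within-cluster SSE caused by merging R and S."""
    return sse(R + S) - sse(R) - sse(S)


def ward_closed_form(R, S):
    """Closed form: n_R * n_S / (n_R + n_S) * squared centroid distance."""
    n_r, n_s = len(R), len(S)
    return n_r * n_s / (n_r + n_s) * dist(centroid(R), centroid(S)) ** 2


# Hypothetical toy clusters on a line in 2-D
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (5.0, 0.0)]

print(ward_increase(R, S))     # 16.0
print(ward_closed_form(R, S))  # 16.0
```

Library implementations may report Ward distances on a different scale, so values from this sketch are best used for understanding the criterion rather than for comparing against a specific library's output.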

**5. Centroid Linkage**

It calculates the distance between two clusters as the **distance between their centroids**, i.e. the average of all points in each cluster. This method works well when clusters are round or evenly shaped, but it **may not be the best choice for irregularly shaped clusters**.

L(R, S) = D(\bar{R}, \bar{S})

where \bar{R} and \bar{S} are the centroids (mean points) of clusters R and S.

[Figure: Centroid Linkage]
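Centroid linkage reduces each cluster to its mean point before measuring distance. A minimal sketch, assuming Euclidean distance and using hypothetical names and data:

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_linkage(R, S):
    """Distance between the centroids of R and S."""
    return dist(centroid(R), centroid(S))


# Hypothetical toy clusters in 2-D
R = [(0.0, 0.0), (2.0, 0.0)]  # centroid (1, 0)
S = [(1.0, 3.0), (1.0, 5.0)]  # centroid (1, 4)

print(centroid_linkage(R, S))  # 4.0
```

Because only the two mean points are compared, the shape of each cluster is ignored, which is one way to see why irregularly shaped clusters can be handled poorly.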

Each linkage method has its own advantages, and the right choice depends on our needs and the type of data we have.
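To try the methods side by side, one option is SciPy's `scipy.cluster.hierarchy` module, whose `linkage` function accepts a `method` argument. The following is a minimal sketch, assuming NumPy and SciPy are installed; the data is a made-up toy set with two well-separated groups, so all methods are expected to find the same split here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([
    [0.0, 0.0], [0.5, 0.2], [0.2, 0.6],
    [5.0, 5.0], [5.3, 4.8], [4.9, 5.4],
])

results = {}
for method in ["single", "complete", "average", "ward", "centroid"]:
    Z = linkage(X, method=method)  # Euclidean metric by default
    # Cut the tree so that at most two flat clusters remain
    results[method] = fcluster(Z, t=2, criterion="maxclust")
    print(f"{method:>8}: {results[method]}")
```

On real data with noise, elongated shapes or outliers, the labels produced by the different methods would typically diverge, which is where the trade-offs described above start to matter.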
