Types of Linkages in Hierarchical Clustering
Last Updated : 12 Jul, 2025
**Hierarchical clustering** is used to group similar data points and organise them in a tree-like structure. A key part of this process is the linkage, which defines how the **distance between clusters** is calculated before they are merged or divided. Different types of linkage measure this distance differently. In this article, we'll look at the different linkage methods and see how they affect cluster formation.
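To make the role of the linkage concrete, here is a minimal sketch of bottom-up (agglomerative) merging in plain Python. The function names `single`, `complete` and `agglomerate`, the sample points, and the naive pairwise search are all assumptions made for illustration; this is not how an optimised library would implement it.

```python
from itertools import combinations, product
from math import dist

def single(R, S):
    # Closest pair of points across the two clusters
    return min(dist(i, j) for i, j in product(R, S))

def complete(R, S):
    # Farthest pair of points across the two clusters
    return max(dist(i, j) for i, j in product(R, S))

def agglomerate(points, linkage, k):
    """Repeatedly merge the two clusters with the smallest linkage value
    until only k clusters remain (naive search over all cluster pairs)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters.pop(b)      # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

# Hypothetical points on a line with slowly growing gaps: 10, 12, 14, 16
points = [(0.0, 0.0), (10.0, 0.0), (22.0, 0.0), (36.0, 0.0), (52.0, 0.0)]

by_single = agglomerate(points, single, k=2)
by_complete = agglomerate(points, complete, k=2)
print([len(c) for c in by_single])    # [4, 1]
print([len(c) for c in by_complete])  # [2, 3]
```

On the same data, the two linkages end up with different clusters: the minimum-based rule keeps absorbing the next nearby point, while the maximum-based rule prefers merges that keep clusters small in diameter.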
1. Single Linkage
For two clusters **R** and **S**, **single linkage** returns the minimum distance between any point in R and any point in S. This method tends to create long, chain-like clusters because it can **connect clusters based on a single pair of close points**, which also makes it **sensitive to noise and outliers** lying between clusters.
L(R, S) = \min(D(i, j)), \, i \in R, \, j \in S
where
- **D(i, j)**: Distance function between points **i** and **j**.

Single Linkage
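The single linkage definition can be sketched in a few lines of Python. The function name `single_linkage`, the use of Euclidean distance for D, and the sample clusters are assumptions chosen for illustration.

```python
from itertools import product
from math import dist

def single_linkage(R, S):
    """Minimum Euclidean distance over all pairs (i, j) with i in R, j in S."""
    return min(dist(i, j) for i, j in product(R, S))

# Hypothetical clusters on a line
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (9.0, 0.0)]
print(single_linkage(R, S))  # 3.0, from the pair (1, 0) and (4, 0)
```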
2. Complete Linkage
For two clusters **R** and **S**, **complete linkage** returns the maximum distance between any point in R and any point in S. It tends to create compact, roughly spherical clusters because two clusters are merged only when even their farthest points are close to each other. Since a single distant point can determine that maximum, it is also **sensitive to outliers**.
L(R, S) = \max(D(i, j)), \, i \in R, \, j \in S
where
- **D(i, j)**: Distance function between points **i** and **j**.

Complete Linkage
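A matching sketch for complete linkage only swaps the minimum for a maximum. As before, `complete_linkage`, the Euclidean distance, and the sample clusters are illustrative assumptions.

```python
from itertools import product
from math import dist

def complete_linkage(R, S):
    """Maximum Euclidean distance over all pairs (i, j) with i in R, j in S."""
    return max(dist(i, j) for i, j in product(R, S))

# Hypothetical clusters on a line
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (9.0, 0.0)]
print(complete_linkage(R, S))  # 9.0, from the pair (0, 0) and (9, 0)
```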
3. Average Linkage
It returns the **average distance between all pairs of points** drawn from the two clusters, one point from each. This method maintains a **balance between single and complete linkage** by considering all pairs of points, not just the closest or farthest pair. It usually **results in clusters that are moderately compact**.
L(R,S) = \frac{1}{n_{R}\times n_{S}}\sum_{i=1}^{n_{R}}\sum_{j=1}^{n_{S}} D(i,j), i\in R, j\in S
where
- n_{R} : Number of data-points in R
- n_{S} : Number of data-points in S

Average Linkage
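The double sum in the average linkage formula maps directly onto a loop over all cross-cluster pairs. This is a small sketch under the assumption that D is Euclidean distance; `average_linkage` and the sample clusters are made up for the example.

```python
from itertools import product
from math import dist

def average_linkage(R, S):
    """Mean Euclidean distance over all n_R * n_S cross-cluster pairs."""
    total = sum(dist(i, j) for i, j in product(R, S))
    return total / (len(R) * len(S))

# Hypothetical clusters on a line
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (9.0, 0.0)]
# Pair distances are 4, 9, 3 and 8, so the mean is 24 / 4
print(average_linkage(R, S))  # 6.0
```

For these clusters the result (6.0) lies between the single linkage value (3.0) and the complete linkage value (9.0).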
4. Ward's Linkage
It measures the distance between two clusters as the increase in total within-cluster spread (the sum of squared distances of points to their cluster mean) that results from combining them. This method **creates compact, well-separated clusters** by always merging the pair of clusters whose union keeps the data within each cluster as similar as possible.
L(R, S) = \frac{n_R \times n_S}{n_R + n_S} \, \lVert \bar{R} - \bar{S} \rVert^2
where
- n_R and n_S are the sizes of clusters R and S
- \bar{R} and \bar{S} are the centroids (mean points) of clusters R and S
- \lVert \bar{R} - \bar{S} \rVert^2 is the squared Euclidean distance between the centroids.

Ward Linkage
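Ward's merge cost is commonly written as n_R n_S / (n_R + n_S) times the squared Euclidean distance between the two centroids, and this quantity equals the increase in the within-cluster sum of squares caused by the merge. The sketch below computes both and checks that they agree; the helper names `centroid`, `sse` and `ward_linkage` and the sample clusters are assumptions for illustration.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)

def ward_linkage(R, S):
    """Closed-form merge cost: size factor times squared centroid distance."""
    n_r, n_s = len(R), len(S)
    return n_r * n_s / (n_r + n_s) * dist(centroid(R), centroid(S)) ** 2

# Hypothetical clusters on a line
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (9.0, 0.0)]

merge_cost = ward_linkage(R, S)
sse_increase = sse(R + S) - sse(R) - sse(S)
print(merge_cost, sse_increase)  # 36.0 36.0
```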
5. Centroid Linkage
It calculates the distance between two clusters as the **distance between their centroids**, i.e. the average of all points in each cluster. This method works well when clusters are round or evenly shaped, but it **may not be the best choice for irregularly shaped clusters**.
L(R, S) = D(\bar{R}, \bar{S})
where
- \bar{R} and \bar{S} are the centroids (mean points) of clusters R and S
- D(\bar{R}, \bar{S}) is the distance between the centroids of clusters R and S.

Centroid Linkage
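Centroid linkage reduces each cluster to its mean point before measuring distance. Here is a minimal sketch, assuming Euclidean distance; `centroid`, `centroid_linkage` and the sample clusters are illustrative names and data.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def centroid_linkage(R, S):
    """Euclidean distance between the centroids of R and S."""
    return dist(centroid(R), centroid(S))

# Hypothetical clusters on a line
R = [(0.0, 0.0), (1.0, 0.0)]
S = [(4.0, 0.0), (9.0, 0.0)]
# Centroids are (0.5, 0) and (6.5, 0)
print(centroid_linkage(R, S))  # 6.0
```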
Each linkage method has its own advantages, and we can choose among them based on our needs and the type of data we have.