Cluster Analysis

Last Updated : 8 Apr, 2026

Cluster analysis (clustering) groups similar data points so that items within the same cluster are more alike than those in different clusters. It is widely used in e-commerce for customer segmentation to enable personalized recommendations and improved user experiences.

Cluster analysis is useful for:

Distance Metrics in Cluster Analysis

Distance metrics are mathematical formulas that quantify how similar or different two data points are. The choice of metric strongly influences the clustering result, because it defines which points count as "close". Some of the common metrics are:
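As a rough illustration of how the metric changes what "close" means, the sketch below computes three widely used distances in plain Python. The choice of Euclidean, Manhattan and cosine distance, the function names and the sample points are assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical sample points
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7.0

# Cosine distance ignores magnitude: parallel vectors are (nearly) 0 apart,
# perpendicular vectors are 1 apart.
print(cosine_distance((1.0, 2.0), (2.0, 4.0)))
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))
```

Note that the same pair of points is 5 units apart under one metric and 7 under another, so a distance threshold tuned for one metric does not carry over to another.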

Types of Clustering Techniques

Clustering can be broadly classified into several methods. The choice of method depends on the type of data and the problem you're solving.

1. Partitioning Methods

2. Hierarchical Methods

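Hierarchical clustering builds a tree-like structure of clusters, known as a dendrogram, that records how clusters are merged or split. It can be divided into agglomerative (bottom-up) methods, which start with every point in its own cluster and repeatedly merge the closest pair, and divisive (top-down) methods, which start with one cluster and repeatedly split it.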
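The merging process can be sketched in a few lines. The example below is a minimal bottom-up (agglomerative) version that uses single linkage, meaning the distance between two clusters is the distance between their closest members. The linkage choice, the function name `single_linkage` and the sample points are assumptions for illustration; the recorded merge list plays the role of the dendrogram.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges). Each cluster is a list of point indices;
    merges records (cluster_a, cluster_b, distance) in merge order.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical 2-D points: two tight pairs and one isolated point
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = single_linkage(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3], [4]]
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

This brute-force version recomputes every pairwise distance on each pass, so it is only suitable for small inputs.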

3. Density-Based Methods

4. Grid-Based Methods

5. Model-Based Methods

6. Constraint-Based Methods

Impact of Data on Clustering Techniques

Clustering methods must be chosen and adapted to the type of data being analyzed, because different data types require different similarity measures and algorithms.

1. Numerical Data

Numerical data consists of measurable values such as age, income or temperature. Distance-based algorithms like k-means, DBSCAN and hierarchical clustering work well because they rely on numerical distance calculations.

For example, a fitness app can cluster users by average daily steps and heart rate to identify fitness levels.
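The fitness example can be sketched with a small k-means implementation. Everything specific here is an assumption for illustration: the user values are made up, the features are standardized first so that step counts (in the thousands) do not swamp heart rate (in the tens), and the initial centroids are supplied by the caller rather than chosen at random.

```python
import math
import statistics

def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(statistics.fmean(dim)
                                     for dim in zip(*members))
    return labels, centroids

# Hypothetical users: (average daily steps, resting heart rate)
users = [(3000, 82), (3500, 80), (4000, 78),
         (11000, 62), (12000, 60), (12500, 58)]
scaled = standardize(users)
labels, centroids = kmeans(scaled, centroids=[scaled[0], scaled[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With these values the less active users fall into cluster 0 and the more active users into cluster 1. In practice the result depends on the initial centroids, so several restarts are commonly used.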

2. Categorical Data

Categorical data includes non-numerical attributes such as gender, product categories or survey responses. Algorithms like k-modes or hierarchical clustering with appropriate similarity measures are more suitable.

For example, customers can be grouped by preferred shopping category, such as electronics, fashion or home appliances.
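A minimal sketch of the k-modes idea follows. It replaces the two numerical ingredients of k-means with categorical ones: distance becomes the number of mismatching attributes (simple matching dissimilarity), and the cluster centre becomes the per-attribute mode. The customer records, the extra attributes (payment method, device) and the caller-supplied initial modes are assumptions for illustration.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(records):
    """Most frequent value per attribute: the categorical analogue of a mean."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def kmodes(records, modes, max_iter=20):
    """k-modes with caller-supplied initial modes."""
    modes = list(modes)
    labels = None
    for _ in range(max_iter):
        # Assign each record to the mode it disagrees with least
        new_labels = [min(range(len(modes)),
                          key=lambda k: matching_dissimilarity(r, modes[k]))
                      for r in records]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each mode from its current members
        for k in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:
                modes[k] = column_modes(members)
    return labels, modes

# Hypothetical customers: (preferred category, payment method, device)
customers = [
    ("electronics", "card", "mobile"),
    ("electronics", "card", "desktop"),
    ("electronics", "wallet", "mobile"),
    ("fashion", "cash", "mobile"),
    ("fashion", "cash", "desktop"),
    ("home appliances", "cash", "desktop"),
]
labels, modes = kmodes(customers, modes=[customers[0], customers[4]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)
```

Because no arithmetic is performed on the values, the categories never need to be encoded as numbers, which avoids implying an order among them.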

3. Mixed Data

Mixed data contains both numerical and categorical features, which require specialized or hybrid approaches. Algorithms such as k-prototypes or distance measures like Gower distance are commonly used.

For example, clustering customers based on income (numerical) and shopping preferences (categorical) can be effectively handled using the k-prototypes method.
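To show how a single distance can combine both feature types, here is a simplified sketch of Gower distance (not k-prototypes itself). Numerical features contribute their absolute difference divided by the feature's range, categorical features contribute 0 for a match and 1 for a mismatch, and the contributions are averaged with equal weights. The function signature, the `numeric_ranges` parameter and the customer values are assumptions for illustration.

```python
def gower_distance(a, b, numeric_ranges):
    """Simplified Gower distance between two mixed-type records.

    numeric_ranges maps a feature index to (max - min) of that feature
    over the dataset; every other index is treated as categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            rng = numeric_ranges[i]
            # Range-normalised difference lies in [0, 1]
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Categorical: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Hypothetical customers: (annual income, preferred category)
customers = [
    (30000, "electronics"),
    (32000, "electronics"),
    (90000, "fashion"),
]
incomes = [c[0] for c in customers]
ranges = {0: max(incomes) - min(incomes)}

print(gower_distance(customers[0], customers[1], ranges))  # small: similar income, same category
print(gower_distance(customers[0], customers[2], ranges))  # 1.0: extremes on both features
```

The resulting pairwise distances can be passed to any algorithm that accepts a precomputed distance matrix, such as hierarchical clustering.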

Applications

Limitations