Measures of Distance

Last Updated : 8 Sep, 2025

Measures of distance are mathematical functions used to quantify how similar or dissimilar two objects are based on their features. These measures are critical for clustering, classification and information retrieval because they help determine relationships among data points. The choice of distance depends on the nature of the data and the application domain.

Let's look at a few commonly used distance measures.

1. Euclidean Distance

Figure: Euclidean Distance

Euclidean distance is the traditional metric for geometric problems. It is the ordinary straight-line distance between two points, and it is one of the most widely used distance measures in cluster analysis.

**Formula:**

d(x,y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}
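As a minimal sketch of the formula above, the function below computes the Euclidean distance between two equal-length sequences of numbers using only the standard library. The name `euclidean_distance` and the length check are illustrative choices, not part of the original article.

```python
import math

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# The 3-4-5 right triangle: distance from the origin to (3, 4).
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

In Python 3.8 and later, `math.dist(p, q)` computes the same quantity.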

2. Manhattan Distance

Manhattan distance is the sum of the absolute differences between the corresponding coordinates of two points. For two points P and Q in a plane, it is the distance travelled when moving from P to Q only along lines parallel to the X-axis and Y-axis. With P at (x1, y1) and Q at (x2, y2), the Manhattan distance between P and Q = |x1 – x2| + |y1 – y2|.

**Formula:**

d(x,y) = \sum_{i=1}^n |x_i - y_i|
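The sum of absolute differences can be sketched in a few lines of Python. The function name `manhattan_distance` is a hypothetical one chosen for this example.

```python
def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(x, y))

# P = (1, 2), Q = (4, 6): |1 - 4| + |2 - 6| = 3 + 4
print(manhattan_distance([1, 2], [4, 6]))  # 7
```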

3. Jaccard Index

Figure: Jaccard Index

The Jaccard distance is a set-based measure of dissimilarity. It is one minus the Jaccard index, which is the ratio of the number of elements the two sets share to the number of elements in their union.

**Formula:**

d(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|}
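A small sketch of the Jaccard distance using Python's built-in sets follows. The function name `jaccard_distance` is illustrative, and returning 0.0 for two empty sets is an assumption of this example, since the formula is undefined when the union is empty.

```python
def jaccard_distance(a, b):
    """One minus the ratio of intersection size to union size."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 0.0
    return 1 - len(a & b) / len(union)

# Shared elements {2, 3}, union {1, 2, 3, 4}: 1 - 2/4
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```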

4. Minkowski Distance

Minkowski distance is a generalized distance measure that includes both Euclidean and Manhattan distances as special cases, controlled by a parameter p: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.

**Formula:**

d(x,y) = \left( \sum_{i=1}^n |x_i - y_i|^p \right)^{\frac{1}{p}}
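The sketch below shows how one function covers both special cases by changing p. The name `minkowski_distance` and the restriction to p >= 1 (the range in which the formula defines a metric) are choices made for this example.

```python
def minkowski_distance(x, y, p):
    """p-th root of the sum of absolute differences raised to the power p."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = [0, 0], [3, 4]
print(minkowski_distance(x, y, 1))  # p = 1 matches the Manhattan distance
print(minkowski_distance(x, y, 2))  # p = 2 matches the Euclidean distance
```

For these two points, the results agree with the Manhattan value 7 and the Euclidean value 5, up to floating-point rounding.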

5. Cosine Similarity / Cosine Distance

Cosine similarity measures the cosine of the angle between two vectors, focusing on orientation rather than magnitude. It is commonly converted to a distance as 1 − similarity.

**Formula (Similarity):**

\text{Cosine}(x,y) = \frac{x \cdot y}{||x|| \, ||y||}

**Formula (Distance):**

d(x,y) = 1 - \frac{x \cdot y}{||x|| \, ||y||}
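The two formulas above can be sketched with the standard library as follows. The function names are illustrative, and raising an error for zero-length vectors is an assumption of this example, because the cosine is undefined when either norm is zero.

```python
import math

def cosine_similarity(x, y):
    """Dot product divided by the product of the vector norms."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: reject zero vectors rather than pick a convention.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    """One minus the cosine similarity."""
    return 1 - cosine_similarity(x, y)

# Perpendicular vectors have similarity 0, so the distance is 1.
print(cosine_distance([1, 0], [0, 1]))  # 1.0
# Same direction, different magnitude: the distance is close to 0.
print(cosine_distance([1, 2], [2, 4]))
```

The second call illustrates the point about orientation: doubling a vector's length leaves its cosine distance to the original at approximately zero.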

6. Hamming Distance

Hamming distance is the number of positions at which two strings of equal length differ. It is commonly used for error detection and sequence comparison.

**Formula:**

d(x,y) = \sum_{i=1}^n [x_i \neq y_i]

where [x_i \neq y_i] = 1 if the symbols at position i differ, else 0.
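Counting mismatched positions is straightforward to sketch in Python. The function name `hamming_distance` is a hypothetical one, and the example rejects inputs of unequal length because the definition above only covers equal-length strings.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("inputs must have the same length")
    # Each comparison is True (1) when the symbols differ, else False (0).
    return sum(a != b for a, b in zip(s, t))

# Positions 3, 4 and 5 differ: r/t, o/h, l/r.
print(hamming_distance("karolin", "kathrin"))  # 3
```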