cophenet — SciPy v1.15.2 Manual

scipy.cluster.hierarchy.cophenet(Z, Y=None)

Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.

Suppose p and q are original observations in disjoint clusters s and t, respectively, and s and t are joined by a direct parent cluster u. The cophenetic distance between observations p and q is simply the distance between clusters s and t.
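The definition above can be checked by replaying the merges recorded in Z. The following is a minimal sketch, not SciPy's own implementation: `cophenetic_by_hand` is a hypothetical helper, and the four-point dataset is made up for illustration. Each row of Z joins clusters s and t at some distance, and every pair (p, q) with p in s and q in t receives that distance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform


def cophenetic_by_hand(Z, n):
    """Hypothetical helper: rebuild the square cophenetic matrix from Z."""
    members = {i: [i] for i in range(n)}  # cluster id -> original observations
    D = np.zeros((n, n))
    for step, (s, t, dist, _count) in enumerate(Z):
        s, t = int(s), int(t)
        # Every cross pair (p, q) first shares a cluster at this merge.
        for p in members[s]:
            for q in members[t]:
                D[p, q] = D[q, p] = dist
        # The new cluster gets id n + step, as in the linkage format.
        members[n + step] = members.pop(s) + members.pop(t)
    return D


# Illustrative data (an assumption, not from the SciPy documentation).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [7.0, 0.0]])
Z = linkage(pdist(X), method="single")
D = cophenetic_by_hand(Z, len(X))
print(np.allclose(D, squareform(cophenet(Z))))  # True
```

The helper is quadratic in the number of observations and is meant only to make the definition concrete; use cophenet itself for real work.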

Parameters:

Z : ndarray

The hierarchical clustering encoded as an array (see linkage function).

Y : ndarray (optional)

Calculates the cophenetic correlation coefficient c of a hierarchical clustering defined by the linkage matrix Z of a set of \(n\) observations in \(m\) dimensions. Y is the condensed distance matrix from which Z was generated.

Returns:

c : ndarray

The cophenetic correlation coefficient (returned only if Y is passed).

d : ndarray

The cophenetic distance matrix in condensed form. The \(ij\)-th entry is the cophenetic distance between original observations \(i\) and \(j\).
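As an illustration of the two return values, the sketch below passes Y and unpacks both c and d. The random dataset and the use of average linkage are assumptions chosen for the example. It also compares c with the Pearson correlation between the original distances Y and the cophenetic distances d, which is how the cophenetic correlation coefficient is usually defined.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Illustrative data: 20 random observations in 3 dimensions (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

Y = pdist(X)                      # condensed distance matrix
Z = linkage(Y, method="average")  # linkage built from that same Y

# With Y supplied, cophenet returns the coefficient and the distances.
c, d = cophenet(Z, Y)

# Pearson correlation between original and cophenetic distances.
r = np.corrcoef(Y, d)[0, 1]
print(np.isclose(c, r))  # True
```

A value of c close to 1 suggests that the dendrogram preserves the original pairwise distances well.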

Examples

>>> from scipy.cluster.hierarchy import single, cophenet
>>> from scipy.spatial.distance import pdist, squareform

Given a dataset X and a linkage matrix Z, the cophenetic distance between two points of X is the distance between the two largest distinct clusters that each contain one of the points:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

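X corresponds to a dataset of four groups of three points each, one group at each corner of the square with vertices (0, 0), (0, 4), (4, 0), and (4, 4).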

>>> Z = single(pdist(X))
>>> Z
array([[ 0.,  1.,  1.,  2.],
       [ 2., 12.,  1.,  3.],
       [ 3.,  4.,  1.,  2.],
       [ 5., 14.,  1.,  3.],
       [ 6.,  7.,  1.,  2.],
       [ 8., 16.,  1.,  3.],
       [ 9., 10.,  1.,  2.],
       [11., 18.,  1.,  3.],
       [13., 15.,  2.,  6.],
       [17., 20.,  2.,  9.],
       [19., 21.,  2., 12.]])
>>> cophenet(Z)
array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2.,
       2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       1., 1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1., 1.])

The output of the scipy.cluster.hierarchy.cophenet function is in condensed form. We can use scipy.spatial.distance.squareform to view it as a regular square matrix, where element ij is the cophenetic distance between points i and j of X:

>>> squareform(cophenet(Z))
array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])

In this example, the cophenetic distance between points of X that are very close (i.e., in the same corner) is 1. For every other pair of points it is 2, because those points lie in clusters at different corners, so the distance between their clusters is larger.
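When only a few pairs are needed, the condensed vector can be indexed directly instead of expanding it with squareform. The sketch below assumes the standard condensed ordering used by pdist and squareform; `condensed_index` is a hypothetical helper written for this example, not a SciPy function.

```python
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
n = len(X)
d = cophenet(single(pdist(X)))


def condensed_index(i, j, n):
    """Hypothetical helper: position of pair (i, j), i != j, in a condensed vector."""
    if i > j:
        i, j = j, i
    return n * i + j - ((i + 2) * (i + 1)) // 2


# Points 0 and 2 share a corner; points 0 and 3 sit in different corners.
same_corner = d[condensed_index(0, 2, n)]
different_corner = d[condensed_index(0, 3, n)]
print(same_corner, different_corner)  # 1.0 2.0
```

For large n this avoids allocating the full n-by-n matrix when only individual entries are of interest.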