cophenet — SciPy v1.15.2 Manual
scipy.cluster.hierarchy.cophenet(Z, Y=None)
Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.
Suppose p and q are original observations in disjoint clusters s and t, respectively, and s and t are joined by a direct parent cluster u. The cophenetic distance between observations p and q is simply the distance between clusters s and t.
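The definition above can be checked by replaying the merges recorded in Z. The following is a minimal sketch, not SciPy's implementation: `cophenetic_pair` is a hypothetical helper, and the four-point dataset `pts` is an assumption chosen so the merge heights are easy to follow.

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform


def cophenetic_pair(Z, p, q):
    """Return the height of the merge that first joins observations p and q.

    Hypothetical helper for illustration; not part of SciPy.
    """
    if p == q:
        return 0.0
    n = Z.shape[0] + 1                      # number of original observations
    members = {i: {i} for i in range(n)}    # cluster id -> set of observations
    for k, (s, t, dist, _count) in enumerate(Z):
        # Row k of Z merges clusters s and t into a new cluster with id n + k.
        merged = members.pop(int(s)) | members.pop(int(t))
        if p in merged and q in merged:
            return float(dist)
        members[n + k] = merged
    raise ValueError("p and q must be valid observation indices")


# Assumed toy data: four points on a line at 0, 1, 3 and 7.
pts = np.array([[0.0], [1.0], [3.0], [7.0]])
Z_small = single(pdist(pts))
D_small = squareform(cophenet(Z_small))

pair_02 = cophenetic_pair(Z_small, 0, 2)   # 0 and 2 first meet at height 2.0
pair_03 = cophenetic_pair(Z_small, 0, 3)   # 0 and 3 first meet at height 4.0
```

With single linkage, points 0 and 1 merge at distance 1, that cluster absorbs point 2 at distance 2, and point 3 joins last at distance 4, so the helper agrees with the corresponding entries of `D_small`.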
Parameters:
Z : ndarray
The hierarchical clustering encoded as an array (see linkage function).
Y : ndarray (optional)
Calculates the cophenetic correlation coefficient c of a hierarchical clustering defined by the linkage matrix Z of a set of \(n\) observations in \(m\) dimensions. Y is the condensed distance matrix from which Z was generated.
Returns:
c : ndarray
The cophenetic correlation coefficient (if Y is passed).
d : ndarray
The cophenetic distance matrix in condensed form. The \(ij\)th entry is the cophenetic distance between original observations \(i\) and \(j\).
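When Y is passed, both values come back together. The following is a minimal sketch, assuming the coefficient is the Pearson correlation between the original distances Y and the cophenetic distances d (the usual definition). The random dataset and the names `X_demo`, `Y_demo`, `Z_demo` and `c_manual` are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import average, cophenet
from scipy.spatial.distance import pdist

# Assumed toy data: 20 random observations in 3 dimensions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))

Y_demo = pdist(X_demo)          # condensed distance matrix
Z_demo = average(Y_demo)        # linkage built from those same distances

# With Y supplied, cophenet returns the coefficient and the distances.
c, d = cophenet(Z_demo, Y_demo)

# Cross-check: correlation between original and cophenetic distances.
c_manual = np.corrcoef(Y_demo, d)[0, 1]
```

A value of `c` close to 1 suggests the dendrogram preserves the original pairwise distances well. Note that Y should be the same condensed matrix that produced Z, otherwise the comparison is not meaningful.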
Examples
>>> from scipy.cluster.hierarchy import single, cophenet
>>> from scipy.spatial.distance import pdist, squareform
Given a dataset X and a linkage matrix Z, the cophenetic distance between two points of X is the distance between the two largest distinct clusters that each contain one of the points:
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
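X corresponds to a dataset of twelve points arranged in four groups of three, one group in each corner of the square spanned by coordinates 0 to 4.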
>>> Z = single(pdist(X))
>>> Z
array([[ 0.,  1.,  1.,  2.],
       [ 2., 12.,  1.,  3.],
       [ 3.,  4.,  1.,  2.],
       [ 5., 14.,  1.,  3.],
       [ 6.,  7.,  1.,  2.],
       [ 8., 16.,  1.,  3.],
       [ 9., 10.,  1.,  2.],
       [11., 18.,  1.,  3.],
       [13., 15.,  2.,  6.],
       [17., 20.,  2.,  9.],
       [19., 21.,  2., 12.]])
>>> cophenet(Z)
array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2.,
       2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       1., 1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1., 1.])
The output of the scipy.cluster.hierarchy.cophenet function is represented in condensed form. We can use scipy.spatial.distance.squareform to see the output as a regular matrix (where each element ij denotes the cophenetic distance between each i, j pair of points in X):
>>> squareform(cophenet(Z))
array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])
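For large datasets it can be preferable to read a single pair out of the condensed array without building the full square matrix. The following is a minimal sketch of that lookup. `condensed_index` is a hypothetical helper (not a SciPy function) that uses the row-major upper-triangle layout shared by pdist and squareform.

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]])
Z = single(pdist(X))
d = cophenet(Z)
n = X.shape[0]


def condensed_index(i, j, n):
    """Position of the pair (i, j), i != j, in a condensed distance array.

    Hypothetical helper for illustration; not part of SciPy.
    """
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i
    # Rows 0..i-1 contribute n*i - i*(i+1)//2 entries; then offset within row i.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


same_corner = d[condensed_index(0, 2, n)]        # points 0 and 2 share a corner
different_corner = d[condensed_index(0, 3, n)]   # points 0 and 3 do not
```

Here `same_corner` is 1.0 and `different_corner` is 2.0, matching entries [0, 2] and [0, 3] of the square matrix shown above.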
In this example, the cophenetic distance between points of X that are very close (i.e., in the same corner) is 1. For all other pairs of points it is 2, because the points are located in clusters at different corners, so the distance between these clusters is larger.
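The corner structure described here can be read directly off the cophenetic matrix. The following is a minimal sketch under the assumptions of this example (cophenetic distances of exactly 1 within a corner and 2 across corners). The threshold 1.5 and the names `D`, `groups` and `group_sizes` are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]])
Z = single(pdist(X))
D = squareform(cophenet(Z))

# Two points share a corner when their cophenetic distance is below 1.5.
# The diagonal of D is 0, so every point is grouped with itself.
groups = {frozenset(np.flatnonzero(row < 1.5).tolist()) for row in D}
group_sizes = sorted(len(g) for g in groups)
```

Each row of `D` yields the set of points in the same corner as that point, so collecting the distinct sets recovers the four corners with three points each.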