cophenet — SciPy v1.16.0 Manual
scipy.cluster.hierarchy.cophenet(Z, Y=None)
Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.
Suppose p and q are original observations in disjoint clusters s and t, respectively, and s and t are joined by a direct parent cluster u. The cophenetic distance between observations p and q is simply the distance between clusters s and t.
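As a minimal sketch of this definition, the example below uses a made-up three-point dataset (not part of the original page). With single linkage, the cophenetic distance of each pair should equal the value in the third column of Z for the merge that first joins the two observations:

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

# Hypothetical 1-D data: observations 0 and 1 are close, observation 2 is far.
X = np.array([[0.0], [1.0], [5.0]])
Z = single(pdist(X))

# Row 0 of Z merges observations 0 and 1 (distance 1).
# Row 1 of Z merges that new cluster with observation 2
# (single-linkage distance min(5, 4) = 4).
D = squareform(cophenet(Z))

d01 = D[0, 1]  # pair first joined by the merge in row 0 of Z
d02 = D[0, 2]  # pair first joined by the merge in row 1 of Z
d12 = D[1, 2]  # also first joined by the merge in row 1 of Z
```

Here observations 0 and 2 are 5 units apart in the original data, yet their cophenetic distance is the height of the merge that unites them, not their raw distance.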
Parameters:
Z : ndarray
The hierarchical clustering encoded as an array (see linkage function).
Y : ndarray (optional)
If given, the function also calculates the cophenetic correlation coefficient c of the hierarchical clustering defined by the linkage matrix Z of a set of \(n\) observations in \(m\) dimensions. Y is the condensed distance matrix from which Z was generated.
Returns:
c : ndarray
The cophenetic correlation coefficient (if Y is passed).
d : ndarray
The cophenetic distance matrix in condensed form. The \(ij\)-th entry is the cophenetic distance between original observations \(i\) and \(j\).
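The two calling conventions can be sketched as follows. The random dataset and the choice of average linkage are assumptions made for illustration only. The final check relies on the conventional definition of the cophenetic correlation coefficient as the Pearson correlation between the entries of Y and the cophenetic distances:

```python
import numpy as np
from scipy.cluster.hierarchy import average, cophenet
from scipy.spatial.distance import pdist

# Hypothetical data: n = 10 observations in m = 3 dimensions.
rng = np.random.default_rng(0)
X = rng.random((10, 3))

Y = pdist(X)      # condensed distance matrix, length n*(n-1)/2 = 45
Z = average(Y)    # linkage matrix generated from Y

c, d = cophenet(Z, Y)   # Y passed: a (c, d) pair is returned
d_only = cophenet(Z)    # Y omitted: only d is returned

# Cross-check c against the Pearson correlation of Y and d.
c_check = np.corrcoef(Y, d)[0, 1]
```

A value of c close to 1 suggests that the dendrogram preserves the original pairwise distances well.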
Notes
cophenet has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
See Support for the array API standard for more information.
Examples
>>> from scipy.cluster.hierarchy import single, cophenet
>>> from scipy.spatial.distance import pdist, squareform
Given a dataset X and a linkage matrix Z, the cophenetic distance between two points of X is the distance between the two largest distinct clusters that each contain one of the points:
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
X corresponds to a dataset of four groups of three points, with one group at each corner of a square.
>>> Z = single(pdist(X))
>>> Z
array([[ 0.,  1.,  1.,  2.],
       [ 2., 12.,  1.,  3.],
       [ 3.,  4.,  1.,  2.],
       [ 5., 14.,  1.,  3.],
       [ 6.,  7.,  1.,  2.],
       [ 8., 16.,  1.,  3.],
       [ 9., 10.,  1.,  2.],
       [11., 18.,  1.,  3.],
       [13., 15.,  2.,  6.],
       [17., 20.,  2.,  9.],
       [19., 21.,  2., 12.]])
>>> cophenet(Z)
array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2.,
       2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       1., 1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1., 1.])
The output of the scipy.cluster.hierarchy.cophenet function is represented in condensed form. We can use scipy.spatial.distance.squareform to see the output as a regular matrix (where each element ij denotes the cophenetic distance between each i, j pair of points in X):
>>> squareform(cophenet(Z))
array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])
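When only a few pairs are needed, the condensed vector can also be indexed directly instead of expanding it with squareform. The sketch below uses a hypothetical helper, condensed_index, which is not part of SciPy. It assumes the usual row-major upper-triangle ordering of condensed distance vectors:

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

def condensed_index(i, j, n):
    """Hypothetical helper: position of the pair (i, j), i != j,
    in a condensed distance vector for n observations."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
n = len(X)

d = cophenet(single(pdist(X)))   # condensed form, length n*(n-1)/2
D = squareform(d)                # full n-by-n matrix, for comparison

same_corner = d[condensed_index(3, 5, n)]   # points 3 and 5 share a corner
diff_corner = d[condensed_index(0, 3, n)]   # points 0 and 3 do not
```

Indexing the condensed vector avoids allocating the full n-by-n matrix, which can matter for large n.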
In this example, the cophenetic distance between points of X that are very close (i.e., in the same corner) is 1. For all other pairs of points it is 2, because those points lie in clusters at different corners, so the distance between these clusters is larger.