ward (SciPy v1.16.2 Manual)
scipy.cluster.hierarchy.ward(y)
Perform Ward’s linkage on a condensed distance matrix.
See linkage for more information on the return structure and algorithm.
The following are common calling conventions:
Z = ward(y)
    Performs Ward’s linkage on the condensed distance matrix y.
Z = ward(X)
    Performs Ward’s linkage on the observation matrix X using Euclidean distance as the distance metric.
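As a quick check of the two calling conventions, the sketch below passes the same data once as a condensed distance matrix and once as an observation matrix. The random data, the seed, and the variable names are illustrative assumptions, not part of the original documentation.

```python
import numpy as np
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist

# Hypothetical toy data: 6 observations in 2 dimensions.
rng = np.random.default_rng(0)
X = rng.random((6, 2))

y = pdist(X)          # condensed Euclidean distances, length 6 * 5 / 2 = 15
Z_from_y = ward(y)    # first convention: condensed distance matrix
Z_from_X = ward(X)    # second convention: observation matrix, Euclidean metric

# Both calls should describe the same hierarchy.
same = np.allclose(Z_from_y, Z_from_X)
print(same)
```

Passing X directly is only a convenience for the Euclidean case; a precomputed y avoids recomputing distances when several linkage methods are compared on the same data.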
Parameters:
y : ndarray
    A condensed distance matrix. A condensed distance matrix is a flat array containing the upper triangle of the distance matrix. This is the form that pdist returns. Alternatively, a collection of m observation vectors in n dimensions may be passed as an m by n array.
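To make the condensed layout concrete, here is a minimal sketch using three hypothetical points chosen so that the pairwise distances are 3, 4, and 5. It relies on scipy.spatial.distance.squareform to expand the flat array back into a square matrix; the point coordinates are an assumption made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical points forming a 3-4-5 right triangle (m = 3 observations).
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
m = X.shape[0]

# Condensed form: entries ordered d(0,1), d(0,2), d(1,2).
y = pdist(X)
print(y)  # [3. 4. 5.]

# Square form: symmetric m x m matrix with zeros on the diagonal.
D = squareform(y)

# The condensed array is the upper triangle of D above the diagonal, read row by row.
upper = D[np.triu_indices(m, k=1)]
```

The condensed array has m * (m - 1) / 2 entries, so it stores each pairwise distance exactly once.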
Returns:
Z : ndarray
    The hierarchical clustering encoded as a linkage matrix. See linkage for more information on the return structure and algorithm.
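The sketch below walks through the rows of a small linkage matrix, following the layout described in the linkage documentation: each row holds the two merged cluster indices, the merge distance, and the size of the new cluster. It also compares the result with linkage(y, method='ward'), which we understand to be equivalent. The four points are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, linkage
from scipy.spatial.distance import pdist

# Hypothetical data: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5]])
n = X.shape[0]

y = pdist(X)
Z = ward(y)  # shape (n - 1, 4): one row per merge

for step, (a, b, dist, size) in enumerate(Z):
    # Indices below n are original observations; index n + i is the
    # cluster created by row i.
    print(f"step {step}: merge {int(a)} and {int(b)} at distance {dist:.3f} "
          f"-> cluster {n + step} containing {int(size)} points")

# Assumed equivalent call through the general linkage interface.
Z_linkage = linkage(y, method="ward")
```

The last row always merges everything, so its final column equals the number of observations.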
Notes
ward has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
See Support for the array API standard for more information.
Examples
>>> from scipy.cluster.hierarchy import ward, fcluster
>>> from scipy.spatial.distance import pdist
First, we need a toy dataset to play with:
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
Then, we get a condensed distance matrix from this dataset:
>>> y = pdist(X)
Finally, we can perform the clustering:
>>> Z = ward(y)
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])
The linkage matrix Z represents a dendrogram; see scipy.cluster.hierarchy.linkage for a detailed explanation of its contents.
We can use scipy.cluster.hierarchy.fcluster to see to which cluster each initial point would belong given a distance threshold:
>>> fcluster(Z, 0.9, criterion='distance')
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12], dtype=int32)
>>> fcluster(Z, 1.1, criterion='distance')
array([1, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8], dtype=int32)
>>> fcluster(Z, 3, criterion='distance')
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)
>>> fcluster(Z, 9, criterion='distance')
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
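One way to read these results: because the merge distances in the third column of Z are non-decreasing here, each merge at or below the threshold removes one cluster. The following sketch, which rebuilds the same toy dataset so that it runs on its own, compares the number of labels returned by fcluster with that count. The helper dictionary `counts` is an illustrative name, not part of SciPy.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist

# Same toy dataset as in the example above.
X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
n = len(X)

counts = {}
for t in (0.9, 1.1, 3, 9):
    labels = fcluster(Z, t, criterion='distance')
    n_clusters = len(np.unique(labels))      # clusters reported by fcluster
    n_merges = int(np.sum(Z[:, 2] <= t))     # merges allowed at this threshold
    counts[t] = (n_clusters, n - n_merges)
    print(t, n_clusters, n - n_merges)
# Cluster counts for the four thresholds: 12, 8, 4, 1
```

Choosing a threshold between two consecutive merge distances (for example, between 1.29 and 5.77) therefore selects the four corner groups in this dataset.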
Also, scipy.cluster.hierarchy.dendrogram can be used to generate a plot of the dendrogram.
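As a minimal sketch of that last step, the code below calls dendrogram with no_plot=True, which computes the layout and returns it as a dictionary without requiring Matplotlib. The dataset is repeated so the block runs on its own; the variable name `info` is an assumption.

```python
from scipy.cluster.hierarchy import ward, dendrogram
from scipy.spatial.distance import pdist

# Same toy dataset as in the example above.
X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))

# no_plot=True skips rendering and only returns the computed layout.
info = dendrogram(Z, no_plot=True)

print(info['leaves'])        # observation indices in left-to-right leaf order
print(len(info['icoord']))   # one U-shaped link per merge
```

With Matplotlib installed, calling dendrogram(Z) without no_plot draws the tree on the current axes instead.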