leaders — SciPy v1.15.2 Manual


scipy.cluster.hierarchy.leaders(Z, T)

Return the root nodes in a hierarchical clustering.

Returns the root nodes in a hierarchical clustering corresponding to a cut defined by a flat cluster assignment vector T. See the fcluster function for more information on the format of T.

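For each flat cluster \(j\) of the \(k\) flat clusters represented in the n-sized flat cluster assignment vector T, this function finds the lowest cluster node \(i\) in the linkage tree Z whose leaf descendants are exactly the observations assigned to flat cluster \(j\): every leaf below \(i\) belongs to \(j\), and no leaf outside the subtree of \(i\) belongs to \(j\). If no such node exists for some flat cluster, T is not a valid flat cluster assignment for Z.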
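The leader condition can be checked by brute force. The following is a minimal sketch, not the library's implementation: it assumes `to_tree(Z, rd=True)` to obtain a list of nodes indexed by node id, and compares the leaf set under each leader with the members of the matching flat cluster. The name `matches` is illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, leaders, to_tree
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
T = fcluster(Z, 3, criterion='distance')
L, M = leaders(Z, T)

# With rd=True, to_tree also returns a list where nodes[i] is the node with id i.
_, nodes = to_tree(Z, rd=True)

matches = {}
for leader_id, cluster_id in zip(L, M):
    # Leaf ids below the leader node (pre_order returns leaf ids by default).
    leaves = set(nodes[leader_id].pre_order())
    # Observations assigned to this flat cluster in T.
    members = set(np.flatnonzero(T == cluster_id).tolist())
    matches[int(cluster_id)] = (leaves == members)

print(matches)
```

Each entry of `matches` should be `True` when T is a valid cut of Z.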

Parameters:

Z : ndarray

The hierarchical clustering encoded as a matrix. See linkage for more information.

T : ndarray

The flat cluster assignment vector.

Returns:

L : ndarray

The leader linkage node ids stored as a k-element 1-D array, where k is the number of flat clusters found in T.

L[j]=i is the linkage cluster node id that is the leader of the flat cluster with id M[j]. If i < n, i corresponds to an original observation; otherwise it corresponds to a non-singleton cluster.

M : ndarray

The flat cluster ids stored as a k-element 1-D array, where k is the number of flat clusters found in T. This allows the set of flat cluster ids to be any arbitrary set of k integers.

For example: if L[3]=2 and M[3]=8, the leader of the flat cluster with id 8 is linkage node 2.
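Because L and M are aligned position by position, they can be zipped into a lookup from flat cluster id to leader node id. The sketch below uses a small made-up 1-D dataset and single linkage (both are assumptions chosen for illustration) so that one flat cluster is a singleton, whose leader is therefore an original observation (i < n). The names `leader_of` and `kind` are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, leaders

# Five 1-D observations: two tight pairs and one isolated point.
X = np.array([[0.0], [1.0], [10.0], [12.0], [50.0]])
Z = linkage(X, method='single')
T = fcluster(Z, 3, criterion='distance')
L, M = leaders(Z, T)

n = len(T)
# Map each flat cluster id M[j] to its leader node id L[j].
leader_of = {int(m): int(l) for m, l in zip(M, L)}

for cluster_id, node_id in sorted(leader_of.items()):
    kind = "original observation" if node_id < n else "non-singleton cluster"
    print(f"flat cluster {cluster_id}: leader node {node_id} ({kind})")
```

Here the isolated point at index 4 forms its own flat cluster, so its leader is node 4 itself.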

See also

fcluster : for the creation of flat cluster assignments.

Examples

>>> from scipy.cluster.hierarchy import ward, fcluster, leaders
>>> from scipy.spatial.distance import pdist

Given a linkage matrix Z, obtained after applying a clustering method to a dataset X, and a flat cluster assignment array T:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

>>> Z = ward(pdist(X))
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])

>>> T = fcluster(Z, 3, criterion='distance')
>>> T
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)

scipy.cluster.hierarchy.leaders returns the indices of the nodes in the dendrogram that are the leaders of each flat cluster:

>>> L, M = leaders(Z, T)
>>> L
array([16, 17, 18, 19], dtype=int32)

(remember that indices 0-11 point to the 12 data points in X, whereas indices 12-22 point to the 11 rows of Z)
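The indexing convention can be made concrete: a node id i >= n refers to row i - n of Z, and the fourth column of that row holds the number of observations under the node. The sketch below, written under that assumption about the linkage matrix layout, looks up each leader's row and compares its count with the size of the matching flat cluster. The name `sizes` is illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, leaders
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
T = fcluster(Z, 3, criterion='distance')
L, M = leaders(Z, T)

n = Z.shape[0] + 1  # a linkage matrix over n observations has n - 1 rows

sizes = {}
for node_id, cluster_id in zip(L, M):
    if node_id < n:
        # The leader is an original observation, i.e. a singleton cluster.
        sizes[int(cluster_id)] = 1
    else:
        # The leader is the cluster formed at row node_id - n of Z.
        left, right, dist, count = Z[node_id - n]
        sizes[int(cluster_id)] = int(count)

print(sizes)
```

Each value in `sizes` should equal the number of entries of T that carry the corresponding flat cluster id.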

scipy.cluster.hierarchy.leaders also returns the ids of the flat clusters in T, in the same order as L:

>>> M
array([1, 2, 3, 4], dtype=int32)