sklearn.metrics.homogeneity_score — scikit-learn 0.20.4 documentation

sklearn.metrics.homogeneity_score(labels_true, labels_pred)

Homogeneity metric of a cluster labeling given a ground truth.

A clustering result satisfies homogeneity if all of its clusters contain only data points which are members of a single class.
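Rosenberg and Hirschberg [1] formalize this with conditional entropy: writing C for the classes and K for the clusters, homogeneity is h = 1 - H(C|K) / H(C), taken to be 1 when H(C) = 0. If every cluster is pure, knowing the cluster leaves no uncertainty about the class, so H(C|K) = 0 and h = 1. The following is a minimal pure-Python sketch of that definition, not scikit-learn's own code (which works from a contingency matrix and mutual information); the helper names `entropy`, `conditional_entropy` and `homogeneity_sketch` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H of a label sequence, using the natural log."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def conditional_entropy(labels_true, labels_pred):
    """H(C|K): uncertainty about the class that remains once the cluster is known."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))  # n_ck: samples of class c in cluster k
    cluster_sizes = Counter(labels_pred)            # n_k: samples in cluster k
    return -sum((n_ck / n) * math.log(n_ck / cluster_sizes[k])
                for (_, k), n_ck in joint.items())


def homogeneity_sketch(labels_true, labels_pred):
    """Hypothetical helper: h = 1 - H(C|K) / H(C), defined as 1.0 when H(C) == 0."""
    h_c = entropy(labels_true)
    if h_c == 0.0:
        return 1.0
    return 1.0 - conditional_entropy(labels_true, labels_pred) / h_c


# Every cluster is pure, so H(C|K) = 0 and h = 1.
print(round(homogeneity_sketch([0, 0, 1, 1], [0, 0, 1, 2]), 6))  # 1.0
# Cluster 0 mixes both classes, so some class uncertainty remains.
print(round(homogeneity_sketch([0, 0, 1, 1], [0, 0, 0, 1]), 6))  # 0.311278
```

The natural logarithm is an arbitrary choice here: h is a ratio of two entropies, so the base cancels.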

This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
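The invariance follows from the entropy definition in [1]: the score depends only on how many samples fall into each (class, cluster) cell, how large each cluster is, and how large each class is, never on the label values themselves. The sketch below, built around a hypothetical helper `score_inputs`, reduces a labeling pair to exactly those counts and checks that arbitrary bijective renamings of the class and cluster ids leave them unchanged.

```python
from collections import Counter


def score_inputs(labels_true, labels_pred):
    """Hypothetical helper: the counts h = 1 - H(C|K)/H(C) depends on, label-free.

    Returns the sorted (n_ck, n_k) pairs from the contingency table and the
    sorted class sizes n_c; the label values themselves are discarded.
    """
    joint = Counter(zip(labels_true, labels_pred))
    cluster_sizes = Counter(labels_pred)
    pairs = sorted((n_ck, cluster_sizes[k]) for (_, k), n_ck in joint.items())
    class_sizes = sorted(Counter(labels_true).values())
    return pairs, class_sizes


labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 0, 1, 1, 2]

# Arbitrary bijective renamings of the class and cluster ids (illustrative only).
new_true = [{0: 7, 1: 3, 2: 5}[c] for c in labels_true]
new_pred = [{0: "x", 1: "y", 2: "z"}[k] for k in labels_pred]

print(score_inputs(labels_true, labels_pred) == score_inputs(new_true, new_pred))  # True
```

Because the score is a function of these counts alone, any permutation of the label values produces the same homogeneity.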

This metric is not symmetric: switching labels_true with labels_pred will return the completeness_score, which will be different in general.
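The asymmetry is easiest to see through mutual information. Since I(C;K) = H(C) - H(C|K), homogeneity equals I(C;K) / H(C), while completeness is I(C;K) / H(K). Mutual information is symmetric, so swapping the two arguments only swaps which entropy sits in the denominator, which turns homogeneity into completeness. Below is a pure-Python sketch, not scikit-learn's code; `mutual_info`, `homogeneity_mi` and `completeness_mi` are hypothetical names, and each score is taken to be 1.0 when its denominator entropy is zero.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def mutual_info(a, b):
    """I(A;B) from joint and marginal counts; symmetric in a and b."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((m / n) * math.log(m * n / (count_a[x] * count_b[y]))
               for (x, y), m in joint.items())


def homogeneity_mi(labels_true, labels_pred):
    """Hypothetical helper: h = I(C;K) / H(C), or 1.0 when H(C) == 0."""
    h_c = entropy(labels_true)
    return 1.0 if h_c == 0.0 else mutual_info(labels_true, labels_pred) / h_c


def completeness_mi(labels_true, labels_pred):
    """Hypothetical helper: c = I(C;K) / H(K), or 1.0 when H(K) == 0."""
    h_k = entropy(labels_pred)
    return 1.0 if h_k == 0.0 else mutual_info(labels_true, labels_pred) / h_k


truth = [0, 0, 1, 1]
pred = [0, 0, 1, 2]  # class 1 is split across clusters 1 and 2

print(round(homogeneity_mi(truth, pred), 6))   # 1.0
print(round(homogeneity_mi(pred, truth), 6))   # 0.666667
print(round(completeness_mi(truth, pred), 6))  # 0.666667
```

Splitting a class over several clusters keeps every cluster pure (homogeneity 1.0) but scatters that class, so completeness drops; swapping the arguments reports the latter.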

Read more in the User Guide.

Parameters:
    labels_true : int array, shape = [n_samples]
        Ground truth class labels to be used as a reference.
    labels_pred : array, shape = [n_samples]
        Cluster labels to evaluate.

Returns:
    homogeneity : float
        Score between 0.0 and 1.0. 1.0 stands for a perfectly homogeneous labeling.

References

[1] Andrew Rosenberg and Julia Hirschberg, 2007. V-Measure: A conditional entropy-based external cluster evaluation measure

Examples

Perfect labelings are homogeneous:

>>> from sklearn.metrics.cluster import homogeneity_score
>>> homogeneity_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

Non-perfect labelings that further split classes into more clusters can be perfectly homogeneous:

>>> print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 0, 1, 2]))
1.000000
>>> print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 1, 2, 3]))
1.000000

Clusters that include samples from different classes do not make for a homogeneous labeling:

>>> print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 1, 0, 1]))
0.0...
>>> print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 0, 0, 0]))
0.0...
