pair_confusion_matrix
sklearn.metrics.cluster.pair_confusion_matrix(labels_true, labels_pred)
Pair confusion matrix arising from two clusterings.
The pair confusion matrix \(C\) is a 2 by 2 similarity matrix between two clusterings, computed by considering all pairs of samples and counting pairs that are assigned to the same or to different clusters under the true and predicted clusterings [1].
Considering a pair of samples that is clustered together as a positive pair, then, as in binary classification, the count of true negatives is \(C_{00}\), false negatives is \(C_{10}\), true positives is \(C_{11}\), and false positives is \(C_{01}\).
Read more in the User Guide.
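As an informal sketch (not part of the scikit-learn documentation), the four entries can be reproduced by brute-force counting over all ordered pairs of distinct samples; the helper name count_pairs below is hypothetical:

from itertools import permutations

import numpy as np

def count_pairs(labels_true, labels_pred):
    # C[same_true, same_pred]: the row indexes whether the pair shares a
    # true cluster, the column whether it shares a predicted cluster.
    C = np.zeros((2, 2), dtype=np.int64)
    for i, j in permutations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        C[int(same_true), int(same_pred)] += 1
    return C

# Should agree with pair_confusion_matrix on the examples below, e.g.
# count_pairs([0, 0, 1, 2], [0, 0, 1, 1]) gives [[8, 2], [0, 2]].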
Parameters:
labels_true : array-like of shape (n_samples,), dtype=integral
Ground truth class labels to be used as a reference.
labels_pred : array-like of shape (n_samples,), dtype=integral
Cluster labels to evaluate.
Returns:
C : ndarray of shape (2, 2), dtype=np.int64
The contingency matrix.
References
Examples
Perfectly matching labelings have all non-zero entries on the diagonal regardless of actual label values:
>>> from sklearn.metrics.cluster import pair_confusion_matrix
>>> pair_confusion_matrix([0, 0, 1, 1], [1, 1, 0, 0])
array([[8, 0],
       [0, 4]]...
Labelings that assign all class members to the same clusters are complete but may not always be pure, hence penalized, and have some off-diagonal non-zero entries:
>>> pair_confusion_matrix([0, 0, 1, 2], [0, 0, 1, 1])
array([[8, 2],
       [0, 2]]...
Note that the matrix is not symmetric.
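As a small additional check (not one of the original docstring examples), swapping the two label arrays transposes the matrix, so the off-diagonal counts trade places:

>>> pair_confusion_matrix([0, 0, 1, 1], [0, 0, 1, 2])
array([[8, 0],
       [2, 2]]...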