sklearn.metrics.cohen_kappa_score — scikit-learn 0.20.4 documentation (original) (raw)

sklearn.metrics. cohen_kappa_score(y1, y2, labels=None, weights=None, sample_weight=None)[source]¶

Cohen’s kappa: a statistic that measures inter-annotator agreement.

This function computes Cohen’s kappa [1], a score that expresses the level of agreement between two annotators on a classification problem. It is defined as

\[\kappa = (p_o - p_e) / (1 - p_e)\]

where \(p_o\) is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and \(p_e\) is the expected agreement when both annotators assign labels randomly.\(p_e\) is estimated using a per-annotator empirical prior over the class labels [2].

Read more in the User Guide.

Parameters:	y1 : array, shape = [n_samples] Labels assigned by the first annotator. y2 : array, shape = [n_samples] Labels assigned by the second annotator. The kappa statistic is symmetric, so swapping y1 and y2 doesn’t change the value. labels : array, shape = [n_classes], optional List of labels to index the matrix. This may be used to select a subset of labels. If None, all labels that appear at least once iny1 or y2 are used. weights : str, optional List of weighting type to calculate the score. None means no weighted; “linear” means linear weighted; “quadratic” means quadratic weighted. sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns:	kappa : float The kappa statistic, which is a number between -1 and 1. The maximum value means complete agreement; zero or lower means chance agreement.

Parameters:

y1 : array, shape = [n_samples] Labels assigned by the first annotator. y2 : array, shape = [n_samples] Labels assigned by the second annotator. The kappa statistic is symmetric, so swapping y1 and y2 doesn’t change the value. labels : array, shape = [n_classes], optional List of labels to index the matrix. This may be used to select a subset of labels. If None, all labels that appear at least once iny1 or y2 are used. weights : str, optional List of weighting type to calculate the score. None means no weighted; “linear” means linear weighted; “quadratic” means quadratic weighted. sample_weight : array-like of shape = [n_samples], optional Sample weights.

Returns:

kappa : float The kappa statistic, which is a number between -1 and 1. The maximum value means complete agreement; zero or lower means chance agreement.

References

[1]	(1, 2) J. Cohen (1960). “A coefficient of agreement for nominal scales”. Educational and Psychological Measurement 20(1):37-46. doi:10.1177/001316446002000104.

[2]	(1, 2) R. Artstein and M. Poesio (2008). “Inter-coder agreement for computational linguistics”. Computational Linguistics 34(4):555-596.

[3]	Wikipedia entry for the Cohen’s kappa.