How to Calculate Cohen’s Kappa in R

Last Updated : 23 Jul, 2025

In this article, we will discuss what Cohen’s Kappa is and how to calculate it in the R programming language.

What is Cohen’s Kappa?

Cohen's Kappa is a statistical measure used to assess inter-rater reliability or agreement between two raters when dealing with categorical data. It quantifies the level of agreement between the raters by taking into account the agreement that could be expected by chance alone. It's particularly useful when assessing agreement on subjective judgments or classifications.

Why is Cohen's Kappa Important?

Cohen's Kappa is important because it helps ensure that different people making subjective judgments agree consistently. This is crucial in fields where opinions or classifications vary. By using Cohen's Kappa, researchers or professionals can check if the agreement between raters is real or just by chance. It helps ensure that the data collected is reliable and trustworthy.

The Role of Categorical Agreement

Categorical agreement refers to the degree to which two raters assign the same category or label to a given set of data. In Cohen's Kappa calculation, categorical agreement serves as the foundation for evaluating the level of agreement between the raters. The Kappa statistic compares the observed level of agreement with the level of agreement expected by chance alone.

**The formula for Cohen's Kappa is**

$$k = \frac{P_o - P_e}{1 - P_e}$$

Where:

Po is the observed proportion of agreement between the raters.

Pe is the expected proportion of agreement by chance.

Observed Agreement (Po)

Observed agreement refers to the proportion of cases in which two raters or methods agree on the categorization or classification of items. It represents the actual, observed instances where both raters provide the same classification.

**Example:** Suppose two medical professionals independently examine a set of X-ray images and categorize each image as either showing signs of a specific condition or not. If they both agree on the classification for 80 out of 100 X-ray images, then the observed agreement is 80% (Po = 0.80).
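As a rough illustration of this idea, observed agreement can be computed in base R as the share of items on which two rating vectors match. The vectors `rater_a` and `rater_b` below are made-up labels for ten images, not data from this article.

```r
# Made-up ratings for ten images (1 = condition present, 0 = absent)
rater_a <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
rater_b <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1)

# Observed agreement: proportion of images with identical labels
po <- mean(rater_a == rater_b)
po  # 8 of the 10 labels match
```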

Expected Agreement (Pe)

Expected agreement is the proportion of agreement we would expect by chance alone. It is based on how often each rater uses each category: for every category, multiply the two raters' proportions for that category, then add the products over all categories.

**Example:** In the X-ray example, suppose the prevalence of the condition is 30% and both raters assign labels at random according to this prevalence, so each rater marks 30% of the images as showing the condition and 70% as not showing it. By chance, they would both say "condition" for 0.30 × 0.30 = 0.09 of the images and both say "no condition" for 0.70 × 0.70 = 0.49 of the images. Adding the contribution of each category gives an expected agreement of Pe = 0.09 + 0.49 = 0.58.
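A minimal sketch of the chance-agreement step in base R, assuming made-up ratings (`rater_a` and `rater_b` are hypothetical vectors, not data from this article). Each rater's category proportions are multiplied category by category and then summed.

```r
# Made-up ratings for ten images (1 = condition present, 0 = absent)
rater_a <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
rater_b <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1)

# Use a shared set of levels so both tables list the same categories
categories <- sort(unique(c(rater_a, rater_b)))
p_a <- prop.table(table(factor(rater_a, levels = categories)))
p_b <- prop.table(table(factor(rater_b, levels = categories)))

# Expected agreement: sum over categories of the product of the proportions
pe <- sum(as.numeric(p_a) * as.numeric(p_b))
pe  # 0.4 * 0.4 + 0.6 * 0.6
```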

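**Cohen's Kappa ranges from -1 to 1:** a value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.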

By comparing observed and expected agreement, Cohen's Kappa provides a normalized measure of agreement that accounts for the possibility of chance agreement. Categorical agreement is crucial in this context because it forms the basis for understanding the level of agreement between raters, which is then used to calculate Kappa. The Kappa coefficient helps researchers assess the reliability and validity of categorical assignments, taking into account what could be expected due to random chance.

**Interpretation of Cohen's Kappa values**
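One widely cited convention for labelling Kappa values is the scale proposed by Landis and Koch (1977). The helper below, `interpret_kappa`, is a hypothetical function written for this illustration (it is not part of any package), and its handling of values that fall exactly on a boundary is an assumption. The labels are rules of thumb rather than strict cut-offs.

```r
# Hypothetical helper: map Kappa values to Landis and Koch style labels
interpret_kappa <- function(k) {
  labels <- c("poor", "slight", "fair", "moderate",
              "substantial", "almost perfect")
  # Count how many thresholds each value passes:
  # below 0 poor, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate,
  # 0.61 to 0.80 substantial, 0.81 to 1 almost perfect
  idx <- 1 + (k >= 0) + (k > 0.2) + (k > 0.4) + (k > 0.6) + (k > 0.8)
  labels[idx]
}

interpret_kappa(c(-0.1, 0.15, 0.5, 0.9))
```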

Let's consider a scenario where two doctors are assessing the presence or absence of a specific medical condition (Condition X) in a set of patients. Each patient is either diagnosed as having the condition (Positive) or not having the condition (Negative). The two doctors independently review a sample of 100 patients, and we want to assess the agreement between their diagnoses using Cohen's Kappa.

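[Figure: 2 × 2 contingency table of the two doctors' diagnoses, with cell counts a, b, c and d]

Here, a = 60 and d = 15 are the numbers of patients on whom the two doctors agree, and b = 10 and c = 15 are the numbers of patients on whom they disagree. The row totals of the table are a+b and c+d, and the column totals are a+c and b+d.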

**Calculation of Cohen's Kappa**

**1. Calculate Observed Agreement (Po):**

$$
\begin{aligned}
P_o &= \frac{a + d}{a + b + c + d} \\
    &= \frac{60 + 15}{60 + 10 + 15 + 15} \\
    &= \frac{75}{100} \\
    &= 0.75
\end{aligned}
$$

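**2. Calculate Agreement Expected by Chance (Pe):**

$$
\begin{aligned}
P_e &= \frac{(a + b)(a + c) + (c + d)(b + d)}{(a + b + c + d)^2} \\
    &= \frac{(60 + 10)(60 + 15) + (15 + 15)(10 + 15)}{(60 + 10 + 15 + 15)^2} \\
    &= \frac{70 \times 75 + 30 \times 25}{100^2} \\
    &= \frac{5250 + 750}{10000} \\
    &= \frac{6000}{10000} \\
    &= 0.6
\end{aligned}
$$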

**3. Calculate Cohen's Kappa:**

$$
\begin{aligned}
k &= \frac{P_o - P_e}{1 - P_e} \\
  &= \frac{0.75 - 0.6}{1 - 0.6} \\
  &= \frac{0.15}{0.4} \\
  &= 0.375
\end{aligned}
$$

Therefore, Cohen's Kappa for the two doctors' diagnoses of Condition X is 0.375. The interpretation of this value depends on the context, but on the commonly used Landis and Koch scale, values from 0.21 to 0.40 indicate fair agreement and values above 0.6 indicate substantial agreement. In this case, there is only a fair level of agreement between the two doctors in diagnosing Condition X, even though they give the same diagnosis for 75% of the patients.
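The hand calculation above can be reproduced in a few lines of base R. This is a sketch that assumes the four counts are arranged as a 2 × 2 matrix with the agreement cells (a and d) on the diagonal; the variable names are illustrative.

```r
# Cell counts from the worked example: a = 60, b = 10, c = 15, d = 15
counts <- matrix(c(60, 10,
                   15, 15),
                 nrow = 2, byrow = TRUE)

n  <- sum(counts)                                   # total number of patients
po <- sum(diag(counts)) / n                         # observed agreement
pe <- sum(rowSums(counts) * colSums(counts)) / n^2  # chance agreement
k  <- (po - pe) / (1 - pe)                          # Cohen's Kappa

c(Po = po, Pe = pe, kappa = k)
```

The three values should match the 0.75, 0.6 and 0.375 obtained by hand.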

Calculation of Cohen’s Kappa in R

We can calculate Cohen's Kappa in R using functions from packages such as irr (inter-rater reliability), psych (psychological statistics), or vcd (Visualizing Categorical Data).

Calculate Cohen’s Kappa in R using the 'irr' package

1. Install and load the 'irr' package

```r
install.packages("irr")  # only needed once
library(irr)
```

2. Create a matrix or data frame containing the ratings from each rater

```r
# One row per subject, one column per rater
ratings <- data.frame(
  Rater1 = c(1, 2, 3, 2, 1),
  Rater2 = c(1, 2, 3, 1, 1)
)
```

3. Calculate Cohen's Kappa and Print the result

```r
# Unweighted Kappa treats every disagreement as equally serious
kappa_result <- kappa2(ratings, weight = "unweighted")
print(kappa_result)
```

**Output:**

Cohen's Kappa for 2 Raters (Weights: unweighted)

Subjects = 5
Raters = 2
Kappa = 0.688

    z = 2.28   

p-value = 0.0224

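The code first installs and loads the irr package, stores the ratings in a data frame with one column per rater, and then calls kappa2() to compute the unweighted Kappa. The estimate of 0.688 indicates substantial agreement between the two raters, although with only five subjects the estimate is imprecise.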
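For a quick cross-check without any package, the same number can be obtained in base R. The function `cohen_kappa` below is a hypothetical helper written for this sketch, not part of irr; it computes only the point estimate, with no z statistic or p-value.

```r
# Hypothetical helper: unweighted Cohen's Kappa for two rating vectors
cohen_kappa <- function(r1, r2) {
  stopifnot(length(r1) == length(r2))
  lv  <- sort(unique(c(r1, r2)))  # shared category levels
  tab <- table(factor(r1, levels = lv), factor(r2, levels = lv))
  n   <- sum(tab)
  po  <- sum(diag(tab)) / n                      # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
  (po - pe) / (1 - pe)
}

# Same ratings as in the irr example
cohen_kappa(c(1, 2, 3, 2, 1), c(1, 2, 3, 1, 1))
```

Here Po = 4/5 and Pe = 9/25, so the result is 0.6875, which rounds to the 0.688 reported by kappa2().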

Calculate Cohen’s Kappa in R using the 'vcd' package

**1. Install and load the 'vcd' package:**

```r
install.packages("vcd")  # only needed once
library(vcd)

# Ratings from the two doctors, one row per patient
data <- data.frame(
  Doctor1 = c(1, 1, 0, 1, 0, 1, 0, 0, 1, 1),
  Doctor2 = c(1, 1, 1, 1, 0, 1, 0, 0, 1, 0)
)

# Cross-tabulate the ratings: rows are Doctor1, columns are Doctor2
table_data <- table(data$Doctor1, data$Doctor2)
print(table_data)
```

**Output:**

    0 1
  0 3 1
  1 1 5

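**Use the Kappa function from 'vcd' to calculate Cohen's Kappa**

```r
# Calculate Cohen's Kappa from the contingency table.
# Kappa() is the vcd function that accepts a table of counts. kappa2() belongs
# to irr and expects one row per subject, so it would misread this 2 x 2 table
# as two subjects.
kappa_result <- Kappa(table_data)
print(kappa_result)
```

The printed result lists the unweighted and weighted Kappa estimates together with their asymptotic standard errors. For this table, Po = (3 + 5)/10 = 0.8 and Pe = (4 × 4 + 6 × 6)/100 = 0.52, so the unweighted Kappa is (0.8 - 0.52)/(1 - 0.52) ≈ 0.583. The contingency table passed to Kappa() holds the counts of observations for each combination of ratings.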

Calculate Cohen’s Kappa in R using the 'psych' package

```r
# Install and load the 'psych' package
install.packages("psych")
library(psych)

# Create a matrix containing the ratings: one row per subject, one column per rater
ratings <- matrix(c(1, 2, 3, 2, 1, 1, 1, 2, 3, 3), ncol = 2, byrow = TRUE)

# Calculate Cohen's Kappa
kappa_result <- cohen.kappa(ratings)

# Print the result
print(kappa_result)
```

**Output:**

Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)

Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries
                  lower estimate upper
unweighted kappa -0.089     0.25  0.59
weighted kappa    0.128     0.57  1.00

Number of subjects = 5

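The code installs and loads the psych package, stores the ratings in a matrix with one row per subject and one column per rater, and passes it to cohen.kappa(), which reports both unweighted and weighted Kappa with lower and upper confidence bounds.

**For unweighted kappa:** the estimate is 0.25, which suggests fair agreement. The confidence bounds (-0.089 to 0.59) are wide and include 0, so with only five subjects the estimate is imprecise.

**For weighted kappa:** the estimate is 0.57, which indicates moderate agreement. It is higher than the unweighted value because weighted Kappa gives partial credit when the two ratings are close but not identical.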
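Weighted Kappa penalises each disagreement according to how far apart the two categories are. The sketch below uses squared-distance (quadratic) disagreement weights in base R, which is an assumption about the weighting scheme rather than a reproduction of the psych internals. With the same five subjects it gives about 0.57 for weighted Kappa and 0.25 for unweighted Kappa, consistent with the estimates in the output above.

```r
# Same ratings as in the psych example: one row per subject
ratings <- matrix(c(1, 2, 3, 2, 1, 1, 1, 2, 3, 3), ncol = 2, byrow = TRUE)

lv  <- sort(unique(as.vector(ratings)))  # categories 1, 2, 3
tab <- table(factor(ratings[, 1], levels = lv),
             factor(ratings[, 2], levels = lv))

p_obs <- unclass(tab) / sum(tab)                  # observed cell proportions
p_exp <- outer(rowSums(p_obs), colSums(p_obs))    # proportions expected by chance

# Quadratic disagreement weights: 0 on the diagonal, growing with distance
w <- outer(seq_along(lv), seq_along(lv), function(i, j) (i - j)^2)

weighted_kappa   <- 1 - sum(w * p_obs) / sum(w * p_exp)
unweighted_kappa <- (sum(diag(p_obs)) - sum(diag(p_exp))) / (1 - sum(diag(p_exp)))

round(c(unweighted = unweighted_kappa, weighted = weighted_kappa), 2)
```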

Challenges in interpreting Cohen's Kappa

  1. **Subjectivity:** Categorical judgments can vary among raters due to subjectivity.
  2. **Small Sample Size:** Cohen's Kappa may produce unreliable estimates with small sample sizes, as it relies on observed and expected agreement frequencies.
  3. **Unequal Marginal Distributions:** Disproportionate category distributions can skew Kappa estimates.
  4. **Ordinal Categories:** When categories are ordinal, weighted Kappa is often used, and its weights sometimes assume equal intervals between categories, which may not always hold true.
  5. **Interpretation:** Interpreting Kappa values can be tricky, and what is considered acceptable agreement might change based on the situation.
  6. **Sensitivity to Category Definitions:** Small changes in category definitions or thresholds can significantly impact Kappa values.
  7. **Dependence on Rater Expertise:** Kappa values may vary based on the expertise or training of the raters involved in the assessment.
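The effect of unequal marginal distributions (point 3) can be seen with two made-up 2 × 2 tables that have the same observed agreement of 90% but very different Kappa values. The tables and the helper `kappa_from_table` are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: unweighted Kappa from a square table of counts
kappa_from_table <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2
  (po - pe) / (1 - pe)
}

# Both tables have 90 agreements out of 100 ratings
balanced <- matrix(c(45, 5,
                      5, 45), nrow = 2, byrow = TRUE)  # categories used equally
skewed   <- matrix(c(85, 5,
                      5, 5), nrow = 2, byrow = TRUE)   # one category dominates

kappa_from_table(balanced)  # Pe = 0.50, so Kappa = 0.8
kappa_from_table(skewed)    # Pe = 0.82, so Kappa is about 0.44
```

When one category dominates, chance agreement is already high, so the same raw agreement translates into a much lower Kappa.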

Conclusion

Cohen's Kappa is an important tool for assessing inter-rater agreement in various fields. By accounting for chance agreement, it provides a more accurate measure of reliability than simple agreement percentages. Applying Cohen's Kappa can enhance the quality and validity of research findings, ensuring consistency and thoroughness in categorical judgments.