F1 Score in Machine Learning

Last Updated : 30 Mar, 2026

F1 Score is a metric used to evaluate the performance of a classification model. It combines precision and recall into a single value and is especially useful when the dataset has imbalanced classes.

Figure: F1 Score is a balance between Precision and Recall

Prerequisites

1. Confusion Matrix

A confusion matrix is a table used to evaluate the performance of a classification model. It compares the actual labels with the predicted labels to show how many predictions were correct or incorrect.
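As a minimal sketch of how the four cells of a binary confusion matrix can be counted, the snippet below compares actual and predicted labels one pair at a time. The helper name `confusion_counts` and the labels are hypothetical, made up for this illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive label."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels, made up for this illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

The four counts always sum to the number of examples, which is a quick sanity check when building the matrix by hand.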

2. Precision

Precision measures how many of the positive predictions made by the model are actually correct. It tells us how accurate the model is when it predicts a positive class.

\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

**For example:** Suppose a model predicts 5 cases as positive. Of these, 4 are actually positive and 1 is actually negative, so the precision is 4/5 = 80%.

3. Recall

Recall, also known as Sensitivity or True Positive Rate, measures how many of the actual positive cases were correctly identified by the model. It focuses on the model’s ability to detect positive instances.

\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

**For example:** Suppose there are 10 actual positive cases in the dataset. If the model correctly identifies 4 of them as positive, the recall is 4/10 = 40%. The model missed the other 6 positive cases.
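The precision and recall examples above can be checked with a short sketch. The function names `precision` and `recall` are illustrative helpers, not library functions, and returning 0.0 for an empty denominator is an assumption made here to avoid division by zero.

```python
def precision(tp, fp):
    # Share of positive predictions that are correct
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Share of actual positives that the model found
    return tp / (tp + fn) if (tp + fn) else 0.0


p = precision(tp=4, fp=1)  # 4 of 5 positive predictions are correct
r = recall(tp=4, fn=6)     # 4 of 10 actual positives are found
print(f"precision={p:.2f}, recall={r:.2f}")
```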

Combining Precision and Recall

F1 Score combines precision and recall into a single metric using the harmonic mean. It helps evaluate a model by balancing both precision and recall.

F_1 \text{ Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

The F1 score becomes high only when both precision and recall are high. If either of them decreases significantly, the F1 score will also decrease.
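The formula translates directly into a few lines of Python. This is a sketch with a hypothetical helper name, `f1_from_precision_recall`. Returning 0.0 when both inputs are zero is an assumption made here, since the formula is undefined in that case.

```python
def f1_from_precision_recall(precision, recall):
    # Harmonic mean of precision and recall
    if precision + recall == 0:
        return 0.0  # assumption: treat the undefined 0/0 case as 0
    return 2 * precision * recall / (precision + recall)


# Precision 0.8 and recall 0.4, as in the prerequisite examples
score = f1_from_precision_recall(0.8, 0.4)
print(f"F1 = {score:.3f}")
```

With precision 0.8 and recall 0.4 the result is about 0.533, noticeably closer to the weaker of the two values than their simple average of 0.6.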

Why Harmonic Mean is Used

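The harmonic mean is used instead of a simple (arithmetic) average because it is pulled toward the smaller of the two values, so a high precision cannot compensate for a low recall or vice versa. For example, with a precision of 1.0 and a recall of 0.1, the arithmetic mean is 0.55 but the harmonic mean is only about 0.18. Both values need to be high for the F1 score to be high.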
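To see the difference between the two averages, the sketch below uses `mean` and `harmonic_mean` from Python's standard `statistics` module on a few hypothetical precision and recall pairs chosen for illustration.

```python
from statistics import harmonic_mean, mean

# Hypothetical (precision, recall) pairs chosen to show the contrast
pairs = [(0.9, 0.9), (0.9, 0.5), (1.0, 0.1)]

for precision, recall in pairs:
    arithmetic = mean([precision, recall])
    harmonic = harmonic_mean([precision, recall])  # same value as the F1 formula
    print(f"P={precision:.1f} R={recall:.1f}  "
          f"arithmetic={arithmetic:.2f}  harmonic={harmonic:.2f}")
```

The two averages agree when precision and recall are equal, and the gap between them grows as the two values move apart.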

Calculating F1 Score

The F1 Score can be calculated for both binary classification and multiclass classification problems.

1. Binary Classification

In binary classification, there are only two classes: positive and negative. The F1 score is calculated from the confusion matrix, whose counts determine precision and recall.

**For example:** Consider a dataset with 100 total cases, of which 90 are positive and 10 are negative. The model predicts 85 cases as positive, of which 80 are actually positive and 5 are actually negative. The confusion matrix looks like this:

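| Model Prediction | Actual Positive | Actual Negative | Total |
|---|---|---|---|
| Predicted Positive | 80 | 5 | 85 |
| Predicted Negative | 10 | 5 | 15 |
| Total | 90 | 10 | 100 |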

From this matrix we can calculate:

  1. **Precision** = 80 / 85 ≈ 0.94
  2. **Recall** = 80 / 90 ≈ 0.89
  3. **Accuracy** = (80 + 5) / 100 = 0.85
  4. **F1 Score** = 2 × (0.94 × 0.89) / (0.94 + 0.89) ≈ 0.91

This shows that the model performs well because both precision and recall are high.
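The numbers in this example can be reproduced with plain arithmetic. The sketch below starts from the four confusion-matrix counts and also shows the equivalent count-based form F1 = 2·TP / (2·TP + FP + FN), which follows from substituting the precision and recall definitions into the F1 formula.

```python
# Counts from the example confusion matrix
tp, fp, fn, tn = 80, 5, 10, 5

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)

# F1 from precision and recall, and directly from the counts
f1 = 2 * precision * recall / (precision + recall)
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"accuracy={accuracy:.2f} f1={f1:.3f}")
```

Both forms give the same F1 value, so the count-based version is handy when precision and recall are not needed separately.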

2. Multiclass Classification

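In multiclass classification, where there are more than two classes, the F1 score is calculated separately for each class instead of as a single score for the whole model. This is commonly done with the One-vs-Rest (OvR) approach, also called One-vs-All (OvA): each class in turn is treated as the positive class and all remaining classes together as the negative class, and precision, recall, and F1 are computed for that class from the resulting binary counts. The per-class scores can then be reported individually or combined into a single number by averaging.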
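A minimal sketch of the one-vs-rest idea: for each class, relabel the data as "this class" versus "everything else" and compute a binary F1 score. The helper `f1_one_vs_rest` and the animal labels are hypothetical, made up for this illustration.

```python
def f1_one_vs_rest(y_true, y_pred, positive_class):
    # Relabel: positive_class -> True, every other class -> False
    actual = [t == positive_class for t in y_true]
    predicted = [p == positive_class for p in y_pred]

    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))

    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical three-class labels, made up for this illustration
y_true = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]

per_class = {c: f1_one_vs_rest(y_true, y_pred, c) for c in sorted(set(y_true))}
for label, score in per_class.items():
    print(f"{label}: F1 = {score:.3f}")
```

Each class gets its own score, so a model that does well on frequent classes but poorly on a rare one is easy to spot.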

Implementing F1 Score in Python

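We can calculate the F1 score in Python using the f1_score function from the sklearn.metrics module, which supports both binary and multiclass classification. Its main parameters are y_true (the actual labels), y_pred (the predicted labels), and average, which controls how per-class scores are combined: None returns one score per class, 'micro' computes the score from the total true positives, false positives, and false negatives across all classes, 'macro' takes the unweighted mean of the per-class scores, and 'weighted' takes the mean weighted by the number of true instances of each class. The example below computes all four on a small three-class dataset.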

```python
from sklearn.metrics import f1_score

# Actual and predicted labels for a three-class problem (classes 0, 1, 2)
y_true = [0, 1, 2, 2, 2, 2, 1, 0, 2, 1, 0]
y_pred = [0, 0, 2, 2, 1, 2, 1, 0, 1, 2, 1]

# average=None returns one F1 score per class; the other options return a single number
f1_per_class = f1_score(y_true, y_pred, average=None)
f1_micro = f1_score(y_true, y_pred, average='micro')
f1_macro = f1_score(y_true, y_pred, average='macro')
f1_weighted = f1_score(y_true, y_pred, average='weighted')

print("F1 score per class:", f1_per_class)
print("Micro-average F1 score:", f1_micro)
print("Macro-average F1 score:", f1_macro)
print("Weighted-average F1 score:", f1_weighted)
```

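**Output:**

Running the code prints per-class F1 scores of about 0.667, 0.286, and 0.667 for classes 0, 1, and 2, a micro-average of about 0.545, a macro-average of about 0.540, and a weighted average of about 0.563.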
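To make the averaging modes less of a black box, the sketch below recomputes them in plain Python for the same labels, without scikit-learn. The variable and helper names are illustrative, and the definitions follow the usual meaning of micro, macro, and weighted averaging, so treat this as a cross-check rather than a description of the library's internals.

```python
from collections import Counter

y_true = [0, 1, 2, 2, 2, 2, 1, 0, 2, 1, 0]
y_pred = [0, 0, 2, 2, 1, 2, 1, 0, 1, 2, 1]

labels = sorted(set(y_true) | set(y_pred))
tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
pred_count = Counter(y_pred)  # TP + FP per class
true_count = Counter(y_true)  # TP + FN per class (the class support)


def f1(tp_c, pred_c, true_c):
    # 2*TP / (2*TP + FP + FN), where 2*TP + FP + FN = predicted + actual
    denominator = pred_c + true_c
    return 2 * tp_c / denominator if denominator else 0.0


per_class = {c: f1(tp[c], pred_count[c], true_count[c]) for c in labels}

# Macro: unweighted mean of the per-class scores
macro = sum(per_class.values()) / len(labels)
# Weighted: per-class scores weighted by the number of true instances
weighted = sum(per_class[c] * true_count[c] for c in labels) / len(y_true)
# Micro: pool the counts over all classes, then compute one score
micro = f1(sum(tp.values()), len(y_pred), len(y_true))

print("per class:", {c: round(s, 3) for c, s in per_class.items()})
print(f"micro={micro:.3f} macro={macro:.3f} weighted={weighted:.3f}")
```

In a single-label multiclass problem like this one, the micro-average equals plain accuracy (6 correct out of 11), which is why it can differ from the macro-average when classes are unevenly sized or unevenly predicted.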