AUC-ROC Curve in Machine Learning

Last Updated : 30 Apr, 2026

The AUC-ROC curve is a graph used to evaluate how well a binary classification model works. It shows how well the model separates the positive cases (for example, people with a disease) from the negative cases (people without the disease) at different threshold levels, by plotting the True Positive Rate (sensitivity) against the False Positive Rate.

A better model has a higher AUC (Area Under the Curve), which indicates a stronger ability to distinguish between classes.


Sensitivity versus False Positive Rate plot

These terms are derived from the **confusion matrix**, which provides the following values: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).


Confusion Matrix for a Classification Task
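As a minimal sketch of how the two plotted rates come out of the confusion matrix, the snippet below computes the True Positive Rate and False Positive Rate at a few thresholds. The labels, scores and the helper name `tpr_fpr` are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.6, 0.35, 0.8, 0.9])


def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    y_pred = (y_score >= threshold).astype(int)
    # With labels=[0, 1], ravel() yields TN, FP, FN, TP in that order.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)  # sensitivity (recall)
    fpr = fp / (fp + tn)  # 1 - specificity
    return tpr, fpr


# Each threshold gives one (FPR, TPR) point on the ROC curve.
for threshold in (0.3, 0.5, 0.7):
    tpr, fpr = tpr_fpr(y_true, y_score, threshold)
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Sweeping the threshold over all distinct scores and connecting the resulting points is, in essence, what `roc_curve` does.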

Working of AUC-ROC

AUC-ROC curve helps us understand how well a classification model distinguishes between the two classes. Imagine we have 6 data points and out of these:


ROC-AUC Classification Evaluation Metric

Now the model will give each data point a predicted probability of belonging to Class 1. The AUC measures the model's ability to assign higher predicted probabilities to the positive class than to the negative class. Here's how it works:

  1. **Randomly choose a pair:** Pick one data point from the positive class (Class 1) and one from the negative class (Class 0).
  2. **Check if the positive point has a higher predicted probability:** If the model assigns a higher probability to the positive data point than to the negative one, the pair is ranked correctly.
  3. **Repeat for all pairs:** Do this for all possible pairs of positive and negative examples. The AUC is the fraction of pairs that are ranked correctly.
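The pairwise procedure above can be sketched directly in code. The six labels and scores below are hypothetical; counting a tie as half a correct pair is a common convention assumed here (the sample data has no ties). The result is compared with scikit-learn's `roc_auc_score` as a sanity check.

```python
from itertools import product

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical data: three negatives (Class 0) and three positives (Class 1).
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.6, 0.35, 0.8, 0.9])

pos_scores = y_score[y_true == 1]
neg_scores = y_score[y_true == 0]

# Visit every (positive, negative) pair and count the correctly ranked ones.
correct = 0.0
for p, n in product(pos_scores, neg_scores):
    if p > n:
        correct += 1.0   # positive ranked above negative
    elif p == n:
        correct += 0.5   # assumed convention: a tie counts as half

pairwise_auc = correct / (len(pos_scores) * len(neg_scores))

print("Pairwise AUC: ", pairwise_auc)
print("roc_auc_score:", roc_auc_score(y_true, y_score))
```

Here 7 of the 9 pairs are ranked correctly, so both values come out to 7/9.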

When to Use AUC-ROC

AUC-ROC is effective when:

On highly imbalanced datasets, AUC-ROC can give overly optimistic results. In such cases the Precision-Recall curve is more suitable, because it focuses on the positive class.
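To see the two metrics side by side, the following sketch trains a model on a synthetic imbalanced dataset and reports both ROC AUC and average precision (a summary of the Precision-Recall curve). The 97/3 class weights, sample size and choice of Logistic Regression are assumptions made only for this illustration. Note that the random-guess baseline is 0.5 for ROC AUC but roughly the positive rate for average precision.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Assumed setup: about 3% positives to mimic a highly imbalanced problem.
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.97, 0.03], random_state=0)

# Stratify so the test set keeps the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, scores)
avg_precision = average_precision_score(y_test, scores)
positive_rate = y_test.mean()

print(f"Positive rate in test set: {positive_rate:.3f}")
print(f"ROC AUC:                   {roc_auc:.3f}")
print(f"Average precision (PR):    {avg_precision:.3f}")
```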

Model Performance with AUC-ROC:

In short, AUC gives an overall idea of how well the model ranks positives above negatives, independent of the threshold chosen for classification. An AUC of 0.5 corresponds to random guessing and an AUC of 1.0 to perfect separation, so a higher AUC means a better model.
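A small sketch of the threshold independence, using hypothetical scores: AUC depends only on the ranking of the scores, so a strictly increasing transformation (cubing, in this example) leaves it unchanged, whereas accuracy changes with the chosen threshold.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.6, 0.35, 0.8, 0.9])

# Cubing preserves the order of the scores, so the AUC is the same.
auc_raw = roc_auc_score(y_true, y_score)
auc_cubed = roc_auc_score(y_true, y_score ** 3)

# Accuracy, in contrast, depends on where the threshold is placed.
acc_at_05 = accuracy_score(y_true, (y_score >= 0.5).astype(int))
acc_at_07 = accuracy_score(y_true, (y_score >= 0.7).astype(int))

print(f"AUC (raw scores):   {auc_raw:.3f}")
print(f"AUC (cubed scores): {auc_cubed:.3f}")
print(f"Accuracy at 0.5:    {acc_at_05:.3f}")
print(f"Accuracy at 0.7:    {acc_at_07:.3f}")
```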

Implementation using two different models

1. Importing Libraries

We will be importing NumPy, pandas, Matplotlib and scikit-learn.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc
```

2. Generating data and splitting data

The code creates synthetic binary classification data with 1,000 samples and 20 features, splits it into training and testing sets in an 80-20 ratio and fixes the random seed to ensure reproducibility.

```python
# Synthetic binary classification data
X, y = make_classification(
    n_samples=1000, n_features=20, n_classes=2, random_state=42)

# 80-20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

3. Training the different models

Both models are trained with a fixed random seed so that the results are the same every time the code runs. First we train a Logistic Regression model on the training data. Then, using the same training data and random seed, we train a Random Forest model with 100 trees.

```python
logistic_model = LogisticRegression(random_state=42)
logistic_model.fit(X_train, y_train)

random_forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
random_forest_model.fit(X_train, y_train)
```

4. Predictions

Using the test data, the trained Logistic Regression model predicts the probability of the positive class. In the same way, the trained Random Forest model produces predicted probabilities for the positive class.

```python
# Column 1 of predict_proba holds the probability of the positive class
y_pred_logistic = logistic_model.predict_proba(X_test)[:, 1]
y_pred_rf = random_forest_model.predict_proba(X_test)[:, 1]
```

5. Creating a dataframe

The code creates a DataFrame called test_df with the columns "True", "Logistic" and "RandomForest", holding the true test labels and the predicted probabilities from the Logistic Regression and Random Forest models.

```python
test_df = pd.DataFrame(
    {'True': y_test, 'Logistic': y_pred_logistic, 'RandomForest': y_pred_rf})
```

6. Plotting ROC Curve for models

Plot the ROC curve and compute the AUC for both Logistic Regression and Random Forest. The ROC curve compares models based on True Positive Rate vs False Positive Rate, while the red dashed line shows random guessing.

```python
plt.figure(figsize=(7, 5))

# One ROC curve per model, with its AUC shown in the legend
for model in ['Logistic', 'RandomForest']:
    fpr, tpr, _ = roc_curve(test_df['True'], test_df[model])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label=f'{model} (AUC = {roc_auc:.2f})')

# Diagonal reference line for random guessing
plt.plot([0, 1], [0, 1], 'r--', label='Random Guess')

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves for Two Models')
plt.legend()
plt.show()
```

**Output:**

ROC curves for Logistic Regression and Random Forest

The code computes the ROC curve and the AUC for each model, Logistic Regression and Random Forest, and plots both curves. The red dashed line represents random guessing, and axis labels, a title and a legend complete the figure.
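When only the number is needed and not the plot, scikit-learn's `roc_auc_score` gives the same value as `auc(fpr, tpr)` in one call. The sketch below rebuilds the same synthetic data and Logistic Regression model so that it runs on its own; the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic setup as in the walkthrough above.
X, y = make_classification(
    n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(random_state=42).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# Route 1: build the curve, then integrate it with the trapezoidal rule.
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc_from_curve = auc(fpr, tpr)

# Route 2: compute the area directly from labels and scores.
auc_direct = roc_auc_score(y_test, scores)

print(f"auc(fpr, tpr): {auc_from_curve:.4f}")
print(f"roc_auc_score: {auc_direct:.4f}")
```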

AUC-ROC for a Multi-Class Model

For multiclass classification, AUC-ROC is extended using the One-vs-All (OvA) approach, also called One-vs-Rest (OvR). Each class is treated as the positive class once, and the remaining classes are grouped as the negative class. For example, if you have classes A, B, C and D, you will get four ROC curves, one for each class: A vs. the rest, B vs. the rest, C vs. the rest and D vs. the rest.

Steps to Use AUC-ROC for Multiclass Models

**1. One-vs-All Conversion:** Treat each class as the positive class and all others combined as the negative class.

**2. Train a Binary Classifier per Class:** Fit the model separately for each class-vs-rest combination.

**3. Compute AUC-ROC for Each Class:** Plot the ROC curve for each class-vs-rest problem and calculate the area under it.

**4. Compare Performance:** A higher AUC score means the model is better at distinguishing that class from the others.
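As a compact illustration of steps 3 and 4, the sketch below computes one AUC per class by binarizing the labels, then compares their mean with `roc_auc_score(..., multi_class='ovr', average='macro')`. For brevity it assumes a single multiclass Logistic Regression model rather than separately trained one-vs-rest classifiers; the data settings are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic three-class data (illustrative settings).
X, y = make_classification(
    n_samples=1000, n_features=20, n_classes=3, n_informative=10,
    random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Assumption: one multiclass model whose predict_proba has a column per class.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)

# One-vs-rest by hand: column i is 1 where the true class is classes[i].
classes = model.classes_
y_test_bin = label_binarize(y_test, classes=classes)
per_class_auc = [
    roc_auc_score(y_test_bin[:, i], proba[:, i]) for i in range(len(classes))
]

# Built-in equivalent: unweighted mean of the per-class OvR AUCs.
macro_auc = roc_auc_score(y_test, proba, multi_class='ovr', average='macro')

for cls, score in zip(classes, per_class_auc):
    print(f"Class {cls} vs rest: AUC = {score:.3f}")
print(f"Macro-averaged OvR AUC: {macro_auc:.3f}")
```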

Implementation of AUC-ROC in Multiclass Classification

1. Importing Libraries

The program creates synthetic multiclass data, splits it into training and testing sets and then uses the One-vs-Rest classifier technique to train both a Random Forest and a Logistic Regression model. It plots the two models' multiclass ROC curves to show how well they discriminate between the classes.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc
from itertools import cycle
```

2. Generating Data and splitting

The code generates synthetic multiclass data with three classes and 20 features. The labels are binarized into one column per class, and the data is then split into training and testing sets in an 80-20 ratio.

```python
X, y = make_classification(
    n_samples=1000, n_features=20, n_classes=3, n_informative=10,
    random_state=42)

# One indicator column per class
y_bin = label_binarize(y, classes=np.unique(y))

X_train, X_test, y_train, y_test = train_test_split(
    X, y_bin, test_size=0.2, random_state=42)
```

3. Training Models

The program trains two multiclass models with the One-vs-Rest approach: a Logistic Regression model and a Random Forest model with 100 estimators. Both models are fitted on the training data.

```python
logistic_model = OneVsRestClassifier(LogisticRegression(random_state=42))
logistic_model.fit(X_train, y_train)

rf_model = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=100, random_state=42))
rf_model.fit(X_train, y_train)
```

4. Plotting the AUC-ROC Curve

The ROC curves and AUC scores for each class are computed and plotted for both models. A dashed line indicates random guessing, helping visualize how well each model separates multiple classes.

```python
# Reused for every curve; entries are overwritten for each model
fpr = dict()
tpr = dict()
roc_auc = dict()

models = [logistic_model, rf_model]

plt.figure(figsize=(6, 5))
colors = cycle(['aqua', 'darkorange'])

for model, color in zip(models, colors):
    # Name of the wrapped base estimator, so the legend tells the models apart
    model_name = type(model.estimator).__name__
    y_score = model.predict_proba(X_test)
    for i in range(model.classes_.shape[0]):
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])
        plt.plot(fpr[i], tpr[i], color=color, lw=2,
                 label=f'{model_name} - Class {i} (AUC = {roc_auc[i]:.2f})')

# Diagonal reference line for random guessing
plt.plot([0, 1], [0, 1], 'k--', lw=2, label='Random Guess')

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Multiclass ROC Curve with Logistic Regression and Random Forest')
plt.legend(loc="lower right")
plt.show()
```

**Output:**

Multiclass ROC curves for Logistic Regression and Random Forest