How to plot ROC curve in Python (original) (raw)

Last Updated : 23 Jul, 2025

The Receiver Operating Characteristic (ROC) curve is a fundamental tool in the field of machine learning for evaluating the performance of classification models. In this context, we'll explore the ROC curve and its associated metrics using the breast cancer dataset, a widely used dataset for binary classification tasks.

**What is the ROC Curve?

The ROC curve stands for **Receiver Operating Characteristics Curve and is an evaluation metric for classification tasks and it is a probability curve that plots sensitivity and specificity. So, we can say that the ROC Curve can also be defined as the evaluation metric that plots the sensitivity against the false positive rate. The ROC curve plots two different parameters given below:

The ROC Curve can also defined as a graphical representation that shows the performance or behavior of a classification model at all different threshold levels. The ROC Curve is a tool used for binary classification in machine learning. While learning about the ROC Curve we need to be familiar with the terms specificity and sensitivity.

TPR = TP/(TP+FN)

where, TPR = True positive rate, TP = True positive, FN = False negative.

**False positive rate: On the other side, false positive rate can be defined as the rate of negative instances that were predicted incorrectly to be positive. In other terms, the false positive can also be called "1-specificity".

FPR=FP/(FP+TN)

where, FPR = False positive rate, FP= False positive, TN= True negative.

The ROC Curve is often comparable with the precision and recall curve but it is different because it plots the true positive rate (which is also called recall) against the false positive rate.

The curve is plotted by finding the values of TPR and FPR at distinct threshold values and we don't plot the probabilities but we plot the scores. So the probability of the positive class is taken as the score here.

**Types of ROC Curve

There are two types of ROC Curves:

We need to evaluate a **logistic regression model with distinct classification thresholds to find the points to plot on the ROC curve as the Logistic regression model is a very common model used in binary classification tasks.

**ROC Curve in Python

Let's implement roc curve in python using breast cancer in-built dataset. The breast cancer dataset is a commonly used dataset in machine learning, for binary classification tasks.

**Step 1: Importing the required libraries

In scikit-learn, the roc_curve function is used to compute Receiver Operating Characteristic (ROC) curve points. On the other hand, the auc function calculates the Area Under the Curve (AUC) from the ROC curve.

AUC is a scalar value representing the area under the ROC curve quantifing the classifier's ability to distinguish between positive and negative examples across all possible classification thresholds.

Python3 `

import matplotlib.pyplot as plt from sklearn.datasets import load_breast_cancer from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression from sklearn.metrics import roc_curve, auc

`

**Step 2: Loading the dataset

Python3 `

data = load_breast_cancer() X = data.data y = data.target # Split the data into features (X) and target variable (y) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

`

**Step 3: Training and testing the model

Python3 `

Train a logistic regression model

model = LogisticRegression() model.fit(X_train, y_train)

Predict probabilities on the test set

y_pred_proba = model.predict_proba(X_test)[:, 1]

`

**Step 4: Plot the ROC Curve

Calculate ROC curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba) roc_auc = auc(fpr, tpr)

Plot the ROC curve

plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc) plt.plot([0, 1], [0, 1], 'k--', label='No Skill') plt.xlim([0.0, 1.0]) plt.ylim([0.0, 1.05]) plt.xlabel('False Positive Rate') plt.ylabel('True Positive Rate') plt.title('ROC Curve for Breast Cancer Classification') plt.legend() plt.show()

`

**Output:

roc

The dashed line represents the ROC Curve for the classifier. The AUC as 1.00, signifies perfect classification, meaning the model can distinguish malignant from benign tumors flawlessly at any threshold.

How Ideal Curve looks like?

An ideal ROC curve would be as close as possible to the upper left corner of the plot, indicating high TPR (correctly identifying true positives) with low FPR (incorrectly identifying false positives). The closer the curve is to the diagonal baseline, the worse the classifier's performance.

The AUC score provides a quantitative measure of the classifier's performance, with a value of 1 indicating perfect classification and a value of 0.5 indicating no better than random guessing.

**Advantages of ROC Curve

**Disadvantages of ROC Curve

**Conclusion

The ROC Curve is an analytical tool used in classification tasks that plots the true positive rate and false positive rate. It is also considered to be the best diagnostic test method as it shows the best cut-off value for diagnostic performance.