Categorical Cross-Entropy in Multi-Class Classification

Last Updated : 25 Nov, 2025

Categorical Cross-Entropy is widely used as a loss function to measure how well a model predicts the correct class in multi-class classification problems. It measures the difference between the predicted probability distribution and the true one-hot encoded labels, guiding the model to assign higher probabilities to the correct class.

[Figure: Categorical Cross-Entropy]

The figure shows how a neural network's raw outputs are converted into Softmax probabilities, which Categorical Cross-Entropy (CCE) then uses to compute the loss for the true class.

How Categorical Cross-Entropy Works

Categorical Cross-Entropy compares the true labels with the model's predicted probabilities and penalizes the model when it assigns low confidence to the correct class. The formula is:

L(y, \hat{y}) = - \sum_{i=1}^{c} y_i \log(\hat{y}_i)

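where y_i is the true label for class i (1 for the correct class and 0 otherwise), \hat{y}_i is the predicted probability for class i, and c is the number of classes. Because the labels are one-hot encoded, only the true class contributes to the sum, so the loss reduces to -\log(\hat{y}_k) for the true class k.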

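Categorical Cross-Entropy works through the following steps: the network produces a raw score for each class, Softmax converts these scores into probabilities that sum to 1, and the loss is the negative log of the probability assigned to the true class.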
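The computation can be sketched in a few lines of plain Python. This is a minimal illustration rather than the Keras implementation: the helper names `softmax` and `categorical_cross_entropy`, the example scores, and the `eps` guard value are all assumptions made for this sketch.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow in exp() and
    # leaves the resulting probabilities unchanged.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: -sum_i y_i * log(y_hat_i)."""
    # eps guards against log(0); its value is an arbitrary choice here.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# Hypothetical raw scores for a 3-class problem; the true class is index 0.
logits = [2.0, 1.0, 0.1]
y_true = [1, 0, 0]

probs = softmax(logits)
loss = categorical_cross_entropy(y_true, probs)
print("probabilities:", [round(p, 4) for p in probs])
print("loss:", round(loss, 4))

# The penalty grows quickly as confidence in the true class drops.
confident = categorical_cross_entropy(y_true, [0.9, 0.05, 0.05])
unsure = categorical_cross_entropy(y_true, [0.1, 0.45, 0.45])
print(round(confident, 4), round(unsure, 4))
```

With one-hot labels the sum collapses to a single term, so `loss` equals `-log(probs[0])`. The last two calls show the penalty rising from about 0.105 at 90% confidence in the true class to about 2.303 at 10%.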

Step-By-Step Implementation

In this section we train a neural network on the MNIST dataset using Categorical Cross-Entropy loss for multi-class classification. The final step defines a helper that takes any test image and displays the probability of each class along with the predicted label.

Step 1: Import Libraries & Load Dataset

We will use NumPy, TensorFlow and Matplotlib.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.losses import CategoricalCrossentropy
import matplotlib.pyplot as plt

(X_train, y_train), (X_test, y_test) = mnist.load_data()
```

Step 2: Preprocess Data

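```python
# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# One-hot encode the integer labels (digits 0-9)
y_train_encoded = to_categorical(y_train, num_classes=10)
y_test_encoded = to_categorical(y_test, num_classes=10)
```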
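To make the label format concrete, here is a small plain-Python sketch of what one-hot encoding produces. The `one_hot` helper and the example labels are hypothetical and only mimic the shape of the result; this is not how `to_categorical` is implemented.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# Hypothetical digit labels, encoded for a 10-class problem.
labels = [5, 0, 4]
encoded = [one_hot(label, 10) for label in labels]

for label, row in zip(labels, encoded):
    print(label, row)
```

Each row has exactly one 1.0, at the index of the true digit, which is the form of y the loss formula expects.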

Step 3: Build and Compile Model

```python
model = Sequential([
    Flatten(input_shape=(28, 28)),    # 28x28 image -> 784-element vector
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')   # one probability per digit class
])

model.compile(optimizer='adam',
              loss=CategoricalCrossentropy(),
              metrics=['accuracy'])
```

Step 4: Train the Model

```python
history = model.fit(X_train, y_train_encoded,
                    epochs=10,
                    batch_size=64,
                    validation_split=0.2)
```
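The loss reported during training covers many samples at once rather than one. As a rough sketch, assuming the default Keras reduction (which averages the per-sample losses over the batch), the batch loss can be reproduced by hand. The `sample_loss` helper and the predictions below are made up for illustration.

```python
import math

def sample_loss(y_true, y_pred):
    """Categorical cross-entropy for a single one-hot labelled sample."""
    # Only classes with a non-zero label contribute to the sum.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

# Hypothetical batch of three samples with three classes each.
batch_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
batch_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]

losses = [sample_loss(t, p) for t, p in zip(batch_true, batch_pred)]
batch_loss = sum(losses) / len(losses)

print("per-sample losses:", [round(l, 4) for l in losses])
print("batch loss:", round(batch_loss, 4))
```

The third sample, where the true class only receives probability 0.4, contributes the largest share of the batch loss.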

Step 5: Predict and Display Probabilities

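```python
def predict_digit(index):
    """Show a test image, then print the model's probability for each class."""
    img = X_test[index]
    plt.imshow(img, cmap='gray')
    plt.title(f"True Label: {y_test[index]}")
    plt.axis('off')
    plt.show()

    # Add a batch dimension: the model expects input of shape (batch, 28, 28)
    pred_prob = model.predict(img.reshape(1, 28, 28))[0]
    for i, prob in enumerate(pred_prob):
        print(f"Class {i}: {prob:.4f}")
    predicted_class = np.argmax(pred_prob)
    print(f"\nPredicted Class: {predicted_class}")

# Example call; any valid index into X_test works
predict_digit(0)
```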

**Output:**

[Figure: Output]

Categorical Cross-Entropy vs Binary Cross-Entropy

Here we see the difference between Categorical Cross-Entropy and Binary Cross-Entropy:

| Parameter | Categorical Cross-Entropy | Binary Cross-Entropy |
|---|---|---|
| Use Case | Multi-class classification | Binary classification |
| Label Format | One-hot encoded vector | Single label |
| Interpretation | Penalizes wrong predictions across all classes | Penalizes wrong prediction for the single class |
| Activation Function | Softmax | Sigmoid |
| Output | Probability distribution across multiple classes | Single probability for positive class |
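The two losses are closely related: with two classes, writing the single probability p as the distribution [1 - p, p] makes Categorical Cross-Entropy give the same value as Binary Cross-Entropy. The sketch below checks this numerically in plain Python. The function names and the example values are chosen for illustration only.

```python
import math

def binary_cross_entropy(y, p):
    """BCE for one sample: y is 0 or 1, p is the predicted P(class = 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_cross_entropy(y_true, y_pred):
    """CCE for one sample with a one-hot label vector."""
    return -sum(t * math.log(q) for t, q in zip(y_true, y_pred))

# Hypothetical positive example predicted with probability 0.8.
y, p = 1, 0.8

bce = binary_cross_entropy(y, p)
# The same case as a 2-class problem: [P(class 0), P(class 1)].
cce = categorical_cross_entropy([1 - y, y], [1 - p, p])

print(round(bce, 4), round(cce, 4))
```

Both values come out to about 0.2231, which is -log(0.8).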

Applications

Advantages

Limitations