What is Sparse Categorical Crossentropy

Last Updated : 26 Jul, 2025

Sparse Categorical Crossentropy is a loss function for multi-class classification in machine learning and deep learning, and it is particularly useful when the number of categories is large. It is very similar to Categorical Crossentropy, with one important difference: the true class labels are provided as integers (category indices such as 0, 1, 2, …), not as one-hot encoded vectors.

The term "sparse" refers to this compact label representation, which avoids the memory and computational overhead of converting the labels into lengthy one-hot encoded arrays.
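To make the difference in label representation concrete, here is a minimal NumPy sketch. The sizes (1000 samples, 100 classes) and the variable names are illustrative assumptions, not values from this article.

```python
import numpy as np

# Illustrative sizes (assumed for this sketch)
num_samples, num_classes = 1000, 100

# Sparse representation: one integer class index per sample
rng = np.random.default_rng(0)
sparse_labels = rng.integers(0, num_classes, size=num_samples).astype(np.int32)

# One-hot representation: one row of length num_classes per sample
one_hot_labels = np.eye(num_classes, dtype=np.float32)[sparse_labels]

print(sparse_labels.shape, sparse_labels.nbytes)    # (1000,) 4000
print(one_hot_labels.shape, one_hot_labels.nbytes)  # (1000, 100) 400000
```

With these assumed sizes the one-hot array takes 100 times the memory of the integer labels, and the gap grows linearly with the number of classes.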

Working of Sparse Categorical Crossentropy

Standard categorical crossentropy for a single sample with a one-hot encoded label y is defined as:

L(y, \hat{y}) = -\sum_{i=1}^C y_i \log(\hat{y}_i)

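where C is the number of classes, y_i is 1 if i is the true class and 0 otherwise, and \hat{y}_i is the predicted probability of class i.

Sparse categorical crossentropy modifies this by using the integer index of the true class directly. The loss for each sample is: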

L(y, \hat{y}) = -\log\left(\hat{y}_y\right)

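where y is the integer index of the true class and \hat{y}_y is the predicted probability assigned to that class. Because a one-hot vector has exactly one non-zero entry, both formulas give the same value.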
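The two formulas can be checked against each other with a small worked example. This is a sketch with made-up probabilities for a single sample over three classes; the variable names are chosen for illustration only.

```python
import numpy as np

# Hypothetical predicted probabilities for one sample over C = 3 classes
y_hat = np.array([0.1, 0.7, 0.2])

true_class = 1                        # sparse label: integer index
one_hot = np.array([0.0, 1.0, 0.0])   # the same label, one-hot encoded

# Categorical crossentropy: sum over all classes
categorical_loss = -np.sum(one_hot * np.log(y_hat))

# Sparse categorical crossentropy: index the true class directly
sparse_loss = -np.log(y_hat[true_class])

print(categorical_loss, sparse_loss)  # both equal -log(0.7), about 0.357
```

The sparse form skips the multiplication by zeros and only reads one entry of the prediction vector.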

Implementation of Sparse Categorical Crossentropy

The following steps implement it in Python:

Step 1: Import libraries

Here we import scikit-learn utilities for loading and preprocessing the data, and TensorFlow for building the model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
```

Step 2: Load and Preprocess the Data

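```python
data = load_iris()
X, y = data.data, data.target  # y holds integer class indices 0, 1, 2

# Split first so the scaler is fitted on the training data only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```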

Step 3: Build the Neural Network Model

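Build a neural network with a hidden Dense layer of 16 ReLU units that takes the 4 input features, followed by an output Dense layer of 3 units (one per class) that returns raw logits with no softmax activation: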

```python
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3)  # raw logits, one per class
])
```

Step 4: Compile and Train the Model

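```python
# from_logits=True because the output layer has no softmax activation
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True),
              metrics=['accuracy'])

# y_train and y_test are passed as integer labels, not one-hot vectors
model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test))
```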
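Since the model outputs raw logits, the loss has to apply a softmax internally when `from_logits=True`. The NumPy sketch below illustrates that computation; it is not the Keras implementation. The helper name `sparse_cce_from_logits` and the sample values are hypothetical, and averaging over the batch is an assumption chosen to mirror the usual default reduction.

```python
import numpy as np

def sparse_cce_from_logits(logits, labels):
    """Mean sparse categorical crossentropy computed from raw logits."""
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log(exp(z_i) / sum_j exp(z_j))
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Hypothetical logits for two samples over three classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])  # integer class indices

print(sparse_cce_from_logits(logits, labels))
```

Working in log space avoids computing `log(softmax(z))` in two separate steps, which can underflow for large negative logits.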

Figure: Training

Step 5: Make Predictions

```python
import numpy as np

# The model returns logits, so apply softmax to get class probabilities
logits = model.predict(X_test)
probs = tf.nn.softmax(logits).numpy()
preds = np.argmax(probs, axis=1)

print("\nSample Predictions (predicted vs actual):")
for i in range(5):
    print(
        f"Predicted: {preds[i]}, Actual: {y_test[i]}, Confidence: {np.max(probs[i]):.2f}")
```

**Output:**

Figure: Predictions

Application

Advantages

Limitations