Softmax Activation Function in Neural Networks

Last Updated : 17 Nov, 2025

In deep learning, activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns. The Softmax activation function transforms a vector of real numbers into a probability distribution, where each value represents the predicted probability of a particular class. It is especially important for multi-class classification problems.

Because every output lies between 0 and 1 and all outputs sum to 1, Softmax is well suited to scenarios where each output neuron represents the probability of one of several mutually exclusive classes.

Softmax Function

For a given vector z = [z_1, z_2, \dots, z_n], the Softmax function is defined as:

\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}

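where z_i is the raw score (logit) for class i, n is the number of classes, and the denominator sums the exponentials of all n scores so that the outputs add up to 1.

Each output \sigma(z_i) represents the probability of class i.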
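One consequence of the formula is that adding the same constant c to every score leaves every output unchanged, because the common factor e^{c} cancels between numerator and denominator. This is the basis of the usual max-subtraction trick in numerically stable implementations:

```latex
\sigma(z_i + c) = \frac{e^{z_i + c}}{\sum_{j=1}^{n} e^{z_j + c}}
               = \frac{e^{c}\, e^{z_i}}{e^{c} \sum_{j=1}^{n} e^{z_j}}
               = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}
               = \sigma(z_i)
```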

Key Characteristics
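Three characteristics follow directly from the formula: every output lies strictly between 0 and 1, the outputs sum to 1, and a larger score always receives a larger probability. The short NumPy check below illustrates them on made-up scores; `softmax` here is a hand-written helper for illustration, not a library function.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D vector (max-shifted for numerical stability)."""
    exp_z = np.exp(z - np.max(z))   # subtracting the max does not change the result
    return exp_z / exp_z.sum()

z = np.array([3.0, -1.0, 0.5, 2.0])   # made-up raw scores
p = softmax(z)

print(np.all((p > 0) & (p < 1)))                     # True: each output is in (0, 1)
print(np.isclose(p.sum(), 1.0))                      # True: outputs sum to 1
print(np.array_equal(np.argsort(p), np.argsort(z)))  # True: ranking of scores is preserved
```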

How Softmax Activation Function Works

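Softmax converts a vector of raw scores (logits) into a probability distribution in three stages: it exponentiates each score, sums the exponentials, and divides each exponential by that sum.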
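Tracing the formula on a small, made-up score vector shows each stage: exponentiate the scores, add the exponentials together, then divide each exponential by the total. This sketch applies the formula exactly as written, without the max-subtraction trick, which is fine for small scores like these.

```python
import numpy as np

# Made-up raw scores (logits) for three classes
z = np.array([2.0, 1.0, 0.1])

# Stage 1: exponentiate every score (all results are positive)
exp_z = np.exp(z)

# Stage 2: add the exponentials to get the normalizing constant
total = exp_z.sum()

# Stage 3: divide each exponential by the total
probs = exp_z / total

print(np.round(exp_z, 3))   # [7.389 2.718 1.105]
print(np.round(probs, 3))   # [0.659 0.242 0.099]
```

The largest score (2.0) receives about 66% of the probability mass, and the exponential widens the gap between scores compared with simply dividing each score by their sum.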

Step-By-Step Implementation

Step 1: Import Necessary Libraries

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
```

Step 2: Load and Prepare the Dataset

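```python
# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target

# One-hot encode the integer labels (0, 1, 2) for categorical cross-entropy
y_encoded = to_categorical(y)

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)
```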
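`to_categorical` turns each integer label into a one-hot row, which is the target format that `categorical_crossentropy` expects. The NumPy-only sketch below reproduces the same values for illustration, assuming labels are integers from 0 to `num_classes - 1`; `one_hot` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (same values as to_categorical)."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes)[labels]

y = np.array([0, 2, 1, 0])
print(one_hot(y, 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```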

Step 3: Neural Network Model

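```python
model = Sequential([
    tf.keras.Input(shape=(4,)),       # 4 input features per flower
    Dense(8, activation='relu'),      # hidden layer
    Dense(3, activation='softmax')    # one output per class; Softmax turns scores into probabilities
])
```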
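The final `Dense(3, activation='softmax')` layer applies a linear transformation to the 8 hidden activations and then Softmax. The NumPy sketch below mirrors that forward pass for one Iris-shaped sample. The weights are random placeholders (hypothetical, not trained values), so only the shapes and the fact that each output row sums to 1 are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    """Row-wise, numerically stable softmax."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# Hypothetical random weights with the same shapes as the Keras model:
# Dense(8) maps 4 features -> 8 hidden units, Dense(3) maps 8 -> 3 classes
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = np.array([[5.1, 3.5, 1.4, 0.2]])   # one sample, shape (1, 4)
hidden = relu(x @ W1 + b1)              # shape (1, 8)
logits = hidden @ W2 + b2               # shape (1, 3), raw scores
probs = softmax(logits)                 # shape (1, 3), probabilities

print(probs.shape)         # (1, 3)
print(probs.sum(axis=1))   # each row sums to 1, up to floating-point rounding
```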

Step 4: Compile the Model

```python
# categorical_crossentropy matches the one-hot labels and the Softmax output layer
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
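With one-hot targets, categorical cross-entropy for a single sample reduces to the negative log of the probability the model assigns to the true class, and the batch loss is the mean over samples. A minimal NumPy sketch with made-up predictions follows; the clipping constant `eps` is an assumption added here to avoid log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy for one-hot targets and Softmax outputs."""
    y_pred = np.clip(y_pred, eps, 1.0)                      # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)   # -log p(true class)
    return per_sample.mean()

y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1],    # 0.7 on the true class
                   [0.1, 0.8, 0.1]])   # 0.8 on the true class

loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))   # 0.2899, i.e. (-ln 0.7 - ln 0.8) / 2
```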

Step 5: Train the Model

```python
# Train for 100 epochs, holding out 20% of the training data for validation
history = model.fit(X_train, y_train, epochs=100, batch_size=8, validation_split=0.2, verbose=0)
```

Step 6: Predict and Display Probabilities

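```python
# One flower: sepal length, sepal width, petal length, petal width (cm)
sample = np.array([[5.1, 3.5, 1.4, 0.2]])

# predict() returns one row of Softmax probabilities per sample
prediction = model.predict(sample)
predicted_class = np.argmax(prediction, axis=1)[0]

print("\nPredicted Probabilities (Softmax Output):", prediction)
print("Predicted Class:", iris.target_names[predicted_class])
```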

Output:

[Figure: Prediction]
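Because the exponential is strictly increasing and every score is divided by the same positive sum, Softmax never changes which class has the highest score. The class chosen by `np.argmax` is therefore the same whether it is taken over the Softmax probabilities or over the raw logits, as this small check with made-up logits illustrates.

```python
import numpy as np

def softmax(z):
    """Row-wise, numerically stable softmax."""
    exp_z = np.exp(z - z.max(axis=1, keepdims=True))
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# Made-up logits for four samples and three classes
logits = np.array([[ 2.0, -1.0,  0.5],
                   [ 0.1,  0.3,  0.2],
                   [-3.0, -2.0, -2.5],
                   [ 1.0,  4.0,  3.9]])

probs = softmax(logits)
pred_from_probs = np.argmax(probs, axis=1)
pred_from_logits = np.argmax(logits, axis=1)

print(pred_from_probs)                                     # [0 1 1 1]
print(np.array_equal(pred_from_probs, pred_from_logits))   # True
```

The probabilities are still what you report when the confidence for each class matters, as in the prediction step above.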


Why Use Softmax in the Last Layer

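The Softmax activation function is typically used in the final layer of a classification neural network because it converts the layer's raw scores into a probability distribution over all classes. That distribution is what categorical cross-entropy compares with the one-hot labels during training, and at prediction time the class with the highest probability is taken as the output.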
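A commonly cited reason for pairing Softmax with categorical cross-entropy is that the gradient of the loss with respect to the logits simplifies to p - y (predicted probabilities minus the one-hot target). The sketch below checks that identity numerically with central finite differences on made-up logits; it is a verification only, not how Keras computes gradients.

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - z.max())
    return exp_z / exp_z.sum()

def loss(z, y):
    """Cross-entropy of softmax(z) against a one-hot target y."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.5, -0.3, 0.8])   # made-up logits
y = np.array([0.0, 0.0, 1.0])    # true class is index 2

analytic = softmax(z) - y        # closed-form gradient p - y

# Central finite differences, one coordinate at a time
h = 1e-6
numeric = np.zeros_like(z)
for i in range(z.size):
    step = np.zeros_like(z)
    step[i] = h
    numeric[i] = (loss(z + step, y) - loss(z - step, y)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-6))   # True
```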

Applications

Challenges
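One practical challenge when implementing Softmax by hand is numerical overflow: for large scores, e^{z_i} exceeds the range of a 64-bit float, and the naive formula produces nan. Because adding the same constant to every score does not change the result, a common fix is to subtract the maximum score first. A minimal NumPy comparison on made-up large scores:

```python
import numpy as np

def naive_softmax(z):
    exp_z = np.exp(z)               # overflows to inf for large z
    return exp_z / exp_z.sum()      # inf / inf gives nan

def stable_softmax(z):
    exp_z = np.exp(z - np.max(z))   # largest exponent is now e^0 = 1
    return exp_z / exp_z.sum()

z = np.array([1000.0, 999.0, 998.0])

with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(z))         # [nan nan nan]
print(stable_softmax(z))            # approximately [0.665, 0.245, 0.090]
```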

Difference Between Sigmoid and Softmax Activation Function

Sigmoid and Softmax are both activation functions used in the output layer of classification models, but they treat the outputs differently:

| Parameter | Sigmoid Activation Function | Softmax Activation Function |
|---|---|---|
| Definition | Maps any real-valued input to a value between 0 and 1 | Converts a vector of real numbers into a probability distribution |
| Purpose | Used for binary classification problems | Used for multi-class classification problems |
| Number of Outputs | One independent probability per neuron | Multiple interdependent probabilities that sum to 1 |
| Use Case | Predicting between two classes | Predicting one of multiple classes |
| Output | Represents the confidence for one class | Represents the probabilities for all classes |
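To make the table concrete: applying Sigmoid element-wise to a vector of scores gives independent values that need not sum to 1, while Softmax couples them into one distribution. For exactly two classes the functions agree, because e^{a} / (e^{a} + e^{b}) = 1 / (1 + e^{-(a - b)}), so the first Softmax output equals the sigmoid of the score difference. A quick NumPy check with made-up scores:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])

sig = sigmoid(scores)    # independent per-class values
soft = softmax(scores)   # one distribution over all classes
print(sig.sum())         # greater than 1 here, so not a distribution
print(soft.sum())        # 1, up to floating-point rounding

# Two-class case: softmax([a, b])[0] equals sigmoid(a - b)
a, b = 1.3, -0.4
print(np.isclose(softmax(np.array([a, b]))[0], sigmoid(a - b)))   # True
```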
