Leaky ReLU Activation Function in Deep Learning

Last Updated : 12 Jul, 2025

The ReLU (Rectified Linear Unit) is one of the most commonly used activation functions in neural networks due to its simplicity and efficiency. It is defined as:

f(x) = \max(0, x)

Its output range is [0, ∞): for any input value x, it returns x if x is positive and 0 otherwise. But this approach causes some issues.
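As a minimal sketch of the definition above, ReLU can be written for a single number in plain Python. The function name `relu` and the sample inputs are illustrative choices, not something taken from a particular library.

```python
def relu(x: float) -> float:
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)


# A few sample inputs (chosen only for illustration).
for value in (-2.0, 0.0, 3.5):
    print(value, relu(value))
```

Negative inputs and zero both map to 0, while positive inputs pass through unchanged.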

Limitations of ReLU

While ReLU is widely adopted, it comes with some drawbacks, especially when training deep networks:

**Dead Neurons:** If a neuron receives only negative inputs, it outputs zero and its gradient is zero, so its weights stop updating and it stops learning.
**Non-symmetric:** ReLU does not treat negative and positive values equally: its outputs are never negative, which can slow down learning in some cases.
**Exploding Activations:** ReLU is unbounded above, so activation values can become very large for large positive inputs.
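The dead-neuron case can be illustrated with a small sketch. It assumes the common convention that the slope of ReLU at exactly 0 is taken as 0; the name `relu_grad` and the pre-activation values are hypothetical.

```python
def relu_grad(z: float) -> float:
    """Slope of ReLU at pre-activation z (taken as 0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0


# Hypothetical pre-activations one neuron sees across a batch: all negative.
pre_activations = [-3.0, -1.2, -0.4]
grads = [relu_grad(z) for z in pre_activations]

print(grads)       # [0.0, 0.0, 0.0]
print(any(grads))  # False: no gradient signal reaches this neuron's weights
```

If every input to the neuron stays negative, every entry stays zero and gradient descent cannot change the neuron's weights.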

To address these limitations, most directly the dead-neuron problem, the Leaky ReLU activation function was introduced.

What is Leaky ReLU?

Leaky ReLU is a modified version of ReLU designed to fix the problem of dead neurons. Instead of returning zero for negative inputs, it assigns a small, fixed slope to the negative part of the input, so the output there is a small negative value rather than zero. Because the gradient for negative inputs is small but non-zero, neurons can keep updating their weights during training even when they receive negative values.

Its equation, with a small constant slope α (commonly 0.01), is:

\text{Leaky ReLU}(x) = \begin{cases} x, & \text{if } x > 0 \\ \alpha x, & \text{if } x \leq 0 \end{cases}

Its graph (not reproduced here) has the following axes:

**X-axis:** Input values to the activation function, ranging from -5 to +2.
**Y-axis:** Output values of the Leaky ReLU function.

Interpretation of the Leaky ReLU Graph

**For positive values of x (x > 0):** The function behaves like the standard ReLU. The output increases linearly, following f(x) = x, which gives a straight line with a slope of 1.
**For negative values of x (x < 0):** Unlike ReLU, which outputs 0, Leaky ReLU keeps a small positive slope. The function is defined as f(x) = αx, where α is a small constant (e.g., 0.01), so the output is a small negative value that shrinks toward 0 as x approaches 0. Because the gradient here is α rather than 0, this mitigates the "dying neuron" problem.
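The two regions described above can be sketched in plain Python. The names `leaky_relu` and `leaky_relu_grad` are illustrative, and the default `alpha=0.01` simply follows the example value used in this article.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Slope of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha


# Sample inputs spanning the range shown on the graph's x-axis.
for x in (-5.0, -1.0, 0.0, 2.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

For negative inputs the output is a small negative number and the slope is `alpha`, so the gradient is never exactly zero.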

Uses of Leaky ReLU

Prevents dead neurons by allowing a small gradient for negative inputs.
Improves gradient flow during backpropagation.
Can lead to faster and more stable training than ReLU in some cases.
Useful in deep networks where many ReLU neurons would otherwise stop updating.
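The first two points can be illustrated with a toy experiment: a single weight is trained by gradient descent to fit y = x, starting from a value that makes every pre-activation negative. Everything here (the helper `train_single_weight`, the data, the learning rate, the step count) is a hypothetical setup chosen for illustration, not a benchmark.

```python
def train_single_weight(slope_negative: float, steps: int = 300, lr: float = 0.1) -> float:
    """Fit y = x with one weight w and activation f(z) = z if z > 0 else slope_negative * z.

    slope_negative = 0.0 gives ReLU; a small positive value gives Leaky ReLU.
    """
    xs = [1.0, 2.0, 3.0]
    ys = [1.0, 2.0, 3.0]
    w = -1.0  # deliberately bad start: every pre-activation w * x is negative

    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            z = w * x
            slope = 1.0 if z > 0 else slope_negative
            out = z if z > 0 else slope_negative * z
            grad += 2.0 * (out - y) * slope * x  # d/dw of the squared error (out - y)^2
        w -= lr * grad / len(xs)  # gradient step on the mean squared error
    return w


w_relu = train_single_weight(slope_negative=0.0)
w_leaky = train_single_weight(slope_negative=0.01)
print("ReLU weight:", w_relu)
print("Leaky ReLU weight:", w_leaky)
```

In this setup the ReLU weight never leaves its starting value because every gradient is zero, while the Leaky ReLU weight slowly climbs out of the negative region and then fits the data.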

Difference Between ReLU and Leaky ReLU

Feature	ReLU	Leaky ReLU
Output for x > 0	x	x
Output for x < 0	0	αx (small negative value)
Slope for x < 0	0	Small constant (e.g., 0.01)
Risk of Dead Neurons	High	Low
Learning Capability	Stalls for neurons whose inputs stay negative.	Continues for all inputs.
Complexity	Very Simple	Slightly more complex.

While not always necessary, switching to Leaky ReLU can improve learning dynamics and lead to better model performance in cases where ReLU leaves certain neurons inactive.