Activation Functions in Neural Networks Using R

Last Updated : 23 Jul, 2025

Activation functions are essential components of neural networks that play a crucial role in determining how a model processes and interprets data. They introduce non-linearity into the network, enabling it to learn and capture complex patterns and relationships within the data. By applying mathematical transformations to the input values, activation functions help the network make more accurate predictions and classifications. In neural networks implemented in R, activation functions can be easily utilized to enhance the model’s capability, making them a fundamental aspect of machine learning and deep learning projects.

What Are Activation Functions?

An activation function is a mathematical formula that decides whether a neuron should be activated or not. It takes the input to a neuron, performs a calculation, and then produces an output. This output is then passed to the next layer of the network. Activation functions add non-linearity to the model, allowing it to learn and perform more complex tasks.
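The flow described above (input, calculation, output) can be sketched for a single neuron. This is a minimal illustration, not a full network: the helper `neuron_output` and the input, weight, and bias values are hypothetical and chosen only for demonstration.

```r
# Minimal sketch of one neuron; all names and values here are illustrative.
sigmoid <- function(x) 1 / (1 + exp(-x))

neuron_output <- function(inputs, weights, bias, activation = sigmoid) {
  z <- sum(inputs * weights) + bias  # weighted sum of inputs (pre-activation)
  activation(z)                      # activated value passed to the next layer
}

inputs  <- c(0.5, -1.0, 2.0)
weights <- c(0.4, 0.3, -0.2)
bias    <- 0.1

out <- neuron_output(inputs, weights, bias)
cat("Neuron output:", out, "\n")
```

Here the pre-activation value is -0.4, and the sigmoid turns it into a value between 0 and 1.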

Why Are Activation Functions Important?

Without activation functions, a stack of layers collapses into a single linear transformation, so the network could only learn linear relationships.
By adding non-linearity, activation functions allow the network to learn complex patterns in the data, improving its ability to make accurate predictions.
Activation functions also control the range of output values, which matters for different types of problems (e.g., probabilities in (0, 1) for classification vs. unbounded values for regression).
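The first point can be checked numerically. The sketch below, using small hypothetical weight matrices `W1` and `W2`, shows that two layers without an activation give the same result as one combined linear layer, while inserting ReLU between them changes the result.

```r
# Hypothetical weights for a 2 -> 3 -> 1 network (illustrative values only).
W1 <- matrix(c(1, -1, 0.5, 2, 1, -1), nrow = 3, ncol = 2)
W2 <- matrix(c(1, 1, 1), nrow = 1, ncol = 3)
x  <- c(1, -1)

# Two stacked layers with no activation in between
two_linear_layers <- as.numeric(W2 %*% (W1 %*% x))

# The same mapping expressed as a single linear layer
one_linear_layer <- as.numeric((W2 %*% W1) %*% x)

# With a ReLU between the layers, the mapping is no longer linear
relu <- function(z) pmax(0, z)
hidden <- as.vector(W1 %*% x)
with_relu <- as.numeric(W2 %*% relu(hidden))

cat("Two linear layers:", two_linear_layers, "\n")  # -1.5
cat("One linear layer: ", one_linear_layer, "\n")   # -1.5
cat("With ReLU:        ", with_relu, "\n")          # 1.5
```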

Common Activation Functions

The following sections cover common activation functions and show how to implement each one in R.

1. Sigmoid (Logistic) Function

The sigmoid function, denoted σ(x), maps any real-valued number into the open interval (0, 1).
It is frequently used in binary classification problems because it squashes the output into a probability-like value.
However, it suffers from the vanishing gradient problem, which can slow down training.

**Formula:**

\( \sigma(x) = \frac{1}{1 + e^{-x}} \)

```r
# Sigmoid activation function
sigmoid <- function(x) {
  1 / (1 + exp(-x))
}

# Example usage
x <- c(-1, 0, 1, 2, 3)
cat("Sigmoid: ", sigmoid(x), "\n")
```

**Output:**

Sigmoid: 0.2689414 0.5 0.7310586 0.8807971 0.9525741
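The vanishing gradient problem mentioned above follows from the sigmoid's derivative, which is σ(x)(1 − σ(x)). It peaks at 0.25 when x = 0 and shrinks toward zero for large positive or negative inputs. The sketch below illustrates this; the helper name `sigmoid_grad` is an assumption made for this example.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

# Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)
sigmoid_grad <- function(x) {
  s <- sigmoid(x)
  s * (1 - s)
}

x <- c(-10, -5, 0, 5, 10)
grads <- sigmoid_grad(x)

# Gradients are largest at 0 and become tiny at the extremes
cat("Sigmoid gradient:", signif(grads, 3), "\n")
```

When many such small gradients are multiplied together during backpropagation, the updates reaching early layers can become negligible.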

2. Hyperbolic Tangent (tanh) Function

The tanh function maps input values to the open interval (-1, 1).
It is zero-centered, which often leads to faster convergence during training than the sigmoid function.

**Formula:**

\( f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \)

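```r
# Hyperbolic tangent (tanh) function
# Named tanh_activation to avoid masking R's built-in tanh(),
# which returns the same values.
tanh_activation <- function(x) {
  2 / (1 + exp(-2 * x)) - 1
}

# Example usage
x <- c(-1, 0, 1, 2, 3)
cat("tanh: ", tanh_activation(x), "\n")
```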

**Output:**

tanh: -0.7615942 0 0.7615942 0.9640276 0.9950548
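As a quick check, tanh can be written as a rescaled sigmoid, tanh(x) = 2σ(2x) − 1, and base R already ships a `tanh()` function. The sketch below compares the two and also computes the derivative 1 − tanh²(x); the helper name `tanh_grad` is hypothetical.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

x <- c(-1, 0, 1, 2, 3)

builtin  <- base::tanh(x)           # R's built-in hyperbolic tangent
rescaled <- 2 * sigmoid(2 * x) - 1  # tanh expressed through the sigmoid

cat("Same values:", isTRUE(all.equal(builtin, rescaled)), "\n")

# Derivative of tanh: 1 - tanh(x)^2, which equals 1 at x = 0
tanh_grad <- function(x) 1 - base::tanh(x)^2
cat("tanh gradient:", signif(tanh_grad(x), 3), "\n")
```

The maximum gradient of tanh is 1, compared with 0.25 for the sigmoid, which is one reason it often trains faster in hidden layers.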

3. Rectified Linear Unit (ReLU)

ReLU is one of the most popular activation functions in deep learning and is computationally efficient.
It replaces negative inputs with zero and leaves positive inputs unchanged.
ReLU helps mitigate the vanishing gradient problem and speeds up training.
However, it can suffer from the “dying ReLU” problem, in which some neurons get stuck outputting zero and remain inactive.

**Formula:**

\( f(x) = \max(0, x) \)

```r
# Rectified Linear Unit (ReLU)
relu <- function(x) {
  pmax(0, x)  # element-wise maximum of 0 and x
}

# Example usage
x <- c(-1, 0, 1, 2, 3)
cat("ReLU: ", relu(x), "\n")
```

**Output:**

ReLU: 0 0 1 2 3
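The "dying ReLU" issue can be illustrated with the gradient of ReLU, which is 1 for positive inputs and 0 otherwise (taking the gradient at exactly 0 to be 0 is a common convention, assumed here). In the sketch below, a hypothetical neuron with a large negative bias receives a negative pre-activation for every input, so both its output and its gradient are zero.

```r
relu <- function(x) pmax(0, x)

# Gradient of ReLU: 1 where x > 0, otherwise 0 (convention at x = 0)
relu_grad <- function(x) as.numeric(x > 0)

# Hypothetical neuron whose bias pushes every pre-activation below zero
inputs <- c(0.2, 0.8, 1.5, 3.0)
weight <- 1
bias   <- -5
z <- weight * inputs + bias

cat("Pre-activations:", z, "\n")
cat("ReLU outputs:   ", relu(z), "\n")       # 0 0 0 0
cat("ReLU gradients: ", relu_grad(z), "\n")  # 0 0 0 0
```

With a zero gradient for every example, gradient descent never updates this neuron's weights, so it stays inactive.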

4. Leaky ReLU

Leaky ReLU is a variant of ReLU that addresses the dying ReLU problem.
It allows a small, non-zero gradient when the input is negative.

**Formula:**

\( f(x) = \max(\alpha x, x) \), where \( \alpha \) is a small positive constant such as 0.01.

```r
# Leaky ReLU
leaky_relu <- function(x, alpha = 0.01) {
  ifelse(x > 0, x, alpha * x)  # scale negative inputs by alpha
}

# Example usage
x <- c(-1, 0, 1, 2, 3)
cat("Leaky ReLU: ", leaky_relu(x), "\n")
```

**Output:**

Leaky ReLU: -0.01 0 1 2 3
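To see why this helps, the sketch below computes the gradient of Leaky ReLU for a set of negative pre-activations (hypothetical values). Unlike plain ReLU, the gradient is `alpha` rather than 0, so the neuron can still receive updates. It also checks that the `ifelse` form agrees with the max(αx, x) formula when 0 < α < 1. The helper name `leaky_relu_grad` is an assumption made for this example.

```r
leaky_relu <- function(x, alpha = 0.01) ifelse(x > 0, x, alpha * x)

# Gradient of Leaky ReLU: 1 for positive inputs, alpha otherwise
leaky_relu_grad <- function(x, alpha = 0.01) ifelse(x > 0, 1, alpha)

# Hypothetical pre-activations that would give zero gradient under plain ReLU
z <- c(-4.8, -4.2, -3.5, -2.0)
cat("Leaky ReLU gradients:", leaky_relu_grad(z), "\n")  # 0.01 0.01 0.01 0.01

# The ifelse form matches max(alpha * x, x) for 0 < alpha < 1
x <- c(-1, 0, 1, 2, 3)
cat("Matches formula:", isTRUE(all.equal(leaky_relu(x), pmax(0.01 * x, x))), "\n")
```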

5. Softmax

Softmax is mainly used in the output layer for multi-class classification.
Given a set of raw model outputs (logits), softmax converts them into probabilities.
It guarantees that the probabilities across all classes sum to 1.

**Formula:**

\( f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \)

```r
# Softmax function
softmax <- function(x) {
  exp_x <- exp(x)
  exp_x / sum(exp_x)  # normalize so the values sum to 1
}

# Example usage
x <- c(-1, 0, 1, 2, 3)
cat("Softmax: ", softmax(x), "\n")
```

**Output:**

Softmax: 0.01165623 0.03168492 0.08612854 0.2341217 0.6364086
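One practical caveat: computing `exp(x)` directly overflows for large logits. A common remedy, sketched below, is to subtract the maximum logit before exponentiating, which leaves the result mathematically unchanged. The function names `softmax_naive` and `softmax_stable` are chosen for this illustration.

```r
# Direct implementation of the formula
softmax_naive <- function(x) exp(x) / sum(exp(x))

# Numerically stable variant: shift by the maximum before exponentiating
softmax_stable <- function(x) {
  z <- x - max(x)  # largest shifted value is 0, so exp() cannot overflow
  exp(z) / sum(exp(z))
}

big <- c(1000, 1001, 1002)
cat("Naive: ", softmax_naive(big), "\n")   # NaN NaN NaN, since exp(1000) is Inf
cat("Stable:", softmax_stable(big), "\n")

# Both versions agree on moderate inputs
x <- c(-1, 0, 1, 2, 3)
cat("Agree:", isTRUE(all.equal(softmax_naive(x), softmax_stable(x))), "\n")
```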

Choosing the Right Activation Function

Choosing the right activation function depends on several factors, such as the layer it is used in and the type of problem:

**Sigmoid and Tanh:** Sigmoid is often used in the output layer for binary classification; tanh is used in hidden layers for models where outputs between -1 and 1 are desirable.
**ReLU:** Generally used in hidden layers of deep networks, as it tends to mitigate the vanishing gradient problem and speeds up learning.
**Leaky ReLU:** Used when ReLU causes dead neurons (neurons that only output 0). It is a good alternative when facing this problem.
**Softmax:** Ideal for multi-class classification problems, where the output layer needs to represent probabilities across multiple classes.
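These guidelines can be combined in a single forward pass. The sketch below assumes a tiny network with two inputs, three ReLU hidden units, and a three-class softmax output; every weight and bias value is hypothetical and picked only to make the arithmetic easy to follow.

```r
relu <- function(x) pmax(0, x)
softmax <- function(x) {
  z <- x - max(x)  # shift for numerical stability
  exp(z) / sum(exp(z))
}

# Hypothetical parameters for a 2 -> 3 -> 3 network
W_hidden <- matrix(c(0.2, -0.5, 0.1, 0.4, 0.3, -0.2), nrow = 3, ncol = 2)
b_hidden <- c(0.1, 0.0, -0.1)
W_out    <- matrix(c(0.5, -0.3, 0.2, 0.1, 0.4, -0.6, -0.2, 0.3, 0.7), nrow = 3, ncol = 3)
b_out    <- c(0, 0, 0)

x <- c(1.0, 2.0)

# Hidden layer: ReLU keeps positive pre-activations and zeroes the rest
hidden <- relu(as.vector(W_hidden %*% x) + b_hidden)

# Output layer: softmax turns the logits into class probabilities
probs <- softmax(as.vector(W_out %*% hidden) + b_out)

cat("Hidden activations:", hidden, "\n")
cat("Class probabilities:", round(probs, 4), "\n")
cat("Predicted class:", which.max(probs), "\n")
```

For a binary classifier, the output layer would instead have a single unit with a sigmoid activation.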

Conclusion

Activation functions are crucial for making neural networks capable of learning complex patterns. By introducing non-linearity into the model, they allow the network to perform a wide range of tasks, from classification to regression. In R, each of these functions takes only a few lines of code, and packages like neuralnet make it straightforward to build and train neural networks. Understanding and choosing the right activation function for our problem can greatly impact the performance of our model.