Learning Rate in Neural Network (original) (raw)

Last Updated : 12 May, 2026

The learning rate is a key hyperparameter that controls how quickly a model learns by determining the step size during weight updates.

**Formula

w = w - \alpha \cdot \nabla L(w)

Where:

**Impact of Learning Rate on Model

The learning rate directly influences how fast and how well a model learns by controlling the size of weight updates during training.

A low learning rate leads to slow convergence, requires more epochs and increases computation time but can improve accuracy
A high learning rate speeds up training but may overshoot optimal values and cause instability or divergence
An optimal learning rate balances speed and accuracy, ensuring stable convergence
Fine-tuning the learning rate is important for better performance
Techniques like learning rate scheduling and adaptive optimizers help improve stability and efficiency

A constant learning rate is maintained throughout training.
Simple to implement and commonly used in basic models.
Its limitation is that it lacks the ability to adapt on different training phases which may create sub optimal results.

These techniques reduce the learning rate over time based on predefined rules to improve convergence:

**Step Decay: Reduces the learning rate by a fixed factor at set intervals (every few epochs).
**Exponential Decay: Continuously decreases the learning rate exponentially over training time.
**Polynomial Decay: Learning rate decays polynomially, offering smoother transitions compared to step or exponential methods.

Adaptive methods adjust the learning rate dynamically based on gradient information, allowing better updates per parameter:

**AdaGrad: AdaGrad adapts the learning rate per parameter based on the squared gradients. It is effective for sparse data but may decay too quickly.
**RMSprop: RMSprop builds on AdaGrad by using a moving average of squared gradients to prevent aggressive decay.
**Adam (Adaptive Moment Estimation): Adam combines RMSprop with momentum to provide stable and fast convergence; widely used in practice.

The learning rate oscillates between a minimum and maximum value in a cyclic manner throughout training.
It increases and then decreases the learning rate linearly in each cycle.
Benefits include better exploration of the loss surface and leading to faster convergence.

Gradually reduces the learning rate as training progresses.
Helps the model take more precise steps towards the minimum. This improves stability in later epochs.