LayerNorm — PyTorch 2.7 documentation

class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, bias=True, device=None, dtype=None)

Applies Layer Normalization over a mini-batch of inputs.

This layer implements the operation as described in the paper Layer Normalization.

y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta

The mean and standard-deviation are calculated over the last D dimensions, where D is the dimension of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard-deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). γ and β are learnable affine transform parameters of normalized_shape if elementwise_affine is True. The variance is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
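As an illustration (not part of the original page), a minimal sketch of the equivalent manual computation for an input whose trailing shape is (3, 5); elementwise_affine is disabled so only the normalization itself is compared:

import torch
import torch.nn as nn

x = torch.randn(4, 3, 5)
layer_norm = nn.LayerNorm([3, 5], elementwise_affine=False)

# Manual equivalent: mean and biased variance over the last two dimensions
mean = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + layer_norm.eps)

print(torch.allclose(layer_norm(x), manual, atol=1e-6))  # True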

Note

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.
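To make this difference concrete, here is a small sketch (not from the original note; the layer sizes are arbitrary) comparing the shapes of the learnable parameters:

import torch.nn as nn

C, H, W = 5, 10, 10
ln = nn.LayerNorm([C, H, W])   # elementwise_affine=True by default
bn = nn.BatchNorm2d(C)         # affine=True by default

print(ln.weight.shape)  # torch.Size([5, 10, 10]) -- one scale per element
print(bn.weight.shape)  # torch.Size([5])         -- one scale per channel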

This layer uses statistics computed from input data in both training and evaluation modes.

Parameters

- normalized_shape (int or list or torch.Size) – input shape from an expected input of size [* × normalized_shape[0] × normalized_shape[1] × … × normalized_shape[-1]]. If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
- eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
- elementwise_affine (bool) – if True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True
- bias (bool) – if set to False, the layer will not learn an additive bias (only relevant if elementwise_affine is True). Default: True

Variables

- weight – the learnable weights of the module of shape normalized_shape when elementwise_affine is set to True. The values are initialized to 1.
- bias – the learnable bias of the module of shape normalized_shape when elementwise_affine is set to True. The values are initialized to 0.

Shape:

- Input: (N, *)
- Output: (N, *) (same shape as input)

Examples:

NLP Example

import torch
import torch.nn as nn

batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
# Activate module
layer_norm(embedding)

Image Example

N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
# as shown in the image below
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)

[Image: layer_norm.jpg – the dimensions normalized by LayerNorm for an image-shaped input]
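As a quick sanity check (not part of the original example), the output should have approximately zero mean and unit standard deviation over the normalized dimensions, since the affine parameters are initialized to γ = 1 and β = 0:

# Per-sample statistics over the normalized dimensions (C, H, W)
print(output.mean(dim=(-3, -2, -1)))                 # ~0 for every sample
print(output.std(dim=(-3, -2, -1), unbiased=False))  # ~1 for every sample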