torch.nn.init


Warning

All the functions in this module are intended to be used to initialize neural network parameters, so they all run in torch.no_grad() mode and will not be taken into account by autograd.
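For instance, a minimal usage sketch (not part of this reference) that applies these initializers to a module's parameters via Module.apply; because the functions already run in torch.no_grad() mode, no extra wrapper is needed:

import torch.nn as nn

def init_weights(m):
    # The init functions mutate tensors in place and already run
    # under torch.no_grad(), so no extra wrapper is needed.
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
net.apply(init_weights)  # runs init_weights on every submodule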

torch.nn.init.calculate_gain(nonlinearity, param=None)

Return the recommended gain value for the given nonlinearity function.

The values are as follows:

nonlinearity            gain
Linear / Identity       1
Conv1D, Conv2D, Conv3D  1
Sigmoid                 1
Tanh                    5/3
ReLU                    \sqrt{2}
Leaky ReLU              \sqrt{2 / (1 + \text{negative\_slope}^2)}
SELU                    3/4

Warning

In order to implement Self-Normalizing Neural Networks, you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1 / N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.

Parameters

nonlinearity – the non-linear function (nn.functional name)
param (optional) – optional parameter for the non-linear function

Return type

float

Examples

>>> gain = nn.init.calculate_gain(
...     "leaky_relu", 0.2
... )  # leaky_relu with negative_slope=0.2
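Following the warning above, a hedged sketch (not from this reference) of initializing a layer for a self-normalizing SELU network using the 'linear' gain:

import torch.nn as nn

# Per the warning: use nonlinearity='linear' (gain 1.0) rather than 'selu',
# giving the weights variance 1 / fan_in as self-normalization requires.
layer = nn.Linear(128, 128)
nn.init.kaiming_normal_(layer.weight, mode="fan_in", nonlinearity="linear")
nn.init.zeros_(layer.bias)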

torch.nn.init.uniform_(tensor, a=0.0, b=1.0, generator=None)

Fill the input Tensor with values drawn from the uniform distribution \mathcal{U}(a, b).

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
a (float) – the lower bound of the uniform distribution
b (float) – the upper bound of the uniform distribution
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.uniform_(w)

torch.nn.init.normal_(tensor, mean=0.0, std=1.0, generator=None)

Fill the input Tensor with values drawn from the normal distribution \mathcal{N}(\text{mean}, \text{std}^2).

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
mean (float) – the mean of the normal distribution
std (float) – the standard deviation of the normal distribution
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.normal_(w)

torch.nn.init.constant_(tensor, val)

Fill the input Tensor with the value \text{val}.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
val (float) – the value to fill the tensor with

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.constant_(w, 0.3)

torch.nn.init.ones_(tensor)

Fill the input Tensor with the scalar value 1.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.ones_(w)

torch.nn.init.zeros_(tensor)

Fill the input Tensor with the scalar value 0.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.zeros_(w)

torch.nn.init.eye_(tensor)

Fill the 2-dimensional input Tensor with the identity matrix.

Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.

Parameters

tensor (Tensor) – a 2-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.eye_(w)
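Because the weight is the identity matrix, a square bias-free Linear layer initialized this way reproduces its input; a quick sketch:

import torch
import torch.nn as nn

layer = nn.Linear(5, 5, bias=False)
nn.init.eye_(layer.weight)
x = torch.randn(2, 5)
# y = x @ I^T = x: the layer passes inputs through unchanged
assert torch.allclose(layer(x), x)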

torch.nn.init.dirac_(tensor, groups=1)

Fill the {3, 4, 5}-dimensional input Tensor with the Dirac delta function.

Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups > 1, each group of channels preserves identity.

Parameters

tensor (Tensor) – a {3, 4, 5}-dimensional torch.Tensor
groups (int, optional) – number of groups in the conv layer (default: 1)

Return type

Tensor

Examples

>>> w = torch.empty(3, 16, 5, 5)
>>> nn.init.dirac_(w)
>>> w = torch.empty(3, 24, 5, 5)
>>> nn.init.dirac_(w, 3)
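A sketch of the identity-preserving behavior: with a Dirac-initialized kernel and padding that keeps the spatial size, the convolution copies its input channels:

import torch
import torch.nn as nn

conv = nn.Conv2d(8, 8, kernel_size=3, padding=1, bias=False)
nn.init.dirac_(conv.weight)
x = torch.randn(1, 8, 16, 16)
# Each output channel is a copy of the matching input channel.
assert torch.allclose(conv(x), x)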

torch.nn.init.xavier_uniform_(tensor, gain=1.0, generator=None)

Fill the input Tensor with values using a Xavier uniform distribution.

The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from \mathcal{U}(-a, a) where

a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}

Also known as Glorot initialization.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
gain (float) – an optional scaling factor
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain("relu"))
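As a quick check of the formula above: for a 3×5 weight, fan_in = 5 and fan_out = 3, so with gain = 1 every value lies within ±\sqrt{6/8} ≈ 0.866:

import math
import torch
import torch.nn as nn

w = torch.empty(3, 5)  # fan_in = 5, fan_out = 3
nn.init.xavier_uniform_(w)  # gain defaults to 1.0
bound = math.sqrt(6.0 / (5 + 3))  # ≈ 0.866
assert w.abs().max() <= bound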

torch.nn.init.xavier_normal_(tensor, gain=1.0, generator=None)

Fill the input Tensor with values using a Xavier normal distribution.

The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from \mathcal{N}(0, \text{std}^2) where

\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}

Also known as Glorot initialization.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
gain (float) – an optional scaling factor
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_normal_(w)

torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)

Fill the input Tensor with values using a Kaiming uniform distribution.

The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from \mathcal{U}(-\text{bound}, \text{bound}) where

\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

Also known as He initialization.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
mode (str) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default)
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_uniform_(w, mode="fan_in", nonlinearity="relu")

Note

Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_uniform_(w.T, ...).
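A sketch of the workaround described in this note, for a weight stored as [fan_in, fan_out] and used as x @ w:

import torch
import torch.nn as nn

fan_in, fan_out = 128, 64
w = torch.empty(fan_in, fan_out)  # to be used as x @ w
# Initialize through the transposed view so the initializer sees the
# [fan_out, fan_in] layout the formula assumes; w is filled in place.
nn.init.kaiming_uniform_(w.T, nonlinearity="relu")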

torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)

Fill the input Tensor with values using a Kaiming normal distribution.

The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from \mathcal{N}(0, \text{std}^2) where

\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}

Also known as He initialization.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
mode (str) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default)
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_normal_(w, mode="fan_out", nonlinearity="relu")
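An empirical sketch of the formula: with mode='fan_out' and the ReLU gain \sqrt{2}, the sample standard deviation of a large tensor should land near \sqrt{2 / \text{fan\_out}}:

import math
import torch
import torch.nn as nn

w = torch.empty(256, 512)  # fan_out = 256
nn.init.kaiming_normal_(w, mode="fan_out", nonlinearity="relu")
expected = math.sqrt(2.0 / 256)  # gain sqrt(2) / sqrt(fan_out) ≈ 0.088
print(f"{w.std().item():.4f} vs {expected:.4f}")  # should be close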

Note

Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_normal_(w.T, ...).

torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)

Fill the input Tensor with values drawn from a truncated normal distribution.

The values are effectively drawn from the normal distribution \mathcal{N}(\text{mean}, \text{std}^2) with values outside [a, b] redrawn until they are within the bounds. The method used for generating the random values works best when a \leq \text{mean} \leq b.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
mean (float) – the mean of the normal distribution
std (float) – the standard deviation of the normal distribution
a (float) – the minimum cutoff value
b (float) – the maximum cutoff value
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.trunc_normal_(w)
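A quick sketch confirming the cutoff behavior with the default bounds:

import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.trunc_normal_(w, mean=0.0, std=1.0, a=-2.0, b=2.0)
# Every value is redrawn until it falls inside [a, b].
assert w.min() >= -2.0 and w.max() <= 2.0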

torch.nn.init.orthogonal_(tensor, gain=1, generator=None)

Fill the input Tensor with a (semi) orthogonal matrix.

Described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor, where n ≥ 2
gain (float) – optional scaling factor
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)
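For a wide matrix the rows come out orthonormal, so w @ w.T is approximately the identity; a small sketch:

import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.orthogonal_(w)
# Semi-orthogonal: the 3 rows of the 3x5 matrix are orthonormal.
assert torch.allclose(w @ w.T, torch.eye(3), atol=1e-5)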

torch.nn.init.sparse_(tensor, sparsity, std=0.01, generator=None)

Fill the 2D input Tensor as a sparse matrix.

The non-zero elements will be drawn from the normal distribution \mathcal{N}(0, 0.01), as described in Deep learning via Hessian-free optimization - Martens, J. (2010).

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor
sparsity (float) – the fraction of elements in each column to be set to zero
std (float) – the standard deviation of the normal distribution used to generate the non-zero values
generator (Optional[Generator]) – the torch Generator to sample from (default: None)

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.sparse_(w, sparsity=0.1)
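The sparsity fraction applies per column; a sketch showing that with sparsity=0.1 and 10 rows, one entry in each column is zeroed:

import torch
import torch.nn as nn

w = torch.empty(10, 5)
nn.init.sparse_(w, sparsity=0.1)
# ceil(0.1 * 10) = 1 zero per column; remaining entries ~ N(0, 0.01).
print((w == 0).sum(dim=0))  # tensor([1, 1, 1, 1, 1])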