torch.nn.init — PyTorch 2.0 documentation
Warning
All the functions in this module are intended to be used to initialize neural network parameters, so they all run in torch.no_grad() mode and will not be taken into account by autograd.
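Because every initializer here already runs under torch.no_grad(), a common pattern is to call them from a small helper passed to nn.Module.apply. A minimal sketch (the init_weights helper below is illustrative, not part of this module):

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Initialize every Linear layer: Xavier-uniform weights, zero biases.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
model.apply(init_weights)  # recursively visits every submodule
```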
torch.nn.init.calculate_gain(nonlinearity, param=None)[source]¶
Return the recommended gain value for the given nonlinearity function. The values are as follows:
| nonlinearity | gain |
|---|---|
| Linear / Identity | 1 |
| Conv{1,2,3}D | 1 |
| Sigmoid | 1 |
| Tanh | \frac{5}{3} |
| ReLU | \sqrt{2} |
| Leaky ReLU | \sqrt{\frac{2}{1 + \text{negative\_slope}^2}} |
| SELU | \frac{3}{4} |
Warning
In order to implement Self-Normalizing Neural Networks, you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1 / N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.
Parameters:
- nonlinearity – the non-linear function (nn.functional name)
- param – optional parameter for the non-linear function
Examples
gain = nn.init.calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2
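As a quick illustrative sketch of how the gain is typically consumed downstream (the 64×128 shape is arbitrary):

```python
import torch
import torch.nn as nn

w = torch.empty(64, 128)
gain = nn.init.calculate_gain('tanh')   # 5/3, per the table above
nn.init.xavier_uniform_(w, gain=gain)   # scales the Glorot bound by the gain
```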
torch.nn.init.uniform_(tensor, a=0.0, b=1.0)[source]¶
Fills the input Tensor with values drawn from the uniform distribution \mathcal{U}(a, b).
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- a (float) – the lower bound of the uniform distribution
- b (float) – the upper bound of the uniform distribution
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.uniform_(w)
torch.nn.init.normal_(tensor, mean=0.0, std=1.0)[source]¶
Fills the input Tensor with values drawn from the normal distribution \mathcal{N}(\text{mean}, \text{std}^2).
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- mean (float) – the mean of the normal distribution
- std (float) – the standard deviation of the normal distribution
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.normal_(w)
torch.nn.init.constant_(tensor, val)[source]¶
Fills the input Tensor with the value val\text{val}.
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- val (float) – the value to fill the tensor with
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.constant_(w, 0.3)
torch.nn.init.ones_(tensor)[source]¶
Fills the input Tensor with the scalar value 1.
Parameters:
tensor (Tensor) – an n-dimensional torch.Tensor
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.ones_(w)
torch.nn.init.zeros_(tensor)[source]¶
Fills the input Tensor with the scalar value 0.
Parameters:
tensor (Tensor) – an n-dimensional torch.Tensor
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.zeros_(w)
torch.nn.init.eye_(tensor)[source]¶
Fills the 2-dimensional input Tensor with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.
Parameters:
tensor – a 2-dimensional torch.Tensor
Examples
w = torch.empty(3, 5)
nn.init.eye_(w)
torch.nn.init.dirac_(tensor, groups=1)[source]¶
Fills the {3, 4, 5}-dimensional input Tensor with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups>1, each group of channels preserves identity.
Parameters:
- tensor – a {3, 4, 5}-dimensional torch.Tensor
- groups (int, optional) – number of groups in the conv layer (default: 1)
Examples
w = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w)
w = torch.empty(3, 24, 5, 5)
nn.init.dirac_(w, 3)
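As a rough sanity check (not part of the original docs): a Dirac-initialized conv with matching in/out channels, symmetric padding, and no bias should pass its input through unchanged.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
nn.init.dirac_(conv.weight)  # weight[i, i, 1, 1] = 1, everything else 0

x = torch.randn(1, 4, 8, 8)
print(torch.allclose(conv(x), x))  # True: each channel is copied through
```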
torch.nn.init.xavier_uniform_(tensor, gain=1.0)[source]¶
Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from \mathcal{U}(-a, a) where
a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}
Also known as Glorot initialization.
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- gain (float) – an optional scaling factor
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
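A small illustrative check (assuming a 3×5 weight, so fan_in = 5 and fan_out = 3) that the sampled values respect the Glorot bound:

```python
import math
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.xavier_uniform_(w)          # default gain = 1.0
a = math.sqrt(6.0 / (5 + 3))        # gain * sqrt(6 / (fan_in + fan_out))
print(bool(w.abs().max() <= a))     # True
```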
torch.nn.init.xavier_normal_(tensor, gain=1.0)[source]¶
Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from \mathcal{N}(0, \text{std}^2) where
\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}
Also known as Glorot initialization.
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- gain (float) – an optional scaling factor
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.xavier_normal_(w)
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')[source]¶
Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from \mathcal{U}(-\text{bound}, \text{bound}) where
\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}
Also known as He initialization.
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
- mode (str) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
- nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
Examples
w = torch.empty(3, 5)
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')[source]¶
Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from \mathcal{N}(0, \text{std}^2) where
\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}
Also known as He initialization.
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
- mode (str) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
- nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
Examples
w = torch.empty(3, 5)
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
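One common illustrative use is He-normal initialization in 'fan_out' mode for conv layers that feed into a ReLU, as in many ResNet-style setups:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1, bias=False)
nn.init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')
```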
torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0)[source]¶
Fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution \mathcal{N}(\text{mean}, \text{std}^2) with values outside [a, b] redrawn until they are within the bounds. The method used for generating the random values works best when a \leq \text{mean} \leq b.
Parameters:
- tensor (Tensor) – an n-dimensional torch.Tensor
- mean (float) – the mean of the normal distribution
- std (float) – the standard deviation of the normal distribution
- a (float) – the minimum cutoff value
- b (float) – the maximum cutoff value
Return type:
Tensor
Examples
w = torch.empty(3, 5)
nn.init.trunc_normal_(w)
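An illustrative check that the cutoffs are respected (the std and bounds here are arbitrary):

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.trunc_normal_(w, mean=0.0, std=0.02, a=-0.04, b=0.04)
print(bool(w.min() >= -0.04), bool(w.max() <= 0.04))  # True True
```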
torch.nn.init.orthogonal_(tensor, gain=1)[source]¶
Fills the input Tensor with a (semi) orthogonal matrix, as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.
Parameters:
- tensor – an n-dimensional torch.Tensor, where n \geq 2
- gain – optional scaling factor
Examples
w = torch.empty(3, 5)
nn.init.orthogonal_(w)
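As an illustrative check, for a 3×5 tensor the rows come out orthonormal, so W Wᵀ is (approximately) the 3×3 identity:

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.orthogonal_(w)
print(torch.allclose(w @ w.t(), torch.eye(3), atol=1e-6))  # True
```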
torch.nn.init.sparse_(tensor, sparsity, std=0.01)[source]¶
Fills the 2D input Tensor as a sparse matrix, where the non-zero elements will be drawn from the normal distribution \mathcal{N}(0, 0.01), as described in Deep learning via Hessian-free optimization - Martens, J. (2010).
Parameters:
- tensor – an n-dimensional torch.Tensor
- sparsity – The fraction of elements in each column to be set to zero
- std – the standard deviation of the normal distribution used to generate the non-zero values
Examples
w = torch.empty(3, 5)
nn.init.sparse_(w, sparsity=0.1)
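An illustrative check of the per-column sparsity (the 10×4 shape is arbitrary):

```python
import torch
import torch.nn as nn

w = torch.empty(10, 4)
nn.init.sparse_(w, sparsity=0.5)     # ceil(0.5 * 10) = 5 zeros per column
print((w == 0).float().mean(dim=0))  # tensor([0.5, 0.5, 0.5, 0.5])
```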