SmoothL1Loss — PyTorch 2.7 documentation

class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)[source]

Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients (e.g. see the paper Fast R-CNN by Ross Girshick).
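A minimal usage sketch (the tensor shapes and beta value here are illustrative choices, not mandated by the API):

```python
import torch
import torch.nn as nn

loss = nn.SmoothL1Loss(beta=1.0)
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)  # scalar, since reduction='mean' by default
output.backward()
```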

For a batch of size N, the unreduced loss can be described as:

\ell(x, y) = L = \{l_1, \dots, l_N\}^T

with

l_n = \begin{cases} 0.5 (x_n - y_n)^2 / beta, & \text{if } |x_n - y_n| < beta \\ |x_n - y_n| - 0.5 * beta, & \text{otherwise} \end{cases}

If reduction is not 'none', then:

\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}
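The piecewise definition above can be checked directly against the module with reduction='none' (the sample values and beta here are arbitrary):

```python
import torch
import torch.nn as nn

beta = 1.0
x = torch.tensor([0.2, 1.5, -3.0])
y = torch.zeros(3)

# Unreduced loss from the module.
loss = nn.SmoothL1Loss(reduction='none', beta=beta)(x, y)

# Manual evaluation of the piecewise definition.
diff = (x - y).abs()
manual = torch.where(diff < beta,
                     0.5 * diff ** 2 / beta,
                     diff - 0.5 * beta)
assert torch.allclose(loss, manual)
```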

Note

Smooth L1 loss can be seen as exactly L1Loss, but with the |x - y| < beta portion replaced with a quadratic function such that its slope is 1 at |x - y| = beta. The quadratic segment smooths the L1 loss near |x - y| = 0.

Note

Smooth L1 loss is closely related to HuberLoss, being equivalent to huber(x, y) / beta (note that Smooth L1's beta hyper-parameter is also known as delta for Huber). This leads to the following differences:

- As beta -> 0, Smooth L1 loss converges to L1Loss, while HuberLoss converges to a constant 0 loss. When beta is 0, Smooth L1 loss is equivalent to L1 loss.
- As beta -> +∞, Smooth L1 loss converges to a constant 0 loss, while HuberLoss converges to MSELoss.
- For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For HuberLoss, the slope of the L1 segment is beta.
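A quick sketch of that equivalence, checking SmoothL1Loss against HuberLoss divided by beta (the inputs and beta value are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(4)
y = torch.randn(4)
beta = 0.7

smooth_l1 = nn.SmoothL1Loss(beta=beta)(x, y)
huber = nn.HuberLoss(delta=beta)(x, y)

# Smooth L1 equals Huber loss divided by beta (Huber's delta).
assert torch.allclose(smooth_l1, huber / beta)
```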

Parameters

- size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- beta (float, optional) – Specifies the threshold at which to change between L1 and L2 loss. The value must be non-negative. Default: 1.0

Shape:

- Input: (*), where * means any number of dimensions.
- Target: (*), same shape as the input.
- Output: scalar. If reduction is 'none', then (*), same shape as the input.
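A short illustration of the shape behavior (the dimensions here are arbitrary):

```python
import torch
import torch.nn as nn

input = torch.randn(2, 3, 4)
target = torch.randn(2, 3, 4)

# Default reduction ('mean') produces a scalar.
print(nn.SmoothL1Loss()(input, target).shape)  # torch.Size([])

# reduction='none' preserves the input shape.
print(nn.SmoothL1Loss(reduction='none')(input, target).shape)  # torch.Size([2, 3, 4])
```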