HuberLoss — PyTorch 2.7 documentation
class torch.nn.HuberLoss(reduction='mean', delta=1.0)[source]
Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise. This loss combines advantages of both L1Loss and MSELoss; the delta-scaled L1 region makes the loss less sensitive to outliers than MSELoss, while the L2 region provides smoothness over L1Loss near 0. See Huber loss for more information.
For a batch of size N, the unreduced loss can be described as:

\ell(x, y) = L = \{l_1, ..., l_N\}^T
with
l_n = \begin{cases} 0.5 (x_n - y_n)^2, & \text{if } |x_n - y_n| < delta \\ delta * (|x_n - y_n| - 0.5 * delta), & \text{otherwise} \end{cases}
If reduction is not 'none', then:
\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}
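As a sanity check, the piecewise definition above can be reproduced directly with torch.where and compared against nn.HuberLoss with reduction='none'; the tensors and the delta value below are arbitrary illustrations, not part of the API:

```python
import torch
import torch.nn as nn

# Sketch: reproduce the piecewise definition by hand and compare it
# against nn.HuberLoss with reduction='none'. Values are arbitrary.
delta = 1.5
x = torch.tensor([0.2, 1.0, 3.0, -4.0])
y = torch.tensor([0.0, 0.5, 0.0, 0.0])

err = (x - y).abs()
manual = torch.where(
    err < delta,
    0.5 * (x - y) ** 2,           # squared term inside the delta band
    delta * (err - 0.5 * delta),  # delta-scaled L1 term outside it
)

loss = nn.HuberLoss(reduction='none', delta=delta)
print(torch.allclose(manual, loss(x, y)))  # True
```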
Note
When delta is set to 1, this loss is equivalent to SmoothL1Loss. In general, this loss differs from SmoothL1Loss by a factor of delta (AKA beta in Smooth L1). See SmoothL1Loss for additional discussion on the differences in behavior between the two losses.
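A minimal sketch of that relationship (the random tensors are arbitrary): with delta set to 1 the two losses coincide, and for any positive delta, HuberLoss matches delta times SmoothL1Loss with beta=delta:

```python
import torch
import torch.nn as nn

# Sketch of the note above. With delta == beta == 1 the losses coincide;
# in general HuberLoss(delta=d) equals d * SmoothL1Loss(beta=d).
x = torch.randn(8)
y = torch.randn(8)

print(torch.allclose(nn.HuberLoss(delta=1.0)(x, y),
                     nn.SmoothL1Loss(beta=1.0)(x, y)))    # True

d = 2.0
print(torch.allclose(nn.HuberLoss(delta=d)(x, y),
                     d * nn.SmoothL1Loss(beta=d)(x, y)))  # True
```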
Parameters
- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- delta (float, optional) – Specifies the threshold at which to change between delta-scaled L1 and L2 loss. The value must be positive. Default: 1.0
Shape:
- Input: (*), where * means any number of dimensions.
- Target: (*), same shape as the input.
- Output: scalar. If reduction is 'none', then (*), same shape as the input.
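A short usage sketch (shapes and values are arbitrary): the default reduction='mean' produces a scalar suitable for backward(), while reduction='none' preserves the input shape:

```python
import torch
import torch.nn as nn

# Usage sketch: 'mean' (the default) reduces to a scalar for backward();
# reduction='none' keeps the (*) input shape. Shapes here are arbitrary.
loss = nn.HuberLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

output = loss(input, target)   # scalar
output.backward()

unreduced = nn.HuberLoss(reduction='none')(input, target)
print(unreduced.shape)         # torch.Size([3, 5])
```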