BCELoss — PyTorch 2.7 documentation

class torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')[source]

Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities:

The unreduced (i.e. with reduction set to 'none') loss can be described as:

\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right],

where N is the batch size. If reduction is not 'none' (default 'mean'), then

\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}

This is used for measuring the error of a reconstruction, for example in an auto-encoder. Note that the targets y should be numbers between 0 and 1.
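As a quick sanity check, the unreduced and reduced losses can be reproduced directly from the formula above. The snippet below is a minimal sketch that assumes the default case with no weight argument (i.e. w_n = 1) and hand-picked probabilities:

import torch
import torch.nn as nn

x = torch.tensor([0.2, 0.7, 0.9])   # predicted probabilities in (0, 1)
y = torch.tensor([0.0, 1.0, 1.0])   # targets in [0, 1]

# Per-element losses l_n from the formula (w_n = 1 here).
manual = -(y * torch.log(x) + (1 - y) * torch.log(1 - x))

print(manual)                                  # tensor([0.2231, 0.3567, 0.1054])
print(nn.BCELoss(reduction='none')(x, y))      # same values as `manual`
print(nn.BCELoss()(x, y))                      # default 'mean' reduction: manual.mean()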

Notice that if x_n is either 0 or 1, one of the log terms would be mathematically undefined in the above loss equation. PyTorch chooses to set \log(0) = -\infty, since \lim_{x\to 0} \log(x) = -\infty. However, an infinite term in the loss equation is not desirable for several reasons.

For one, if either y_n = 0 or (1 - y_n) = 0, then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since \lim_{x\to 0} \frac{d}{dx} \log(x) = \infty. This would make BCELoss's backward method nonlinear with respect to x_n, and using it for things like linear regression would not be straight-forward.

Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. This way, we can always have a finite loss value and a linear backward method.
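The effect of this clamp can be seen directly: a prediction of exactly 0 for a target of 1 would give -\log(0) = \infty by the formula, but because the log term is capped at -100 the loss comes out as 100. A minimal sketch:

import torch
import torch.nn as nn

loss = nn.BCELoss()
pred = torch.tensor([0.0])     # a probability of exactly 0
target = torch.tensor([1.0])

# -log(0) would be inf, but the log term is clamped at -100,
# so the resulting loss is a finite 100 instead.
print(loss(pred, target))      # tensor(100.)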

Parameters

weight (Tensor, optional) – a manual rescaling weight w_n given to the loss of each batch element. If given, has to be a Tensor of size nbatch.

size_average (bool, optional) – Deprecated (see reduction).

reduce (bool, optional) – Deprecated (see reduction).

reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output will be averaged, 'sum': the output will be summed. Default: 'mean'

Shape:

Input: (*), where * means any number of dimensions.

Target: (*), same shape as the input.

Output: scalar. If reduction is 'none', then (*), same shape as the input.

Examples:

import torch
import torch.nn as nn

m = nn.Sigmoid()                                  # squash raw scores into (0, 1) probabilities
loss = nn.BCELoss()
input = torch.randn(3, 2, requires_grad=True)     # raw scores
target = torch.rand(3, 2, requires_grad=False)    # targets in [0, 1]
output = loss(m(input), target)
output.backward()