BCEWithLogitsLoss — PyTorch 2.7 documentation

class torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)[source]

This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
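As a concrete illustration (not part of the original documentation; the values are chosen to make the effect visible), the fused loss stays accurate for large-magnitude logits where the two-step Sigmoid + BCELoss version saturates:

import torch
import torch.nn as nn

logits = torch.tensor([30.0, -30.0])
target = torch.tensor([0.0, 1.0])

fused = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
two_step = nn.BCELoss(reduction='none')(torch.sigmoid(logits), target)

print(fused)     # tensor([30., 30.]) -- computed stably via the log-sum-exp (softplus) form
print(two_step)  # first element is wrong: sigmoid(30.) rounds to 1.0 in float32, so
                 # log(1 - 1) would be -inf and BCELoss clamps it to -100, giving 100.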

The unreduced (i.e. with reduction set to 'none') loss can be described as:

\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right],

where N is the batch size. If reduction is not 'none' (default 'mean'), then

\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}
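For instance (an illustrative sketch, not from the original page), the three reduction modes relate to the per-element losses exactly as the formulas above state:

import torch
import torch.nn as nn

logits = torch.tensor([0.5, -1.0, 2.0])
target = torch.tensor([1.0, 0.0, 1.0])

per_element = nn.BCEWithLogitsLoss(reduction='none')(logits, target)  # L = {l_1, ..., l_N}
mean_loss = nn.BCEWithLogitsLoss(reduction='mean')(logits, target)    # mean(L), the default
sum_loss = nn.BCEWithLogitsLoss(reduction='sum')(logits, target)      # sum(L)

assert torch.isclose(mean_loss, per_element.mean())
assert torch.isclose(sum_loss, per_element.sum())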

This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets t[i] should be numbers between 0 and 1.

It’s possible to trade off recall and precision by adding weights to positive examples. In the case of multi-label classification the loss can be described as:

\ell_c(x, y) = L_c = \{l_{1,c},\dots,l_{N,c}\}^\top, \quad l_{n,c} = - w_{n,c} \left[ p_c y_{n,c} \cdot \log \sigma(x_{n,c}) + (1 - y_{n,c}) \cdot \log (1 - \sigma(x_{n,c})) \right],

where c is the class number (c > 1 for multi-label binary classification, c = 1 for single-label binary classification), n is the number of the sample in the batch, and p_c is the weight of the positive answer for the class c.

p_c > 1 increases the recall, p_c < 1 increases the precision.
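A minimal check of the formula above (the values are illustrative, not from the original documentation): the built-in loss matches a hand-written version in which p_c multiplies only the positive term:

import torch
import torch.nn as nn

x = torch.tensor([[0.8, -1.2]])    # one sample, two classes
y = torch.tensor([[1.0, 0.0]])
p_c = torch.tensor([2.0, 0.5])     # per-class positive weights

builtin = nn.BCEWithLogitsLoss(pos_weight=p_c, reduction='none')(x, y)
manual = -(p_c * y * torch.log(torch.sigmoid(x))
           + (1 - y) * torch.log(1 - torch.sigmoid(x)))

assert torch.allclose(builtin, manual)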

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to \frac{300}{100} = 3. The loss would act as if the dataset contains 3 \times 100 = 300 positive examples.
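A sketch of that calculation in code (the variable names and random data are illustrative only):

import torch
import torch.nn as nn

num_pos, num_neg = 100, 300
pos_weight = torch.tensor([num_neg / num_pos])   # 300 / 100 = 3.0

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(num_pos + num_neg, 1)
targets = torch.cat([torch.ones(num_pos, 1), torch.zeros(num_neg, 1)])

# Each positive example's loss term is scaled by 3, so the 100 positives
# contribute as if there were 3 * 100 = 300 of them.
loss = criterion(logits, targets)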

Examples:

>>> target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
>>> output = torch.full([10, 64], 1.5)  # A prediction (logit)
>>> pos_weight = torch.ones([64])  # All weights are equal to 1
>>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
>>> criterion(output, target)  # -log(sigmoid(1.5))
tensor(0.20...)

In the above example, the pos_weight tensor’s elements correspond to the 64 distinct classes in a multi-label binary classification scenario. Each element in pos_weight is designed to adjust the loss function based on the imbalance between negative and positive samples for the respective class. This approach is useful in datasets with varying levels of class imbalance, ensuring that the loss calculation accurately accounts for the distribution in each class.
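One common way to build such a per-class pos_weight is to derive it from the training targets, as in this sketch (the data here is random and purely illustrative):

import torch
import torch.nn as nn

train_targets = (torch.rand(1000, 64) > 0.8).float()   # 1000 samples, 64 classes
pos_counts = train_targets.sum(dim=0).clamp(min=1)      # positives per class (avoid division by zero)
neg_counts = train_targets.shape[0] - train_targets.sum(dim=0)
pos_weight = neg_counts / pos_counts                     # shape [64], one weight per class

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(10, 64)                             # batch of predictions (logits)
batch_targets = (torch.rand(10, 64) > 0.8).float()
loss = criterion(logits, batch_targets)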

Parameters

- weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch.
- size_average (bool, optional) – Deprecated (see reduction).
- reduce (bool, optional) – Deprecated (see reduction).
- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; specifying either of those two args will override reduction. Default: 'mean'
- pos_weight (Tensor, optional) – a weight of positive examples to be broadcasted with target. Must be a tensor with equal size along the class dimension to the number of classes. Default: None

Shape:

- Input: (*), where * means any number of dimensions.
- Target: (*), same shape as the input.
- Output: scalar. If reduction is 'none', then (*), same shape as input.

Examples:

>>> loss = nn.BCEWithLogitsLoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(input, target)
>>> output.backward()