torch.autograd.backward — PyTorch 2.7 documentation

torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None, inputs=None)[source]

Compute the sum of gradients of given tensors with respect to graph leaves.

The graph is differentiated using the chain rule. If any of tensors are non-scalar (i.e. their data has more than one element) and require gradient, then the Jacobian-vector product will be computed; in this case the function additionally requires specifying grad_tensors. It should be a sequence of matching length that contains the “vector” in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. the corresponding tensors (None is an acceptable value for all tensors that don’t need gradient tensors).
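For illustration, a minimal sketch of calling backward on a non-scalar output and supplying the “vector” via grad_tensors (the tensor names x, y, v are arbitrary):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # non-scalar output, so grad_tensors is required

# The "vector" in the Jacobian-vector product, same shape as y.
v = torch.tensor([0.1, 1.0, 10.0])
torch.autograd.backward([y], grad_tensors=[v])

print(x.grad)  # v * dy/dx == v * 2
```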

This function accumulates gradients in the leaves - you might need to zero the .grad attributes or set them to None before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.
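A short sketch of this accumulation behavior and the two ways to reset it (variable names are illustrative):

```python
import torch

w = torch.ones(2, requires_grad=True)

loss = (w * 3).sum()
torch.autograd.backward(loss)   # w.grad is now tensor([3., 3.])

loss = (w * 3).sum()
torch.autograd.backward(loss)   # gradients accumulate: tensor([6., 6.])

# Reset before the next backward pass, either by zeroing in place...
w.grad.zero_()
# ...or by setting the attribute to None (also frees the memory).
w.grad = None
```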

Note

Using this method with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak.
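As an illustration of the recommended pattern (a minimal sketch; tensor names are arbitrary), autograd.grad returns the gradient instead of storing it in .grad, so no reference cycle through the parameter is created:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 3).sum()

# Preferred: autograd.grad with create_graph=True returns a differentiable
# gradient without touching x.grad, avoiding the reference cycle.
(g,) = torch.autograd.grad(y, x, create_graph=True)

# The returned gradient can itself be differentiated, e.g. a second pass:
g.sum().backward()
print(x.grad)  # d/dx of sum(3 * x**2) == 6 * x

# If torch.autograd.backward(..., create_graph=True) must be used instead,
# break the cycle afterwards with: x.grad = None
```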

Parameters