CTCLoss — PyTorch 2.7 documentation

class torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)[source]

The Connectionist Temporal Classification loss.

Calculates the loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence: it must be ≤ the input length.
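
To illustrate the “many-to-one” alignment, here is a minimal, self-contained sketch (not part of the API) of how CTC collapses a frame-level alignment into a target sequence: repeated labels are merged and blanks are removed, so several distinct alignments map to the same target, and a target can never be longer than the input.

# Hypothetical helper, for illustration only: collapse a CTC alignment
# (one label per input frame, 0 = blank) into the target sequence.
def collapse_ctc_alignment(alignment, blank=0):
    out = []
    prev = None
    for label in alignment:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Both alignments below collapse to the same target [1, 2],
# which is why the target length must be <= the input length.
print(collapse_ctc_alignment([1, 1, 0, 2, 2]))  # [1, 2]
print(collapse_ctc_alignment([0, 1, 0, 0, 2]))  # [1, 2]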

Parameters

    blank (int, optional) – blank label. Default: 0.
    reduction (str, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken, 'sum': the output losses will be summed. Default: 'mean'.
    zero_infinity (bool, optional) – whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.

Shape:

    Log_probs: Tensor of size (T, N, C) or (T, C), where T = input length, N = batch size, and C = number of classes (including blank). The values are expected to be log-probabilities (e.g. obtained with torch.nn.functional.log_softmax()).
    Targets: Tensor of size (N, S) or (sum(target_lengths),), where S = maximum target length. The values are class indices and cannot contain the blank index; in the (N, S) form, each target is padded to length S.
    Input_lengths: Tuple or tensor of size (N,) or (), whose entries give the length of each input sequence (each must be ≤ T).
    Target_lengths: Tuple or tensor of size (N,) or (), whose entries give the length of each target sequence.
    Output: scalar if reduction is 'mean' (default) or 'sum'; if reduction is 'none', a tensor of size (N,) for batched input or () for unbatched input.

Examples:

Targets are to be padded

import torch
import torch.nn as nn

T = 50      # Input sequence length
C = 20      # Number of classes (including blank)
N = 16      # Batch size
S = 30      # Target sequence length of longest target in batch (padding length)
S_min = 10  # Minimum target length, for demonstration purposes

Initialize a random batch of input vectors of size (T, N, C)

input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

Initialize random batch of targets (0 = blank, 1:C = classes)

target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
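
For reference, the same computation is available through the functional interface; the sketch below reuses the tensors defined in the padded-targets example above and should produce the same loss value.

import torch.nn.functional as F

# Equivalent call via the functional API (same tensors as above).
loss_functional = F.ctc_loss(input, target, input_lengths, target_lengths,
                             blank=0, reduction='mean', zero_infinity=False)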

Targets are to be un-padded

T = 50  # Input sequence length
C = 20  # Number of classes (including blank)
N = 16  # Batch size

Initialize a random batch of input vectors of size (T, N, C)

input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)

Initialize random batch of targets (0 = blank, 1:C = classes)

target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(sum(target_lengths),), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
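
In practice the concatenated (un-padded) target tensor is usually built from per-sample label sequences of different lengths. A minimal sketch, using a hypothetical list of three label sequences (so the batch size would be 3 here):

# Hypothetical per-sample label sequences (class indices in 1..C-1, no blanks).
targets_list = [[1, 5, 2], [7, 7, 3, 4], [2]]

# Concatenate into a single 1-D tensor and record each sequence's length.
target = torch.cat([torch.tensor(t, dtype=torch.long) for t in targets_list])
target_lengths = torch.tensor([len(t) for t in targets_list], dtype=torch.long)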

Targets are to be un-padded and unbatched (effectively N=1)

T = 50  # Input sequence length
C = 20  # Number of classes (including blank)

Initialize a random (unbatched) input of size (T, C)

input = torch.randn(T, C).log_softmax(1).detach().requires_grad_()
input_lengths = torch.tensor(T, dtype=torch.long)

Initialize random batch of targets (0 = blank, 1:C = classes)

target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(target_lengths,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
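
As an illustration of the zero_infinity option (a sketch, reusing C from the examples above): a target that is longer than its input has no valid alignment, so its loss is infinite unless zero_infinity=True zeroes it out.

# A target of length 10 cannot be aligned to an input of length 5.
logp = torch.randn(5, 1, C).log_softmax(2)   # T=5, N=1
tgt = torch.randint(low=1, high=C, size=(1, 10), dtype=torch.long)
in_len = torch.tensor([5], dtype=torch.long)
tgt_len = torch.tensor([10], dtype=torch.long)

print(nn.CTCLoss(zero_infinity=False)(logp, tgt, in_len, tgt_len))  # inf
print(nn.CTCLoss(zero_infinity=True)(logp, tgt, in_len, tgt_len))   # 0.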

Reference:

A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. https://www.cs.toronto.edu/~graves/icml_2006.pdf

Note

In order to use CuDNN, the following must be satisfied: targets must be in concatenated format, all input_lengths must be T, blank=0, target_lengths ≤ 256, and the integer arguments must be of dtype torch.int32.

The regular implementation uses the (more common in PyTorch) torch.long dtype.
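
A sketch of inputs shaped to satisfy these CuDNN constraints, assuming a CUDA device with CuDNN is available and reusing T, N, C from the examples above; the key differences from the CPU examples are the torch.int32 dtypes, the concatenated target format, and every entry of input_lengths being equal to T.

# Assumes a CUDA device with CuDNN; otherwise the native implementation is used.
device = torch.device('cuda')
log_probs = torch.randn(T, N, C, device=device).log_softmax(2).detach().requires_grad_()
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.int32)       # all entries equal to T
target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.int32)  # each <= 256 (and <= T here)
target = torch.randint(low=1, high=C, size=(int(target_lengths.sum()),), dtype=torch.int32)  # concatenated format
loss = nn.CTCLoss(blank=0)(log_probs, target, input_lengths, target_lengths)
loss.backward()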

Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.
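
A minimal sketch of the workaround described above; the benchmark flag is commonly disabled alongside it, as discussed in the Reproducibility notes.

# Opt into deterministic CuDNN algorithm selection (may reduce performance).
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False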