ctc - Connectionist temporal classification (CTC) loss for unaligned sequence classification - MATLAB ([original](http://www.mathworks.com/access/helpdesk/help/deeplearning/ref/dlarray.ctc.html))
Since R2021a
Syntax
Description
The CTC operation computes the connectionist temporal classification (CTC) loss between unaligned sequences.
The ctc function computes the CTC loss between predictions and targets represented as dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the "S", "T", "C", and "B" labels, respectively. For unspecified and other dimensions, use the "U" label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the DataFormat option.
loss = ctc(Y,targets,YMask,targetsMask)
returns the CTC loss between the formatted dlarray object Y containing the predictions and the target values targets using the prediction and target masks YMask and targetsMask, respectively. The function reduces the loss values by taking the mean across the batch dimension.
For unformatted input data, use the 'DataFormat' option.
loss = ctc(Y,targets,YMask,targetsMask,'DataFormat',FMT)
also specifies the dimension format FMT when Y is not a formatted dlarray.
loss = ctc(___,Name,Value)
specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, 'BlankIndex','last' specifies a blank index corresponding to the last element of the vocabulary.
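The quantity that ctc returns can be illustrated with the standard CTC forward (alpha) recursion. The following base-MATLAB sketch is an illustration under stated assumptions, not the toolbox implementation: it uses ordinary numeric arrays instead of dlarray objects, treats class 1 as the blank, sidesteps masks by working with unpadded sequences, and uses hypothetical variable names such as Ps and ts. It computes the negative log-likelihood of each target sequence and then takes the mean across observations, mirroring the batch reduction described above.

```matlab
% Base-MATLAB sketch of the CTC loss (not the toolbox implementation).
% Assumptions: each Ps{b} is numClasses-by-T with softmax-probability
% columns, class 1 is the blank, and each ts{b} is a non-empty row vector
% of unpadded target indices that never equals the blank.
blank = 1;
Ps = {[0.5 0.4 0.6; 0.5 0.6 0.4], [0.5 0.4 0.6; 0.5 0.6 0.4]};
ts = {2, [2 2]};   % the second target repeats a label

numObs = numel(Ps);
losses = zeros(1, numObs);
for b = 1:numObs
    P = Ps{b};
    t = ts{b};
    T = size(P, 2);

    % Extended label sequence with blanks around every label: b t1 b t2 ... b
    ext = blank * ones(1, 2*numel(t) + 1);
    ext(2:2:end) = t;
    S = numel(ext);

    % alpha(s,k): total probability of all paths over frames 1..k that end
    % at ext(s). Probability space is used for readability; a practical
    % implementation would use log probabilities to avoid underflow.
    alpha = zeros(S, T);
    alpha(1, 1) = P(ext(1), 1);
    alpha(2, 1) = P(ext(2), 1);
    for k = 2:T
        for s = 1:S
            a = alpha(s, k-1);                  % stay on the same symbol
            if s > 1
                a = a + alpha(s-1, k-1);        % advance one position
            end
            if s > 2 && ext(s) ~= blank && ext(s) ~= ext(s-2)
                a = a + alpha(s-2, k-1);        % skip a blank between distinct labels
            end
            alpha(s, k) = a * P(ext(s), k);
        end
    end

    % Valid paths finish on the final label or on the trailing blank.
    losses(b) = -log(alpha(S, T) + alpha(S-1, T));
end

% Reduce by taking the mean across the batch.
loss = mean(losses);
```

For the first observation, the paths that collapse to the single label 2 have total probability 0.8, so its loss is -log(0.8). The second target repeats label 2, so with three frames the only valid path is 2, blank, 2 (probability 0.08); without a frame for that blank, the probability would be zero.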
Examples
CTC Loss for Unaligned Sequences
Create a cell array of two target sequences of different lengths over 10 classes. The target sequences must not contain the blank index, which is 1 by default.
numObservations = 2; numClasses = 10;
targets = cell(numObservations,1); targets{1} = [2 3 5 7 9 2 3 5 3 2 3]; targets{2} = [2 3 3 3 4 4 4 6 8 8 8 10 3];
Pad the targets using the padsequences
function. The targets must be positive integers between 1 and the number of classes, and must not contain the blank index, so specify a padding value of 2.
[targets,targetsMask] = padsequences(targets,2,'PaddingValue',2);
Create random arrays of prediction sequences. The length of each prediction sequence must be greater than or equal to the length of the corresponding target sequence plus its number of consecutively repeated indices, because CTC must emit a blank between consecutive repeated labels. In this case, the first target sequence has length 11 with no repeated indices, and the second has length 13 with 6 repeated indices.
Y = cell(numObservations,1);
Y{1} = rand(numClasses,11); Y{2} = rand(numClasses,13 + 6);
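The length requirement can be checked with base MATLAB before you create the predictions. In this sketch, which uses an illustrative variable name, diff finds consecutive repeats, and each repeat adds one frame for the separating blank.

```matlab
% Minimum prediction length per target sequence: the sequence length plus
% one extra frame for every consecutive repeat (for the separating blank).
targetSeqs = {[2 3 5 7 9 2 3 5 3 2 3], [2 3 3 3 4 4 4 6 8 8 8 10 3]};
minLength = cellfun(@(t) numel(t) + nnz(diff(t) == 0), targetSeqs);
% minLength is [11 19], matching the lengths used for Y{1} and Y{2}.
```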
Pad the prediction sequences in the second dimension using the padsequences
function and also return the corresponding mask.
[Y,YMask] = padsequences(Y,2);
Convert the padded prediction sequences and mask to dlarray objects with format "CTB" (channel, time, batch). Because formatted dlarray objects automatically sort their dimensions, converting the mask to a formatted dlarray object with the same format keeps the dimensions of the predictions and the mask consistent.
Y = dlarray(Y,"CTB"); YMask = dlarray(YMask,"CTB");
The ctc function expects output from a softmax operation or layer. Apply the softmax operation to the predictions.
Y = softmax(Y);
Because the ctc function requires the targets and target mask as 2-D arrays, remove the singleton channel dimension using the squeeze function.
targets = squeeze(targets); targetsMask = squeeze(targetsMask);
Similarly, convert the padded target sequences and mask to dlarray
with format "TB"
(time, batch).
targets = dlarray(targets,"TB"); targetsMask = dlarray(targetsMask,"TB");
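The padding, masking, and squeeze steps can be mimicked in base MATLAB to see the array sizes involved. This sketch rests on an assumption: that the padded targets have size 1-by-T-by-B before squeezing, which is what the squeeze step implies. It does not call padsequences, and the variable names are illustrative.

```matlab
% Illustrative padding of row-vector targets into a 1-by-T-by-B array.
% Assumption: this mirrors the layout that the example obtains from
% padsequences, which is why the leading singleton dimension is squeezed.
targetSeqs = {[2 3 5 7 9 2 3 5 3 2 3], [2 3 3 3 4 4 4 6 8 8 8 10 3]};
paddingValue = 2;
numObs = numel(targetSeqs);
lengths = cellfun(@numel, targetSeqs);
T = max(lengths);

padded = paddingValue * ones(1, T, numObs);
mask = zeros(1, T, numObs);
for b = 1:numObs
    padded(1, 1:lengths(b), b) = targetSeqs{b};
    mask(1, 1:lengths(b), b) = 1;      % 1 marks real (unpadded) steps
end

sizeBefore = size(padded);   % [1 13 2]
padded = squeeze(padded);    % T-by-B, ready for the "TB" format
mask = squeeze(mask);
sizeAfter = size(padded);    % [13 2]
```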
Compute the CTC loss between the predictions and the targets using the ctc function. Because the predictions are random, the value you obtain can differ from the one shown.
loss = ctc(Y,targets,YMask,targetsMask)
loss = 1x1 dlarray
35.5857
Input Arguments
Y
— Predictions
dlarray
| numeric array
Predictions, specified as a formatted dlarray, an unformatted dlarray, or a numeric array. When Y is not a formatted dlarray, you must specify the dimension format using the 'DataFormat' option.
The function computes the CTC loss assuming that Y
is the output of a softmax operation or layer.
The predictions Y must have a 'B' (batch), 'C' (channel), and 'T' (time) dimension. Their sequence lengths can differ from those of the corresponding targets in targets.
If Y is a numeric array, then at least one of targets, YMask, and targetsMask must be a dlarray.
targets
— Target sequences
dlarray
| numeric array
Target sequences, specified as a formatted or unformatted dlarray
or a numeric array.
Specify the targets as an array with dimensions corresponding to the observations and the time steps of the target sequences. For example, specify the targets as a formatted dlarray
object with format 'BT'
(batch, time).
The targets must have the same number of observations as the predictions. The target values corresponding to mask values equal to 1 must be positive integers between 1 and the number of channels of Y and must not include the blank index.
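A base-MATLAB sketch of the rule for target values follows. The array values are illustrative, and the check inspects only entries whose mask value is 1, so a padded entry may hold any value. The ctc function performs its own validation, which may differ in detail.

```matlab
% Hypothetical check of the target rules: where the mask is 1, values must
% be integers between 1 and numChannels and must not equal the blank index.
numChannels = 10;
blankIndex = 1;
targets = [2 2; 3 3; 5 3; 4 1];       % T-by-B, illustrative values
targetsMask = [1 1; 1 1; 1 1; 1 0];   % last step of observation 2 is padding

active = targets(targetsMask == 1);   % only the entries that count
isValid = all(active == round(active)) && ...
          all(active >= 1 & active <= numChannels) && ...
          ~any(active == blankIndex);
% isValid is true: the blank value in targets(4,2) is masked out.
```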
If targets is a formatted dlarray, then its format must be the same as the format of Y, or the same as DataFormat if Y is unformatted.
If targets is an unformatted dlarray or a numeric array, then the function applies the format of Y or the value of DataFormat to targets.
Tip
Formatted dlarray objects automatically permute the dimensions of the underlying data to have the order "S" (spatial), "C" (channel), "B" (batch), "T" (time), then "U" (unspecified). To ensure that the dimensions of Y and targets are consistent, when Y is a formatted dlarray, also specify targets as a formatted dlarray.
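The automatic reordering described in the tip can be sketched in base MATLAB by sorting the labels by the fixed order S, C, B, T, U. This is an illustration under the assumption that repeated labels keep their relative order (MATLAB's sort is stable); it does not call dlarray, and the names are illustrative.

```matlab
% Sort format labels by the order S, C, B, T, U and apply the same
% permutation to an array, as a formatted dlarray does automatically.
fmt = 'TSCSBS';
labelOrder = 'SCBTU';
labelRank = arrayfun(@(c) find(labelOrder == c), fmt);
[~, perm] = sort(labelRank);   % stable: repeated "S" labels keep their order
sortedFmt = fmt(perm);         % 'SSSCBT'

X = rand(2, 3, 4, 5, 6, 7);    % one size per label in fmt
Xsorted = permute(X, perm);
sizeSorted = size(Xsorted);    % [3 5 7 4 6 2]
```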
YMask
— Mask indicating which prediction elements to include for loss computation
dlarray
| logical array | numeric array
Mask indicating which prediction elements to include for loss computation, specified as a dlarray object, a logical array, or a numeric array with the same size as Y.
The function includes an element of the predictions in the loss computation when the corresponding mask value is 1 and excludes it when the mask value is 0.
For each time step and observation, the mask values along the channel dimension must be all ones or all zeros.
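A prediction mask that satisfies this rule can be built by marking valid time steps per observation and replicating that pattern across the channels. The base-MATLAB sketch below uses illustrative sizes (10 channels, sequence lengths 11 and 19) and then checks the all-ones-or-all-zeros condition.

```matlab
% Build a C-by-T-by-B prediction mask that is constant along channels.
numChannels = 10;
seqLengths = [11 19];   % illustrative lengths, one per observation
T = max(seqLengths);

timeMask = (1:T)' <= seqLengths;                                  % T-by-B logical
YMask = repmat(reshape(timeMask, 1, T, []), numChannels, 1, 1);   % C-by-T-by-B

% Per time step and observation, channel entries are all ones or all zeros.
channelSum = sum(YMask, 1);
isConsistent = all(channelSum(:) == 0 | channelSum(:) == numChannels);
```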
Tip
Formatted dlarray objects automatically permute the dimensions of the underlying data to have this order: "S" (spatial), "C" (channel), "B" (batch), "T" (time), and "U" (unspecified). For example, dlarray objects automatically permute the dimensions of data with format "TSCSBS" to have format "SSSCBT".
To ensure that the dimensions of Y and the mask are consistent, when Y is a formatted dlarray, also specify the mask as a formatted dlarray.
targetsMask
— Mask indicating which target elements to include for loss computation
dlarray
| logical array | numeric array
Mask indicating which target elements to include for loss computation, specified as a dlarray object, a logical array, or a numeric array with the same size as targets.
The function includes an element of the targets in the loss computation when the corresponding mask value is 1 and excludes it when the mask value is 0.
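Conceptually, the mask selects the effective target sequence of each observation from the padded array. The following base-MATLAB sketch, with illustrative values in a T-by-B layout, recovers those sequences explicitly.

```matlab
% Recover each observation's effective target sequence from a padded
% T-by-B target array and its mask (illustrative values).
targets = [2 2; 3 3; 5 4; 2 4];
targetsMask = [1 1; 1 1; 1 1; 0 1];

numObs = size(targets, 2);
kept = cell(1, numObs);
for b = 1:numObs
    kept{b} = targets(targetsMask(:, b) == 1, b).';   % keep masked-in steps
end
% kept{1} is [2 3 5]; kept{2} is [2 3 4 4]
```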
Tip
Formatted dlarray objects automatically permute the dimensions of the underlying data to have this order: "S" (spatial), "C" (channel), "B" (batch), "T" (time), and "U" (unspecified). For example, dlarray objects automatically permute the dimensions of data with format "TSCSBS" to have format "SSSCBT".
To ensure that the dimensions of targets and the mask are consistent, when targets is a formatted dlarray, also specify the mask as a formatted dlarray.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'BlankIndex','last' specifies a blank index corresponding to the last element of the vocabulary.
BlankIndex
— Index of blank character
1 (default) | positive integer | 'last'
Index of blank character, specified as one of the following:
- Positive integer – Use the element in the vocabulary with the specified index as the blank character. If 'BlankIndex' is an integer, then it must be between 1 and the number of channels of Y, inclusive.
- 'last' – Use the last element of the vocabulary as the blank character.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | char | string
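The two accepted forms of BlankIndex can be resolved to a channel index as in the following base-MATLAB sketch. The variable names and error messages are hypothetical; the sketch only encodes the rules listed above.

```matlab
% Hypothetical resolution of a BlankIndex value to a channel index.
numChannels = 10;
blankOption = 'last';   % or a positive integer such as 1

if ischar(blankOption) || isstring(blankOption)
    assert(strcmp(blankOption, 'last'), 'Unsupported BlankIndex value.');
    blank = numChannels;                  % last element of the vocabulary
else
    assert(blankOption == round(blankOption) && ...
           blankOption >= 1 && blankOption <= numChannels, ...
           'BlankIndex must be an integer between 1 and the number of channels.');
    blank = double(blankOption);
end
% blank is 10 for 'last' with 10 channels.
```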
DataFormat
— Description of data dimensions
character vector | string scalar
Description of the data dimensions, specified as a character vector or string scalar.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S" – Spatial
- "C" – Channel
- "B" – Batch
- "T" – Time
- "U" – Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT"
(channel, batch, time).
You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.
If the input data is not a formatted dlarray
object, then you must specify the DataFormat
option.
For more information, see Deep Learning Data Formats.
Data Types: char | string
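The labeling rules for DataFormat can be expressed as a short check. This base-MATLAB sketch is hypothetical: it tests only that every character is a known label and that "C", "B", and "T" each appear at most once, and it does not model the handling of trailing singleton "U" dimensions.

```matlab
% Hypothetical validator for the data format labeling rules.
isValidFormat = @(fmt) all(ismember(fmt, 'SCBTU')) && ...
    all(arrayfun(@(c) sum(fmt == c) <= 1, 'CBT'));

okCBT = isValidFormat('CBT');    % true
okTwoT = isValidFormat('CTBT');  % false: "T" is used twice
okX = isValidFormat('CXB');      % false: "X" is not a label
```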
Output Arguments
loss
— CTC loss
dlarray
CTC loss, returned as an unformatted dlarray
scalar with the same underlying data type as the input Y
.
Algorithms
Deep Learning Array Formats
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.
To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S" – Spatial
- "C" – Channel
- "B" – Batch
- "T" – Time
- "U" – Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT"
(channel, batch, time).
To create formatted input data, create a dlarray object and specify the format using the second argument.
To provide additional layout information with unformatted data, specify the format using the DataFormat argument.
For more information, see Deep Learning Data Formats.
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
The ctc function supports GPU array input with these usage notes and limitations:
- When at least one of the following input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
  - Y
  - targets
  - YMask
  - targetsMask
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2021a