crossentropy
Cross-entropy loss for classification tasks
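Syntax
loss = crossentropy(Y,targets)
loss = crossentropy(Y,targets,weights)
loss = crossentropy(___,Name=Value)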
Description
The cross-entropy operation computes the cross-entropy loss between network predictions and binary or one-hot encoded targets for single-label and multi-label classification tasks.
The crossentropy function computes the cross-entropy loss between predictions and targets represented as dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the "S", "T", "C", and "B" labels, respectively. For unspecified and other dimensions, use the "U" label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the DataFormat option.
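As a small illustration of dimension labels, the following sketch formats two arrays as dlarray objects. It assumes Deep Learning Toolbox is installed; the variable names X, Y, and Z are illustrative and do not come from this page.

```matlab
% Label a 10-by-12 numeric array as channel-by-batch ("CB").
X = rand(10,12);
Y = dlarray(X,"CB");
dims(Y)    % 'CB'
size(Y)    % [10 12]

% The labels travel with the data: batch-by-channel input is stored as "CB".
Z = dlarray(rand(12,10),"BC");
dims(Z)    % 'CB'
size(Z)    % [10 12]
```

Because a formatted dlarray reorders its dimensions, check dims and size after formatting rather than assuming the original layout.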
Note
To train with cross-entropy loss using the trainnet function, set the loss function to "crossentropy".
loss = crossentropy(Y,targets) returns the categorical cross-entropy loss between the formatted dlarray object Y containing the predictions and the target values targets for single-label classification tasks. The output loss is an unformatted dlarray scalar.
For unformatted input data, use the DataFormat argument.
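The sketch below compares the formatted and unformatted calling patterns. It assumes Deep Learning Toolbox; the names Yformatted, Yunformatted, and the two loss variables are illustrative.

```matlab
numClasses = 10;
numObservations = 12;

% Formatted predictions: the "CB" labels are stored in the dlarray.
Yformatted = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);
lossFormatted = crossentropy(Yformatted,targets);

% Unformatted predictions: remove the labels and pass them through DataFormat.
Yunformatted = stripdims(Yformatted);
lossUnformatted = crossentropy(Yunformatted,targets,DataFormat="CB");
```

Both calls describe the same data layout, so the two loss values are expected to agree.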
loss = crossentropy(Y,targets,weights) applies weights to the calculated loss values. Use this syntax to weight the contributions of classes, observations, regions, or individual elements of the input to the calculated loss values.
loss = crossentropy(___,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, ClassificationMode="multilabel" computes the cross-entropy loss for a multi-label classification task.
Examples
Cross-Entropy Loss for Single-Label Classification
Create an array of prediction scores for 12 observations over 10 classes.
numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y);
View the size and format of the prediction scores.
size(Y)
dims(Y)
Create an array of targets encoded as one-hot vectors.
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);
View the size of the targets.
size(targets)
Compute the cross-entropy loss between the predictions and the targets.
loss = crossentropy(Y,targets)
loss = 1x1 dlarray
2.3343
Cross-Entropy Loss for Multi-Label Classification
Create an array of prediction scores for 12 observations over 10 classes.
numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
View the size and format of the prediction scores.
size(Y)
dims(Y)
Create a random array of targets encoded as a numeric array of zeros and ones. Each observation can have multiple classes.
targets = rand(numClasses,numObservations) > 0.75;
targets = single(targets);
View the size of the targets.
size(targets)
Compute the cross-entropy loss between the predictions and the targets. To specify cross-entropy loss for multi-label classification, set the ClassificationMode argument to "multilabel".
loss = crossentropy(Y,targets,ClassificationMode="multilabel")
loss = 1x1 single dlarray
9.8853
Weighted Cross-Entropy Loss
Create an array of prediction scores for 12 observations over 10 classes.
numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y);
View the size and format of the prediction scores.
size(Y)
dims(Y)
Create an array of targets encoded as one-hot vectors.
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);
View the size of the targets.
size(targets)
Compute the weighted cross-entropy loss between the predictions and the targets using a vector of class weights. Specify a weights format of "UC" (unspecified, channel) using the WeightsFormat argument.
weights = rand(1,numClasses);
loss = crossentropy(Y,targets,weights,WeightsFormat="UC")
loss = 1x1 dlarray
1.1261
Input Arguments
Y - Predictions
dlarray object | numeric array
Predictions, specified as a formatted or unformatted dlarray object, or a numeric array. When Y is not a formatted dlarray, you must specify the dimension format using the DataFormat argument.
If Y is a numeric array, targets must be a dlarray object.
targets - Target classification labels
dlarray | numeric array
Target classification labels, specified as a formatted or unformatted dlarray or a numeric array.
Specify the targets as an array containing one-hot encoded labels with the same size and format as Y. For example, if Y is a numObservations-by-numClasses array, then targets(n,i) = 1 if observation n belongs to class i, and targets(n,i) = 0 otherwise.
If targets is a formatted dlarray, then its format must be the same as the format of Y, or the same as DataFormat if Y is unformatted.
If targets is an unformatted dlarray or a numeric array, then the function applies the format of Y or the value of DataFormat to targets.
Tip
Formatted dlarray objects automatically permute the dimensions of the underlying data to have the order "S" (spatial), "C" (channel), "B" (batch), "T" (time), then "U" (unspecified). To ensure that the dimensions of Y and targets are consistent, when Y is a formatted dlarray, also specify targets as a formatted dlarray.
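The following sketch shows why formatting the targets helps when they are stored in a different layout than Y. It assumes Deep Learning Toolbox; targetsBC, lossFormatted, and lossTransposed are illustrative names.

```matlab
numClasses = 4;
numObservations = 6;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));

% Targets stored as observations-by-classes (6-by-4), unlike Y (4-by-6).
labels = randi(numClasses,[numObservations 1]);
targetsBC = onehotencode(labels,2,ClassNames=1:numClasses);

% Formatting as "BC" lets dlarray permute the targets to match Y.
targets = dlarray(targetsBC,"BC");
dims(targets)    % 'CB'
lossFormatted = crossentropy(Y,targets);

% Equivalent manual approach: transpose the numeric targets yourself.
lossTransposed = crossentropy(Y,targetsBC.');
```

Passing targetsBC directly as a numeric array would not work here, because the function would apply the "CB" format of Y to a 6-by-4 array.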
weights - Weights
dlarray object | numeric array
Weights, specified as a dlarray object or a numeric array.
To specify class weights, specify a vector with a "C" (channel) dimension with size matching the "C" (channel) dimension of Y and a singleton "U" (unspecified) dimension. Specify the dimensions of the class weights by using a formatted dlarray object or by using the WeightsFormat argument.
To specify observation weights, specify a vector with a "B" (batch) dimension with size matching the "B" (batch) dimension of Y. Specify the "B" (batch) dimension of the observation weights by using a formatted dlarray object or by using the WeightsFormat argument.
To specify weights for each element of the input independently, specify the weights as an array of the same size as Y. In this case, if weights is not a formatted dlarray object, then the function uses the same format as Y. Alternatively, specify the weights format using the WeightsFormat argument.
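As a sketch of the observation-weight and element-weight options described above, the code below weights each observation in two ways. It assumes Deep Learning Toolbox and assumes that the "UB" format string is accepted for a 1-by-numObservations vector in the same way that "UC" is accepted for class weights; obsWeights and elementWeights are illustrative names.

```matlab
numClasses = 4;
numObservations = 6;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% One weight per observation, labeled "UB" (unspecified, batch).
obsWeights = rand(1,numObservations);
lossObs = crossentropy(Y,targets,obsWeights,WeightsFormat="UB");

% The same weighting expressed as one weight per element of Y.
% A numeric array of the same size as Y takes the format of Y.
elementWeights = repmat(obsWeights,numClasses,1);
lossElem = crossentropy(Y,targets,elementWeights);
```

The two results are expected to match, since repeating an observation weight across all classes weights every element of that observation equally.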
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: ClassificationMode="multilabel",DataFormat="CB" evaluates the cross-entropy loss for multi-label classification tasks and specifies the dimension order of the input data as "CB".
ClassificationMode - Type of classification task
"single-label" (default) | "multilabel"
Type of classification task, specified as one of these values:
- "single-label": Each observation is exclusively assigned one class label (single-label classification). The function computes the loss between the target value for the single category specified by targets and the corresponding prediction in Y, averaged over the number of observations.
- "multilabel": Each observation can be assigned more than one independent class label (multilabel classification). The function computes the sum of the loss between each category specified by targets and the predictions in Y for those categories, averaged over the number of observations. Cross-entropy loss for this type of classification task is also known as binary cross-entropy loss.
Note
To select the classification mode for binary classification, you must consider the final layer of the network:
- If the final layer has an output size of one, such as with a sigmoid layer, use "multilabel".
- If the final layer has an output size of two, such as with a softmax layer, use "single-label".
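The two binary-classification setups in the note can be sketched as follows. The probabilities are generated directly rather than by a network, and the names p, isPositive, Ysigmoid, and Ysoftmax are illustrative; Deep Learning Toolbox is assumed.

```matlab
numObservations = 8;
p = 0.1 + 0.8*rand(1,numObservations);       % probability of the positive class
isPositive = rand(1,numObservations) > 0.5;  % true where the label is positive

% Output size one (for example, after a sigmoid layer): use "multilabel".
Ysigmoid = dlarray(p,"CB");
lossSigmoid = crossentropy(Ysigmoid,double(isPositive), ...
    ClassificationMode="multilabel");

% Output size two (for example, after a softmax layer): use "single-label"
% with one-hot targets over the two classes.
Ysoftmax = dlarray([p; 1-p],"CB");
targetsOneHot = double([isPositive; ~isPositive]);
lossSoftmax = crossentropy(Ysoftmax,targetsOneHot, ...
    ClassificationMode="single-label");
```

When the two-class scores are p and 1-p, both configurations describe the same binary cross-entropy, so the two loss values are expected to agree.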
Mask - Mask indicating which elements to include for loss computation
dlarray | logical array | numeric array
Mask indicating which elements to include for loss computation, specified as a dlarray object, a logical array, or a numeric array with the same size as Y.
The function includes and excludes elements of the input data for loss computation when the corresponding value in the mask is 1 and 0, respectively.
If Mask is a formatted dlarray object, then its format must match that of Y. If Mask is not a formatted dlarray object, then the function uses the same format as Y.
If you specify the DataFormat argument, then the function also uses the specified format for the mask.
The size of each dimension of Mask must match the size of the corresponding dimension in Y. The default value is a logical array of ones.
Tip
Formatted dlarray objects automatically permute the dimensions of the underlying data to have this order: "S" (spatial), "C" (channel), "B" (batch), "T" (time), and "U" (unspecified). For example, dlarray objects automatically permute the dimensions of data with format "TSCSBS" to have format "SSSCBT".
To ensure that the dimensions of Y and the mask are consistent, when Y is a formatted dlarray, also specify the mask as a formatted dlarray.
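A minimal sketch of masking padded time steps in sequence data follows. It assumes Deep Learning Toolbox and assumes that time steps 4 and 5 of the second observation are padding; the manual comparison uses the default "batch-size" normalization.

```matlab
numClasses = 3;
numObservations = 2;
numTimeSteps = 5;
Y = softmax(dlarray(rand(numClasses,numObservations,numTimeSteps),"CBT"));
labels = randi(numClasses,[1 numObservations numTimeSteps]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% Mask with the same size as Y; false marks padded elements to exclude.
mask = true(size(Y));
mask(:,2,4:5) = false;
lossMasked = crossentropy(Y,targets,Mask=mask);

% Manual check: sum the included element losses and divide by the batch size.
Ydata = extractdata(Y);
lossManual = -sum(mask .* targets .* log(Ydata),"all") / numObservations;
```

The mask here is an unformatted logical array, so the function applies the "CBT" format of Y to it.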
Reduction - Loss value array reduction mode
"sum" (default) | "none"
Loss value array reduction mode, specified as "sum" or "none".
If the Reduction argument is "sum", then the function sums all elements in the array of loss values. In this case, the output loss is a scalar.
If the Reduction argument is "none", then the function does not reduce the array of loss values. In this case, the output loss is an unformatted dlarray object of the same size as Y.
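The sketch below contrasts the two reduction modes. It assumes Deep Learning Toolbox; lossScalar, lossArray, and lossFromArray are illustrative names, and the final line assumes that the unreduced values are not normalized, as the NormalizationFactor description implies.

```matlab
numClasses = 5;
numObservations = 7;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% Default reduction ("sum"): a scalar loss.
lossScalar = crossentropy(Y,targets);

% No reduction: one loss value per element of Y, returned unformatted.
lossArray = crossentropy(Y,targets,Reduction="none");
size(lossArray)
dims(lossArray)

% Summing the element-wise values and dividing by the batch size
% should reproduce the default scalar loss.
lossFromArray = sum(extractdata(lossArray),"all") / numObservations;
```

Keeping the unreduced array is useful when you want to inspect or reweight per-element contributions before combining them yourself.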
NormalizationFactor - Divisor for normalizing reduced loss
"batch-size" (default) | "all-elements" | "mask-included" | "none"
Divisor for normalizing the reduced loss when Reduction is "sum", specified as one of the following:
- "batch-size": Normalize the loss by dividing it by the number of observations in Y.
- "all-elements": Normalize the loss by dividing it by the number of elements of Y.
- "mask-included": Normalize the loss by dividing the loss values by the product of the number of observations and the number of included elements specified by the mask for each observation independently. To use this option, you must specify a mask using the Mask option.
- "none": Do not normalize the loss.
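To make the divisors concrete, the following sketch computes the same loss with three normalization settings and relates them by hand. It assumes Deep Learning Toolbox; the variable names are illustrative.

```matlab
numClasses = 5;
numObservations = 7;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

lossBatch = crossentropy(Y,targets);   % default: "batch-size"
lossAll = crossentropy(Y,targets,NormalizationFactor="all-elements");
lossNone = crossentropy(Y,targets,NormalizationFactor="none");

% Relate the normalized values to the unnormalized sum.
batchFromNone = extractdata(lossNone) / numObservations;
allFromNone = extractdata(lossNone) / numel(Y);
```

Here batchFromNone is expected to match lossBatch and allFromNone to match lossAll, which can help when comparing loss values across frameworks that use different normalization conventions.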
DataFormat - Description of data dimensions
character vector | string scalar
Description of the data dimensions, specified as a character vector or string scalar.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.
If the input data is not a formatted dlarray object, then you must specify the DataFormat option.
For more information, see Deep Learning Data Formats.
Data Types: char | string
WeightsFormat - Description of dimensions of weights
character vector | string scalar
Description of the dimensions of the weights, specified as a character vector or string scalar.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.
If weights is a numeric vector and Y has two or more nonsingleton dimensions, then you must specify the WeightsFormat option.
If weights is not a vector, or weights and Y are both vectors, then the default value of WeightsFormat is the same as the format of Y.
For more information, see Deep Learning Data Formats.
Data Types: char | string
Output Arguments
loss - Cross-entropy loss
dlarray
Cross-entropy loss, returned as an unformatted dlarray with the same underlying data type as the input Y.
The size of loss depends on the Reduction argument.
Algorithms
Cross-Entropy Loss
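For each element $Y_j$ of the input, the crossentropy function computes the corresponding element-wise cross-entropy loss values. Consistent with the table of loss formulations in this section, the element-wise loss for single-label classification is
$$\text{loss}_j = -T_j \ln Y_j,$$
and the element-wise loss for multi-label classification is
$$\text{loss}_j = -\left(T_j \ln Y_j + (1 - T_j)\ln(1 - Y_j)\right),$$
where $T_j$ is the target value corresponding to $Y_j$.
To reduce the loss values to a scalar, the function then reduces the element-wise loss using the formula
$$\text{loss} = \frac{1}{N}\sum_j m_j w_j \,\text{loss}_j,$$
where $N$ is the normalization factor, $m_j$ is the mask value for element $j$, and $w_j$ is the weight value for element $j$.
If you do not opt to reduce the loss, then the function applies the mask and the weights to the loss values directly:
$$\text{loss}^*_j = m_j w_j \,\text{loss}_j.$$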
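As a sketch of how the element-wise loss, the weights, and the normalization factor combine, the code below reproduces a class-weighted single-label loss by hand. It assumes Deep Learning Toolbox, a mask of all ones, and the default "batch-size" normalization; classWeights, elementLoss, and lossManual are illustrative names.

```matlab
numClasses = 5;
numObservations = 7;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);
classWeights = rand(1,numClasses);

% Built-in weighted loss with class weights in "UC" format.
lossBuiltin = crossentropy(Y,targets,classWeights,WeightsFormat="UC");

% Manual computation: element-wise loss, then weight, sum, and normalize.
Ydata = extractdata(Y);
elementLoss = -targets .* log(Ydata);               % single-label element loss
weighted = classWeights.' .* elementLoss;           % w_j applied per class row
lossManual = sum(weighted,"all") / numObservations; % N = number of observations
```

The transpose turns the 1-by-numClasses weight vector into a column so that implicit expansion applies one weight to each class row of the loss array.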
This table shows the loss formulations for different tasks.
Task | Description | Loss |
---|---|---|
Single-label classification | Cross-entropy loss for mutually exclusive classes. This is useful when observations must have a single label only. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} T_{n,i}\ln Y_{n,i}$, where $N$ and $K$ are the numbers of observations and classes, respectively. |
Multi-label classification | Cross-entropy loss for independent classes. This is useful when observations can have multiple labels. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}\left(T_{n,i}\ln Y_{n,i} + (1 - T_{n,i})\ln(1 - Y_{n,i})\right)$, where $N$ and $K$ are the numbers of observations and classes, respectively. |
Single-label classification with weighted classes | Cross-entropy loss with class weights. This is useful for datasets with imbalanced classes. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} w_i T_{n,i}\ln Y_{n,i}$, where $N$ and $K$ are the numbers of observations and classes, respectively, and $w_i$ denotes the weight for class $i$. |
Sequence-to-sequence classification | Cross-entropy loss with masked time steps. This is useful for ignoring loss values that correspond to padded data. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{S} m_{n,t}\sum_{i=1}^{K} T_{n,t,i}\ln Y_{n,t,i}$, where $N$, $S$, and $K$ are the numbers of observations, time steps, and classes, respectively, and $m_{n,t}$ denotes the mask value for time step $t$ of observation $n$. |
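The multi-label row of the table can be checked numerically with a short sketch. It assumes Deep Learning Toolbox and keeps the illustrative predictions away from 0 and 1 so that the logarithms stay finite.

```matlab
numClasses = 10;
numObservations = 12;

% Independent per-class probabilities in the range [0.05, 0.95].
Y = dlarray(0.05 + 0.9*rand(numClasses,numObservations),"CB");
targets = double(rand(numClasses,numObservations) > 0.75);

lossBuiltin = crossentropy(Y,targets,ClassificationMode="multilabel");

% Manual evaluation of the multi-label formula with N = numObservations.
Ydata = extractdata(Y);
terms = targets .* log(Ydata) + (1 - targets) .* log(1 - Ydata);
lossManual = -sum(terms,"all") / numObservations;
```

Because the sum runs over all classes but the divisor is only the number of observations, the multi-label loss grows with the number of classes, which explains why it is typically larger than the single-label value for the same data size.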
Deep Learning Array Formats
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.
To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
To create formatted input data, create a dlarray object and specify the format using the second argument.
To provide additional layout information with unformatted data, specify the formats using the DataFormat and WeightsFormat arguments.
For more information, see Deep Learning Data Formats.
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
The crossentropy function supports GPU array input with these usage notes and limitations:
- When at least one of these input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
  - Y
  - targets
  - weights
  - Mask
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
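A guarded sketch of GPU usage follows. It assumes Deep Learning Toolbox and Parallel Computing Toolbox, and it runs the GPU branch only when canUseGPU reports a supported device; onGPU is an illustrative name.

```matlab
if canUseGPU
    % Predictions whose underlying data is a gpuArray.
    scores = gpuArray(rand(10,12,"single"));
    Y = softmax(dlarray(scores,"CB"));
    targets = onehotencode(randi(10,[1 12]),1,ClassNames=1:10);

    loss = crossentropy(Y,targets);

    % Check whether the result stayed on the GPU.
    onGPU = isa(extractdata(loss),"gpuArray")
else
    disp("No supported GPU available; skipping the GPU example.")
end
```

To bring the result back to host memory, for example for plotting, call gather on the extracted data.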
Version History
Introduced in R2019b
R2023b: TargetCategories is not recommended
TargetCategories is not recommended. Use ClassificationMode instead. To update your code, replace all instances of TargetCategories="exclusive" with ClassificationMode="single-label" and all instances of TargetCategories="independent" with ClassificationMode="multilabel". There are no differences between the arguments that require additional updates to your code. The default behavior of the crossentropy function remains the same.
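The update amounts to a one-line change per call, sketched below with illustrative data. The earlier form is shown only as a comment; Deep Learning Toolbox R2023b or later is assumed for the ClassificationMode calls.

```matlab
numClasses = 10;
numObservations = 12;
Y = dlarray(rand(numClasses,numObservations),"CB");
targets = double(rand(numClasses,numObservations) > 0.75);

% Before R2023b (not recommended):
%   loss = crossentropy(Y,targets,TargetCategories="independent");

% R2023b and later:
loss = crossentropy(Y,targets,ClassificationMode="multilabel");

% Likewise, TargetCategories="exclusive" maps to the default mode.
Ysoft = softmax(Y);
oneHot = onehotencode(randi(numClasses,[1 numObservations]),1,ClassNames=1:numClasses);
lossSingle = crossentropy(Ysoft,oneHot,ClassificationMode="single-label");
```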