Define Custom Training Loops, Loss Functions, and Networks - MATLAB & Simulink

For most deep learning tasks, you can use a pretrained neural network and adapt it to your own data. For an example showing how to use transfer learning to retrain a convolutional neural network to classify a new set of images, see Retrain Neural Network to Classify New Images. Alternatively, you can create and train neural networks from scratch using the trainnet and trainingOptions functions.

If the trainingOptions function does not provide the training options that you need for your task, then you can create a custom training loop using automatic differentiation. To learn more, see Train Network Using Custom Training Loop.

If the trainnet function does not provide the loss function that you need for your task, then you can specify a custom loss function to the trainnet function as a function handle. For loss functions that require more inputs than the predictions and targets (for example, loss functions that require access to the neural network or additional inputs), train the model using a custom training loop. To learn more, see Train Network Using Custom Training Loop.

If Deep Learning Toolbox™ does not provide the layers you need for your task, then you can create a custom layer. To learn more, see Define Custom Deep Learning Layers. For models that cannot be specified as networks of layers, you can define the model as a function. To learn more, see Train Network Using Model Function.

For more information about which training method to use for which task, see Train Deep Learning Model in MATLAB.

Define Custom Loss Function

The trainnet function provides several built-in loss functions to use for training. For example, you can specify cross-entropy loss for classification and mean squared error loss for regression by specifying "crossentropy" and "mse" as the loss function argument, respectively.

If the trainnet function does not provide the loss function that you need for your task, then you can specify a custom loss function to the trainnet function as a function handle. The function must have the syntax loss = f(Y,T), where Y and T are the predictions and targets, respectively.

To help create a custom loss function, you can use the deep learning functions in this table. You can also pass these functions to the trainnet function directly as a function handle.

Function	Description
softmax	The softmax activation operation applies the softmax function to the channel dimension of the input data.
sigmoid	The sigmoid activation operation applies the sigmoid function to the input data.
crossentropy	The cross-entropy operation computes the cross-entropy loss between network predictions and binary or one-hot encoded targets for single-label and multi-label classification tasks.
indexcrossentropy	The index cross-entropy operation computes the cross-entropy loss between network predictions and targets specified as integer class indices for single-label classification tasks.
l1loss	The L1 loss operation computes the L1 loss given network predictions and target values. When the Reduction option is "sum" and the NormalizationFactor option is "batch-size", the computed value is known as the mean absolute error (MAE).
l2loss	The L2 loss operation computes the L2 loss (based on the squared L2 norm) given network predictions and target values. When the Reduction option is "sum" and the NormalizationFactor option is "batch-size", the computed value is known as the mean squared error (MSE).
huber	The Huber operation computes the Huber loss between network predictions and target values for regression tasks. When the 'TransitionPoint' option is 1, this is also known as smooth L1 loss.
ctc	The CTC operation computes the connectionist temporal classification (CTC) loss between unaligned sequences.
mse	The half mean squared error operation computes the half mean squared error loss between network predictions and target values for regression tasks.
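As a minimal sketch of the loss = f(Y,T) syntax described above, the following combines two of the functions in the table into one function handle. The name lossFcn, the 0.1 weighting, and the random data are assumptions made for illustration only; they do not come from the toolbox documentation.

```matlab
% Hypothetical custom loss: half mean squared error plus a small L1 penalty.
% The 0.1 weight is an arbitrary illustrative choice.
lossFcn = @(Y,T) mse(Y,T) + 0.1*l1loss(Y,T);

% Evaluate the handle on made-up predictions and targets
% (4 channels, 8 observations) to check that it returns a scalar loss.
Y = dlarray(rand(4,8,"single"),"CB");
T = dlarray(rand(4,8,"single"),"CB");
loss = lossFcn(Y,T);

% The same handle could then be passed to trainnet in place of a built-in
% loss name, assuming XTrain, TTrain, layers, and options already exist:
% net = trainnet(XTrain,TTrain,layers,lossFcn,options);
```

Because the handle takes only the predictions and targets, it fits the trainnet interface; a loss that also needs the network itself would require a custom training loop instead.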

For loss functions that require more inputs than the predictions and targets (for example, loss functions that require access to the neural network or additional inputs), train the model using a custom training loop. For more information, see Define Custom Training Loop Loss Function. For an example, see Train Network Using Custom Training Loop.

Define Deep Learning Model for Custom Training Loop

For most tasks, you can control the training algorithm details using the trainingOptions and trainnet functions. If the trainingOptions function does not provide the options you need for your task (for example, a custom solver), then you can define your own custom training loop.

Define Model as Neural Network

For models that you can specify as an array or a neural network of layers, specify the model as a dlnetwork object. For example, to define a simple LSTM neural network for a custom training loop, use:

layers = [
    sequenceInputLayer(3)
    lstmLayer(100,OutputMode="last")
    fullyConnectedLayer(4)
    softmaxLayer];

net = dlnetwork(layers);

To train the neural network using a custom training loop, the network must be initialized. To initialize a neural network, use the initialize function.
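The sketch below shows one situation in which an explicit call to initialize is needed: a network that was deliberately created without initialized parameters. It assumes a release in which dlnetwork accepts the Initialize name-value argument and exposes the Initialized property; the layer sizes are reused from the LSTM network described above.

```matlab
layers = [
    sequenceInputLayer(3)
    lstmLayer(100,OutputMode="last")
    fullyConnectedLayer(4)
    softmaxLayer];

% Create the network without initializing its learnable parameters.
net = dlnetwork(layers,Initialize=false);
wasInitialized = net.Initialized;

% Initialize the learnable parameters so the network is ready for training.
net = initialize(net);
isInitialized = net.Initialized;
```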

For an example showing how to train a neural network with a custom training loop, see Train Network Using Custom Training Loop.

Define Model as Function

For architectures that cannot be created using an array or network of layers, you can define the model as a function of the form [Y1,...,YM] = model(parameters,X1,...,XN), where parameters contains the network parameters, X1,...,XN corresponds to the input data for the N model inputs, and Y1,...,YM corresponds to the M model outputs. To train a deep learning model defined as a function, use a custom training loop. For an example, see Train Network Using Model Function.

When you define a deep learning model as a function, you must manually initialize the learnable parameters. For more information, see Initialize Learnable Parameters for Model Function.

If you define a custom network as a function, then the model function must support automatic differentiation. You can use the deep learning operations in this table. The functions listed here are only a subset. For a complete list of functions that support dlarray input, see List of Functions with dlarray Support.

Function	Description
attention	The attention operation focuses on parts of the input using weighted multiplication operations.
avgpool	The average pooling operation performs downsampling by dividing the input into pooling regions and computing the average value of each region.
batchnorm	The batch normalization operation normalizes the input data across all observations for each channel independently. To speed up training of the convolutional neural network and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.
crossentropy	The cross-entropy operation computes the cross-entropy loss between network predictions and binary or one-hot encoded targets for single-label and multi-label classification tasks.
indexcrossentropy (since R2024b)	The index cross-entropy operation computes the cross-entropy loss between network predictions and targets specified as integer class indices for single-label classification tasks.
crosschannelnorm	The cross-channel normalization operation uses local responses in different channels to normalize each activation. Cross-channel normalization typically follows a relu operation. Cross-channel normalization is also known as local response normalization.
ctc	The CTC operation computes the connectionist temporal classification (CTC) loss between unaligned sequences.
dlconv	The convolution operation applies sliding filters to the input data. Use the dlconv function for deep learning convolution, grouped convolution, and channel-wise separable convolution.
dlode45	The neural ordinary differential equation (ODE) operation returns the solution of a specified ODE.
dltranspconv	The transposed convolution operation upsamples feature maps.
embed	The embed operation converts numeric indices to numeric vectors, where the indices correspond to discrete data. Use embeddings to map discrete data such as categorical values or words to numeric vectors.
fullyconnect	The fully connect operation multiplies the input by a weight matrix and then adds a bias vector.
gelu	The Gaussian error linear unit (GELU) activation operation weights the input by its probability under a Gaussian distribution.
groupnorm	The group normalization operation normalizes the input data across grouped subsets of channels for each observation independently. To speed up training of the convolutional neural network and reduce the sensitivity to network initialization, use group normalization between convolution and nonlinear operations such as relu.
gru	The gated recurrent unit (GRU) operation allows a network to learn dependencies between time steps in time series and sequence data.
huber	The Huber operation computes the Huber loss between network predictions and target values for regression tasks. When the 'TransitionPoint' option is 1, this is also known as smooth L1 loss.
instancenorm	The instance normalization operation normalizes the input data across each channel for each observation independently. To improve the convergence of training the convolutional neural network and reduce the sensitivity to network hyperparameters, use instance normalization between convolution and nonlinear operations such as relu.
l1loss	The L1 loss operation computes the L1 loss given network predictions and target values. When the Reduction option is "sum" and the NormalizationFactor option is "batch-size", the computed value is known as the mean absolute error (MAE).
l2loss	The L2 loss operation computes the L2 loss (based on the squared L2 norm) given network predictions and target values. When the Reduction option is "sum" and the NormalizationFactor option is "batch-size", the computed value is known as the mean squared error (MSE).
layernorm	The layer normalization operation normalizes the input data across all channels for each observation independently. To speed up training of recurrent and multilayer perceptron neural networks and reduce the sensitivity to network initialization, use layer normalization after the learnable operations, such as LSTM and fully connect operations.
leakyrelu	The leaky rectified linear unit (ReLU) activation operation performs a nonlinear threshold operation, where any input value less than zero is multiplied by a fixed scale factor.
lstm	The long short-term memory (LSTM) operation allows a network to learn long-term dependencies between time steps in time series and sequence data.
maxpool	The maximum pooling operation performs downsampling by dividing the input into pooling regions and computing the maximum value of each region.
maxunpool	The maximum unpooling operation unpools the output of a maximum pooling operation by upsampling and padding with zeros.
mse	The half mean squared error operation computes the half mean squared error loss between network predictions and target values for regression tasks.
onehotdecode	The one-hot decode operation decodes probability vectors, such as the output of a classification network, into classification labels. The input A can be a dlarray. If A is formatted, the function ignores the data format.
relu	The rectified linear unit (ReLU) activation operation performs a nonlinear threshold operation, where any input value less than zero is set to zero.
sigmoid	The sigmoid activation operation applies the sigmoid function to the input data.
softmax	The softmax activation operation applies the softmax function to the channel dimension of the input data.
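As an illustration of the model(parameters,X) form, here is a minimal sketch of a two-layer perceptron built from the fullyconnect, relu, and softmax operations listed in the table. The structure field names (fc1, fc2), the layer sizes, and the simple scaled-random initialization are all assumptions chosen for this example rather than a recommended scheme.

```matlab
% Hypothetical parameters: 3 inputs, 8 hidden units, 4 outputs.
% The learnable parameters are initialized manually as dlarray objects.
parameters.fc1.Weights = dlarray(0.01*randn(8,3,"single"));
parameters.fc1.Bias    = dlarray(zeros(8,1,"single"));
parameters.fc2.Weights = dlarray(0.01*randn(4,8,"single"));
parameters.fc2.Bias    = dlarray(zeros(4,1,"single"));

% Made-up input: 3 channels, 16 observations, format "CB".
X = dlarray(rand(3,16,"single"),"CB");
Y = model(parameters,X);

function Y = model(parameters,X)
    % Hidden layer followed by a ReLU nonlinearity.
    Y = fullyconnect(X,parameters.fc1.Weights,parameters.fc1.Bias);
    Y = relu(Y);
    % Output layer followed by softmax over the channel dimension.
    Y = fullyconnect(Y,parameters.fc2.Weights,parameters.fc2.Bias);
    Y = softmax(Y);
end
```

Because every operation in the function accepts dlarray input, the function can be traced by automatic differentiation.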

Define Custom Training Loop Loss Function

Training a deep learning model is an optimization task. By considering a deep learning model as a function f(X;θ), where X is the model input and θ is the set of learnable parameters, you can optimize θ so that it minimizes some loss value based on the training data. For example, optimize the learnable parameters θ such that, for given inputs X with corresponding targets T, they minimize the error between the predictions Y=f(X;θ) and T.

To train a deep learning model with a custom training loop, you can minimize the loss using gradient-based optimization methods, which iteratively update the learnable parameters of the model so that the loss decreases. For example, you can update the learnable parameters using the lbfgsupdate, adamupdate, rmspropupdate, and sgdmupdate functions, which require the gradients of the loss with respect to the learnable parameters. To calculate these gradients, you can use automatic differentiation. Create a custom loss function that takes the model and training data, and returns the loss and the gradients of the loss with respect to the learnable parameters.
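In symbols, the optimization described above can be sketched as follows. The loss symbol L, the learning rate α, and the iteration index k are notation introduced here for illustration; the update shown is plain gradient descent, the simplest member of this family, and not the exact rule used by any particular update function.

```latex
\theta^{*} = \arg\min_{\theta} \; L\bigl(f(X;\theta),\, T\bigr),
\qquad
\theta_{k+1} = \theta_{k} - \alpha \, \nabla_{\theta} L\bigl(f(X;\theta_{k}),\, T\bigr)
```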

For a model specified as a dlnetwork object, create a function of the form [loss,gradients] = modelLoss(net,X,T), where net is the network, X is the network input, T contains the targets, and loss and gradients are the returned loss and gradients, respectively. Optionally, you can pass extra arguments to the model loss function (for example, if the loss function requires extra information), or return extra arguments (for example, the updated network state).

For a model specified as a function, create a function of the form [loss,gradients] = modelLoss(parameters,X,T), where parameters contains the learnable parameters, X is the model input, T contains the targets, and loss and gradients are the returned loss and gradients, respectively. Optionally, you can pass extra arguments to the model loss function (for example, if the loss function requires extra information), or return extra arguments (for example, the updated model state).

To calculate the gradients in the modelLoss function, use the dlgradient function.
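A minimal sketch of the model-function variant follows. It assumes a single fully connected operation as the model, half mean squared error as the loss, and made-up data; the field names Weights and Bias are illustrative. Note that dlgradient only works inside a function that is evaluated through dlfeval, which is why the sketch calls modelLoss that way.

```matlab
% Hypothetical parameters for a single fully connected operation
% mapping 3 input channels to 4 outputs.
parameters.Weights = dlarray(0.01*randn(4,3,"single"));
parameters.Bias    = dlarray(zeros(4,1,"single"));

% Made-up inputs and targets with 16 observations.
X = dlarray(rand(3,16,"single"),"CB");
T = dlarray(rand(4,16,"single"),"CB");

% Evaluate the loss and gradients with automatic differentiation enabled.
[loss,gradients] = dlfeval(@modelLoss,parameters,X,T);

function [loss,gradients] = modelLoss(parameters,X,T)
    % Forward pass through the model.
    Y = fullyconnect(X,parameters.Weights,parameters.Bias);
    % Half mean squared error between predictions and targets.
    loss = mse(Y,T);
    % Gradients of the loss with respect to every learnable parameter;
    % the result is a structure with the same fields as parameters.
    gradients = dlgradient(loss,parameters);
end
```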

To learn more about defining model loss functions for custom training loops, see Define Model Loss Function for Custom Training Loop.

For an example showing how to train a generative adversarial network (GAN) that generates images using a custom loss function, see Train Generative Adversarial Network (GAN).

Update Learnable Parameters Using Automatic Differentiation

To evaluate the model loss function using automatic differentiation, use the dlfeval function, which evaluates a function with automatic differentiation enabled. For the first input of dlfeval, pass the model loss function specified as a function handle. For the following inputs, pass the required variables for the model loss function. For the outputs of the dlfeval function, specify the same outputs as the model loss function.

To update the learnable parameters, you can use these functions.

Function	Description
adamupdate	Update parameters using adaptive moment estimation (Adam)
rmspropupdate	Update parameters using root mean squared propagation (RMSProp)
sgdmupdate	Update parameters using stochastic gradient descent with momentum (SGDM)
lbfgsupdate	Update parameters using limited-memory BFGS (L-BFGS)
dlupdate	Update parameters using custom function

For example, to update the learnable parameters using SGDM, in each iteration of the custom training loop use:

[loss,gradients] = dlfeval(@modelLoss,net,X,T);
[net,velocity] = sgdmupdate(net,gradients,velocity,learnRate,momentum);
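To show these two calls in context, here is a minimal sketch of a complete loop. The tiny regression network, the random data, the hyperparameter values, and the fixed count of 20 iterations are assumptions for illustration; a realistic loop would also iterate over mini-batches and monitor progress.

```matlab
% Hypothetical small regression network: 3 features in, 4 responses out.
layers = [
    featureInputLayer(3)
    fullyConnectedLayer(4)];
net = dlnetwork(layers);

% Made-up training data with 16 observations.
X = dlarray(rand(3,16,"single"),"CB");
T = dlarray(rand(4,16,"single"),"CB");

% SGDM state and hyperparameters (illustrative values).
velocity = [];
learnRate = 0.01;
momentum = 0.9;

initialLoss = dlfeval(@modelLoss,net,X,T);

for iteration = 1:20
    % Evaluate the loss and gradients with automatic differentiation.
    [loss,gradients] = dlfeval(@modelLoss,net,X,T);
    % Update the learnable parameters using SGDM.
    [net,velocity] = sgdmupdate(net,gradients,velocity,learnRate,momentum);
end

function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = mse(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```

For an update rule that none of the built-in functions cover, dlupdate could be used in place of sgdmupdate, for example with a function handle such as @(w,g) w - learnRate*g for plain gradient descent.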