Tune Experiment Hyperparameters by Using Bayesian Optimization


This example shows how to use Bayesian optimization in Experiment Manager to find optimal network hyperparameters and training options for convolutional neural networks. Bayesian optimization provides an alternative strategy to sweeping hyperparameters in an experiment. You specify a range of values for each hyperparameter and select a metric to optimize, and Experiment Manager searches for a combination of hyperparameters that optimizes your selected metric. Bayesian optimization requires Statistics and Machine Learning Toolbox™.

In this example, you train a network to classify images from the CIFAR-10 data set. The experiment uses Bayesian optimization to find the combination of hyperparameters that minimizes a custom metric function. The hyperparameters include options of the training algorithm, as well as parameters of the network architecture itself. The custom metric function determines the classification error on a randomly chosen test set. For more information on defining custom metrics in Experiment Manager, see Evaluate Deep Learning Experiments by Using Metric Functions.

Alternatively, you can find optimal hyperparameter values programmatically by calling the bayesopt function. For more information, see Deep Learning Using Bayesian Optimization.
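
For illustration, this is a minimal sketch of the programmatic route. The search ranges below are plausible values for this problem rather than settings taken from the experiment, the time limit is arbitrary, and trainAndComputeError stands for a hypothetical objective function that trains the network with one set of hyperparameter values and returns the resulting validation error.

% Define the search space; each variable corresponds to a row of the hyperparameter table.
optimizableVars = [
    optimizableVariable("SectionDepth",[1 3],Type="integer")
    optimizableVariable("InitialLearnRate",[0.01 1],Transform="log")
    optimizableVariable("Momentum",[0.8 0.98])
    optimizableVariable("L2Regularization",[1e-10 1e-2],Transform="log")];

% Minimize the validation error, stopping after 30 evaluations or 14 hours.
results = bayesopt(@trainAndComputeError,optimizableVars, ...
    MaxObjectiveEvaluations=30, ...
    MaxTime=14*60*60);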

Open Experiment

First, open the example. Experiment Manager loads a project with a preconfigured experiment that you can inspect and run. To open the experiment, in the Experiment Browser pane, double-click BayesOptExperiment.

Built-in training experiments consist of a description, a table of hyperparameters, a setup function, and a collection of metric functions to evaluate the results of the experiment. Experiments that use Bayesian optimization include additional options to limit the duration of the experiment. For more information, see Train Network Using trainnet and Display Custom Metrics.

The Description field contains a textual description of the experiment. For this example, the description is:

Find optimal hyperparameters and training options for convolutional neural network. Hyperparameters determine the network section depth, initial learning rate, stochastic gradient descent momentum, and L2 regularization strength.

The Hyperparameters section specifies the strategy and hyperparameter options to use for the experiment. For each hyperparameter, you can specify these options:

  * Range — Enter a two-element vector that gives the lower bound and upper bound of a real- or integer-valued hyperparameter, or a string array or cell array that lists the possible values of a categorical hyperparameter.
  * Type — Select real (real-valued hyperparameter), integer (integer-valued hyperparameter), or categorical (categorical hyperparameter).
  * Transform — Select none (no transform) or log (logarithmic transform). For log, the hyperparameter must be real or integer with a positive range, because the software searches and models it on a logarithmic scale.

When you run the experiment, Experiment Manager searches for the best combination of hyperparameters. Each trial in the experiment uses a new combination of hyperparameter values based on the results of the previous trials. This example uses these hyperparameters:

  * SectionDepth — network section depth, which controls the number of convolutional layers in each section of the network
  * InitialLearnRate — initial learning rate used for training
  * Momentum — momentum of the stochastic gradient descent solver
  * L2Regularization — strength of the L2 regularization

Under Bayesian Optimization Options, you can specify the duration of the experiment by entering the maximum time (in seconds) and the maximum number of trials to run. To best use the power of Bayesian optimization, perform at least 30 objective function evaluations.

The Setup Function section specifies a function that configures the training data, network architecture, and training options for the experiment. To open this function in MATLAB® Editor, click Edit. The code for the function also appears in Setup Function. The input to the setup function is a structure with fields from the hyperparameter table. The function returns three outputs that you use to train a network for image classification problems. In this example, the setup function has these sections:

  * Load Training Data — downloads and extracts the CIFAR-10 training and test data, reserves 5000 of the test images for validation, and creates an augmented image datastore for training.
  * Define Network Architecture — defines a convolutional neural network whose depth and initial number of filters depend on the SectionDepth hyperparameter.
  * Specify Training Options — creates a trainingOptions object using the InitialLearnRate, Momentum, and L2Regularization values for the trial.


The Metrics section specifies optional functions that evaluate the results of the experiment. Experiment Manager evaluates these functions each time it finishes training the network. This example includes the custom metric function ErrorRate. This function selects 5000 test images and labels at random, evaluates the trained network on these images, and calculates the proportion of images that the network misclassifies. To open this function in MATLAB Editor, select the name of the metric function and click Edit. The code for the function also appears in Compute Error Rate.

The Optimize and Direction fields indicate the metric that the Bayesian optimization algorithm uses as an objective function. For this experiment, Experiment Manager seeks to minimize the value of the ErrorRate metric.

Run Experiment

When you run the experiment, Experiment Manager searches for the best combination of hyperparameters with respect to the chosen metric. Each trial in the experiment uses a new combination of hyperparameter values based on the results of the previous trials.

Training can take some time. To limit the duration of the experiment, you can modify the Bayesian Optimization Options by reducing the maximum running time or the maximum number of trials. However, note that running fewer than 30 trials can prevent the Bayesian optimization algorithm from converging to an optimal set of hyperparameters.

By default, Experiment Manager runs one trial at a time. If you have Parallel Computing Toolbox™, you can run multiple trials at the same time or offload your experiment as a batch job in a cluster.

A table of results displays the metric function values for each trial. Experiment Manager highlights the trial with the optimal value for the selected metric. For example, in this experiment, the fifth trial produces the smallest error rate.

To determine the trial that optimizes the selected metric, Experiment Manager uses the best point criterion "min-observed". For more information, see Bayesian Optimization Algorithm (Statistics and Machine Learning Toolbox) and bestPoint (Statistics and Machine Learning Toolbox).
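
If you run the optimization programmatically with bayesopt instead, the same criterion is available through the bestPoint function. A minimal sketch, assuming results is the BayesianOptimization object returned by bayesopt:

% Return the hyperparameter values with the lowest observed objective value.
[xbest,minError] = bestPoint(results,Criterion="min-observed");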

Evaluate Results

To display the confusion matrix for the best trial in your experiment, select the row in the results table with the lowest error rate. Then, under Review Results, click Validation Data.

To perform additional computations, export the trained network to the workspace:

  1. On the Experiment Manager toolstrip, click Export > Trained Network.
  2. In the dialog window, enter the name of a workspace variable for the exported network. The default name is trainedNetwork.
  3. In the MATLAB Command Window, use the exported network as the input to the helper function testSummary:

testSummary(trainedNetwork)

To view the code for this function, see Summarize Test Statistics. This function evaluates the network in several ways:

  * Predicts the labels of the entire test set and calculates the test error.
  * Calculates the standard error and an approximate 95% confidence interval of the test error rate.
  * Displays some test images together with their predicted classes and the probabilities of those classes.

The function displays a summary of these statistics in the MATLAB Command Window.


Test error rate: 0.1829
Standard error: 0.0039
95% confidence interval: [0.1753, 0.1905]
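
These statistics follow the normal approximation to the binomial distribution. With a test error rate p = 0.1829 measured on the N = 10,000 images of the CIFAR-10 test set, the standard error is sqrt(p*(1-p)/N) ≈ 0.0039, and the 95% confidence interval is p ± 1.96*0.0039 ≈ [0.1753, 0.1905].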


To record observations about the results of your experiment, add an annotation:

  1. In the results table, right-click the ErrorRate cell of the best trial.
  2. Select Add Annotation.
  3. In the Annotations pane, enter your observations in the text box.

Close Experiment

In the Experiment Browser pane, right-click CIFARBayesianOptimizationProject and select Close Project. Experiment Manager closes the experiment and results contained in the project.

Setup Function

This function configures the training data, network architecture, and training options for the experiment. The input to this function is a structure with fields from the hyperparameter table. The function returns three outputs that you use to train a network for image classification problems.

function [augimdsTrain,layers,options] = BayesOptExperiment_setup(params)

Load Training Data

datadir = tempdir; downloadCIFARData(datadir);

[XTrain,YTrain,XTest,YTest] = loadCIFARData(datadir);

idx = randperm(numel(YTest),5000); XValidation = XTest(:,:,:,idx); YValidation = YTest(idx);
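
The CIFAR-10 test set contains 10,000 images, so this code holds out a randomly chosen half of them for validation. Note that the ErrorRate metric function later draws its own random 5000 images from the full test set, so the validation and test subsets can overlap.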

imageSize = [32 32 3];

pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);
augimdsTrain = augmentedImageDatastore(imageSize,XTrain,YTrain, ...
    DataAugmentation=imageAugmenter);

Define Network Architecture

numClasses = numel(unique(YTrain)); numF = round(16/sqrt(params.SectionDepth));
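
Because the parameter count of the convolutional sections grows roughly in proportion to the depth times the square of the filter count, dividing the initial number of filters by sqrt(params.SectionDepth) keeps the total number of learnable parameters roughly constant across depths. For SectionDepth values of 1, 2, and 3, numF is 16, 11, and 9, respectively.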

layers = [ imageInputLayer(imageSize)

convBlock(3,numF,params.SectionDepth)
maxPooling2dLayer(3,Stride=2,Padding="same")

convBlock(3,2*numF,params.SectionDepth)
maxPooling2dLayer(3,Stride=2,Padding="same")

convBlock(3,4*numF,params.SectionDepth)
averagePooling2dLayer(8)

fullyConnectedLayer(numClasses)
softmaxLayer
classificationLayer];

Specify Training Options

miniBatchSize = 256; validationFrequency = floor(numel(YTrain)/miniBatchSize);

options = trainingOptions("sgdm", ...
    InitialLearnRate=params.InitialLearnRate, ...
    Momentum=params.Momentum, ...
    MaxEpochs=60, ...
    LearnRateSchedule="piecewise", ...
    LearnRateDropPeriod=40, ...
    LearnRateDropFactor=0.1, ...
    MiniBatchSize=miniBatchSize, ...
    L2Regularization=params.L2Regularization, ...
    Shuffle="every-epoch", ...
    Verbose=false, ...
    ValidationData={XValidation,YValidation}, ...
    ValidationFrequency=validationFrequency);
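
With this piecewise schedule, training proceeds at params.InitialLearnRate for the first 40 epochs and then drops to one tenth of that rate for the remaining 20 of the 60 epochs. Because validationFrequency equals the number of complete mini-batches per epoch, the network is validated once per epoch.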

Create Block of Convolutional Layers

This function creates a block of numConvLayers convolutional layers, each with a specified filterSize and numFilters filters, and each followed by a batch normalization layer and a ReLU layer.

function layers = convBlock(filterSize,numFilters,numConvLayers)

layers = [
    convolution2dLayer(filterSize,numFilters,Padding="same")
    batchNormalizationLayer
    reluLayer];
layers = repmat(layers,numConvLayers,1);

end
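
For example, convBlock(3,16,2) returns a 6-by-1 layer array: two copies of a 3-by-3 convolution with 16 filters, each followed by a batch normalization layer and a ReLU layer.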

Compute Error Rate

This metric function takes as input a structure that contains the fields trainedNetwork, trainingInfo, and parameters.

The function selects 5000 test images and labels at random, evaluates the trained network on these images, calculates the predicted image labels, and computes the error rate on the test data.

function metricOutput = ErrorRate(trialInfo)

datadir = tempdir;
[~,~,XTest,YTest] = loadCIFARData(datadir);

idx = randperm(numel(YTest),5000);
XTest = XTest(:,:,:,idx);
YTest = YTest(idx);
YPredicted = classify(trialInfo.trainedNetwork,XTest);

metricOutput = 1 - mean(YPredicted == YTest);
end

Summarize Test Statistics

This function computes the test error, standard error, and an approximate 95% confidence interval and displays a summary of these statistics in the MATLAB Command Window. The function also displays some test images together with their predicted classes and the probabilities of those classes.

function testSummary(net)

datadir = tempdir;
[~,~,XTest,YTest] = loadCIFARData(datadir);

[YPredicted,probs] = classify(net,XTest);
testError = 1 - mean(YPredicted == YTest);
NTest = numel(YTest);
testErrorSE = sqrt(testError*(1-testError)/NTest);
testError95CI = [testError - 1.96*testErrorSE, testError + 1.96*testErrorSE];

fprintf("\n\n\n");
fprintf("Test error rate: %.4f\n",testError);
fprintf("Standard error: %.4f\n",testErrorSE);
fprintf("95%% confidence interval: [%.4f, %.4f]\n",testError95CI(1),testError95CI(2));
fprintf("\n\n\n");

figure
idx = randperm(numel(YTest),9);
for i = 1:numel(idx)
    subplot(3,3,i)
    imshow(XTest(:,:,:,idx(i)));
    prob = num2str(100*max(probs(idx(i),:)),3);
    predClass = string(YPredicted(idx(i)));
    label = predClass+": "+prob+"%";
    title(label)
end
end
