imagePretrainedNetwork
Pretrained neural network for images
Since R2024a
Syntax
Description
The imagePretrainedNetwork function loads a pretrained neural network and optionally adapts the neural network architecture for transfer learning and fine-tuning.
[net,classNames] = imagePretrainedNetwork
returns a pretrained SqueezeNet neural network and the network class names. This network is trained on the ImageNet data set for 1000 classes.
[net,classNames] = imagePretrainedNetwork(name)
returns the specified pretrained neural network and its class names.
[net,classNames] = imagePretrainedNetwork(___,Name=Value)
specifies options using one or more name-value arguments, in addition to any combination of input arguments from previous syntaxes. For example, Weights="none" returns the neural network uninitialized, without the pretrained weights.
Examples
Load a pretrained SqueezeNet neural network and the network class names into the workspace.
[net,classNames] = imagePretrainedNetwork;
View the network properties.
net
net = 
  dlnetwork with properties:
         Layers: [68×1 nnet.cnn.layer.Layer]
    Connections: [75×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob_flatten'}
    Initialized: 1
  View summary with summary.
View the first few class names.
head(classNames)
"tench"
"goldfish"
"great white shark"
"tiger shark"
"hammerhead"
"electric ray"
"stingray"
"cock"
Load a pretrained SqueezeNet neural network into the workspace.
[net,classNames] = imagePretrainedNetwork;
Read an image from a PNG file and classify it. To classify the image, first convert it to the data type single.
im = imread("peppers.png");
figure
imshow(im)
X = single(im);
scores = predict(net,X);
[label,score] = scores2label(scores,classNames);
Display the image with the predicted label and corresponding score.
figure
imshow(im)
title(string(label) + " (Score: " + score + ")")
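Because scores holds one prediction score per class, the same result can also be used to list several candidate labels. The following is a minimal sketch, assuming that scores and classNames come from the preceding steps; k, topScores, topIdx, and topLabels are illustrative names that are not part of the original example.

```matlab
% Assumes scores and classNames exist from the preceding steps.
k = 5;                                   % number of candidates to list (illustrative)
[topScores,topIdx] = maxk(scores(:),k);  % k largest scores, in descending order
topLabels = classNames(topIdx);          % class names that match those scores

% Show the candidates side by side.
disp(table(topLabels(:),topScores,VariableNames=["Label" "Score"]))
```

The first row corresponds to the label that scores2label returns.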
You can retrain a pretrained network for new datasets by adapting the neural network to match the new task and using its learned weights as a starting point. To adapt the network to the new data, replace the last few layers (known as the network head) so that it outputs prediction scores for each of the classes for the new task.
Load Training Data
Extract the MathWorks™ Merch data set. This is a small data set that contains 75 images of MathWorks merchandise, which belong to five different classes. The data is arranged such that the images are in subfolders that correspond to these five classes.
folderName = "MerchData"; unzip("MerchData.zip",folderName);
Create an image datastore. An image datastore enables you to store large collections of image data, including data that does not fit in memory, and efficiently read batches of images when training a neural network. Specify the folder with the extracted images, and indicate that the subfolder names correspond to the image labels.
imds = imageDatastore(folderName, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
Display some sample images.
numImages = numel(imds.Labels);
idx = randperm(numImages,16);
I = imtile(imds,Frames=idx);
figure
imshow(I)
View the class names and the number of classes.
classNames = categories(imds.Labels)
classNames = 5×1 cell
    {'MathWorks Cap'          }
    {'MathWorks Cube'         }
    {'MathWorks Playing Cards'}
    {'MathWorks Screwdriver'  }
    {'MathWorks Torch'        }
numClasses = numel(classNames)
Partition the data into training, validation, and test data sets. Use 70% of the images for training, 15% for validation, and 15% for testing. The splitEachLabel function splits the image datastore into three new datastores.
[imdsTrain,imdsValidation,imdsTest] = splitEachLabel(imds,0.7,0.15,"randomized");
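To check how the images were distributed, you can count the labels in each partition. This is a small sketch, assuming that imds, imdsTrain, imdsValidation, and imdsTest exist from the steps above and that every class appears in each partition; the variable names trainCounts, valCounts, testCounts, and splitSummary are illustrative.

```matlab
% Assumes imdsTrain, imdsValidation, and imdsTest exist from the split above.
trainCounts = countEachLabel(imdsTrain);
valCounts = countEachLabel(imdsValidation);
testCounts = countEachLabel(imdsTest);

% Combine the per-class counts into one table for comparison.
splitSummary = table(trainCounts.Label,trainCounts.Count, ...
    valCounts.Count,testCounts.Count, ...
    VariableNames=["Label" "Train" "Validation" "Test"])
```

Because the split is applied per label, the proportions hold within each class rather than only across the whole data set.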
Load Pretrained Network
Load a pretrained SqueezeNet neural network into the workspace. To return a neural network ready to be retrained for the new data, specify the number of classes.
net = imagePretrainedNetwork(NumClasses=numClasses)
net = 
  dlnetwork with properties:
         Layers: [68×1 nnet.cnn.layer.Layer]
    Connections: [75×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob_flatten'}
    Initialized: 1
  View summary with summary.
Get the neural network input size from the input layer.
inputSize = net.Layers(1).InputSize
inputSize = 1×3
227 227 3
The learnable layer in the network head (the last layer with learnable parameters) requires retraining. The layer is usually a fully connected layer, or a convolutional layer, with an output size that matches the number of classes.
To increase the level of updates to this layer and speed up convergence, increase the learning rate factor of its learnable parameters by using the setLearnRateFactor function. Set the learning rate factors of the learnable parameters to 10.
net = setLearnRateFactor(net,"conv10/Weights",10);
net = setLearnRateFactor(net,"conv10/Bias",10);
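If you are unsure which parameters belong to the network head, you can inspect the Learnables table of the network and then read the factors back. The following sketch assumes that net is the SqueezeNet network adapted above and that the learning rate factors have already been set; weightsFactor and biasFactor are illustrative names.

```matlab
% Assumes net is the adapted SqueezeNet dlnetwork from the preceding steps.
% List the last two learnable parameters of the network.
tail(net.Learnables(:,["Layer" "Parameter"]),2)

% Read back the learning rate factors of the head layer.
weightsFactor = getLearnRateFactor(net,"conv10","Weights")
biasFactor = getLearnRateFactor(net,"conv10","Bias")
```

For a different pretrained network, the layer name in these calls changes, so check the Learnables table first.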
Prepare Data for Training
The images in the datastore can have different sizes. To automatically resize the training images, use an augmented image datastore. Data augmentation also helps prevent the network from overfitting and memorizing the exact details of the training images. Specify these additional augmentation operations to perform on the training images: randomly flip the training images along the vertical axis, and randomly translate them up to 30 pixels horizontally and vertically.
pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    DataAugmentation=imageAugmenter);
To automatically resize the validation and testing images without performing further data augmentation, use an augmented image datastore without specifying any additional preprocessing operations.
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
augimdsTest = augmentedImageDatastore(inputSize(1:2),imdsTest);
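Before training, it can help to confirm that the augmented datastore really returns images of the network input size. This is a hedged sketch, assuming that augimdsTrain and inputSize exist from the preceding steps and that the table returned by preview stores the images in a variable named input; minibatch and firstImage are illustrative names.

```matlab
% Assumes augimdsTrain and inputSize exist from the preceding steps.
minibatch = preview(augimdsTrain);   % table with the first few observations
firstImage = minibatch.input{1};     % assumes the image variable is named "input"

% The first two dimensions should match inputSize(1:2).
size(firstImage)
```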
Specify Training Options
Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.
For this example, use these options:
- Train using the Adam optimizer.
- To reduce the level of updates to the pretrained weights, use a smaller learning rate. Set the learning rate to 0.0001.
- Validate the network using the validation data every five iterations. For larger datasets, to prevent validation from slowing down training, increase this value.
- Display the training progress in a plot, and monitor the accuracy metric.
- Disable the verbose output.
options = trainingOptions("adam", ...
    InitialLearnRate=0.0001, ...
    ValidationData=augimdsValidation, ...
    ValidationFrequency=5, ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);
Train Neural Network
Train the neural network using the trainnet function. For classification, use cross-entropy loss. By default, the trainnet function uses a GPU if one is available. Using a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the trainnet function uses the CPU. To specify the execution environment, use the ExecutionEnvironment training option.
net = trainnet(augimdsTrain,net,"crossentropy",options);
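As an illustration of the ExecutionEnvironment training option, the sketch below repeats the options from this example and forces training onto the CPU. It assumes that augimdsTrain, augimdsValidation, and net exist from the preceding steps; optionsCPU and netCPU are illustrative names, and the training progress plot is left out for brevity.

```matlab
% Assumes augimdsTrain, augimdsValidation, and net exist from the preceding steps.
% Same settings as the example options, with training restricted to the CPU.
optionsCPU = trainingOptions("adam", ...
    InitialLearnRate=0.0001, ...
    ValidationData=augimdsValidation, ...
    ValidationFrequency=5, ...
    Metrics="accuracy", ...
    Verbose=false, ...
    ExecutionEnvironment="cpu");

netCPU = trainnet(augimdsTrain,net,"crossentropy",optionsCPU);
```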
Test Neural Network
Test the neural network using the testnet function. For single-label classification, evaluate the accuracy. The accuracy is the percentage of correct predictions. By default, the testnet function uses a GPU if one is available. To select the execution environment manually, use the ExecutionEnvironment argument of the testnet function.
accuracy = testnet(net,augimdsTest,"accuracy")
Make Predictions
Read and display a test image.
im = imread("MerchDataTest.jpg");
figure
imshow(im)
Use the neural network to make a prediction. To make a prediction with a single image, convert the image to the data type single and use the predict function. To use a GPU if one is available, first convert the data to gpuArray. To make predictions with multiple images, use the minibatchpredict function.
X = single(im);
if canUseGPU
    X = gpuArray(X);
end
scores = predict(net,X);
label = scores2label(scores,classNames);
Display the image and the prediction.
figure
imshow(im)
title("Prediction: " + string(label))
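For predictions on many images, the text above points to the minibatchpredict function. The following is a minimal sketch under the assumption that net, classNames, augimdsTest, and imdsTest exist from the preceding steps; scoresTest, labelsTest, and fractionCorrect are illustrative names.

```matlab
% Assumes net, classNames, augimdsTest, and imdsTest exist from the preceding steps.
scoresTest = minibatchpredict(net,augimdsTest);     % scores for every test image
labelsTest = scores2label(scoresTest,classNames);   % predicted labels

% Fraction of correct predictions; multiply by 100 to compare with testnet.
fractionCorrect = mean(labelsTest(:) == imdsTest.Labels(:))
```

This assumes the augmented datastore reads the test images in the same order as imdsTest.Labels, which holds when the datastore is not shuffled.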
Input Arguments
name
Name of the pretrained neural network, specified as one of these values:
imagePretrainedNetwork Model Name Argument | Neural Network Name | Depth | Parameter Memory | Parameters (Millions) | Image Input Size | Input Value Range | Input Layer Normalization | Required Support Package |
---|---|---|---|---|---|---|---|---|
"squeezenet" | SqueezeNet [2] | 18 | 4.7 MB | 1.24 | 227-by-227 | [0, 255] | "zerocenter" | None |
"googlenet" | GoogLeNet [3][4] | 22 | 27 MB | 7.0 | 224-by-224 | [0, 255] | "zerocenter" | Deep Learning Toolbox™ Model for GoogLeNet Network |
"googlenet-places365" | GoogLeNet [3][4] | 22 | 27 MB | 7.0 | 224-by-224 | [0, 255] | "zerocenter" | Deep Learning Toolbox Model for GoogLeNet Network |
"inceptionv3" | Inception-v3 [5] | 48 | 91 MB | 23.9 | 299-by-299 | [0, 255] | "rescale-symmetric" | Deep Learning Toolbox Model for Inception-v3 Network |
"densenet201" | DenseNet-201 [6] | 201 | 77 MB | 20.0 | 224-by-224 | [0, 255] | "zscore" | Deep Learning Toolbox Model for DenseNet-201 Network |
"mobilenetv2" | MobileNet-v2 [7] | 53 | 14 MB | 3.5 | 224-by-224 | [0, 255] | "zscore" | Deep Learning Toolbox Model for MobileNet-v2 Network |
"resnet18" | ResNet-18 [8] | 18 | 45 MB | 11.7 | 224-by-224 | [0, 255] | "zscore" | Deep Learning Toolbox Model for ResNet-18 Network |
"resnet50" | ResNet-50 [8] | 50 | 98 MB | 25.6 | 224-by-224 | [0, 255] | "zscore" | Deep Learning Toolbox Model for ResNet-50 Network |
"resnet101" | ResNet-101 [8] | 101 | 171 MB | 44.6 | 224-by-224 | [0, 255] | "zerocenter" | Deep Learning Toolbox Model for ResNet-101 Network |
"xception" | Xception [9] | 71 | 88 MB | 22.9 | 299-by-299 | [0, 255] | "rescale-symmetric" | Deep Learning Toolbox Model for Xception Network |
"inceptionresnetv2" | Inception-ResNet-v2 [10] | 164 | 213 MB | 55.9 | 299-by-299 | [0, 255] | "rescale-symmetric" | Deep Learning Toolbox Model for Inception-ResNet-v2 Network |
"shufflenet" | ShuffleNet [11] | 50 | 5.5 MB | 1.4 | 224-by-224 | [0, 255] | "zscore" | Deep Learning Toolbox Model for ShuffleNet Network |
"nasnetmobile" | NASNet-Mobile [12] | * | 20 MB | 5.3 | 224-by-224 | [0, 255] | "rescale-symmetric" | Deep Learning Toolbox Model for NASNet-Mobile Network |
"nasnetlarge" | NASNet-Large [12] | * | 340 MB | 88.9 | 331-by-331 | [0, 255] | "rescale-symmetric" | Deep Learning Toolbox Model for NASNet-Large Network |
"darknet19" | DarkNet-19 [13] | 19 | 80 MB | 20.8 | 256-by-256 | [0, 255] | "rescale-zero-one" | Deep Learning Toolbox Model for DarkNet-19 Network |
"darknet53" | DarkNet-53 [13] | 53 | 159 MB | 41.6 | 256-by-256 | [0, 255] | "rescale-zero-one" | Deep Learning Toolbox Model for DarkNet-53 Network |
"efficientnetb0" | EfficientNet-b0 [14] | 82 | 20 MB | 5.3 | 224-by-224 | [0, 255] | "zscore" | Deep Learning Toolbox Model for EfficientNet-b0 Network |
"alexnet" | AlexNet [15] | 8 | 233 MB | 61.0 | 227-by-227 | [0, 255] | "zerocenter" | Deep Learning Toolbox Model for AlexNet Network |
"vgg16" | VGG-16 [16] | 16 | 528 MB | 138 | 224-by-224 | [0, 255] | "zerocenter" | Deep Learning Toolbox Model for VGG-16 Network |
"vgg19" | VGG-19 [16] | 19 | 548 MB | 144 | 224-by-224 | [0, 255] | "zerocenter" | Deep Learning Toolbox Model for VGG-19 Network |
Note
If you set the Weights option to "none", then, for most models, downloading the support package is not required.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: net = imagePretrainedNetwork("googlenet",NumClasses=10) returns a pretrained GoogLeNet neural network ready to be retrained for a 10-class classification task.
NumClasses
Number of classes for classification tasks, specified as a positive integer or [].
If NumClasses is an integer, then the imagePretrainedNetwork function adapts the pretrained neural network for classification tasks with the specified number of classes by replacing the learnable layer in the classification head of the network.
If you specify the NumClasses option, then NumResponses must be [], and the function must not output the classNames argument.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
NumResponses
Number of responses for regression tasks, specified as a positive integer or [].
If NumResponses is an integer, then the imagePretrainedNetwork function adapts the pretrained neural network for regression tasks with the specified number of responses by replacing the classification head of the network with a head for regression tasks.
If you specify the NumResponses option, then NumClasses must be [], and the function must not output the classNames argument.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
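As a brief illustration of adapting a network for regression, the sketch below requests a SqueezeNet network with a single response. The value 1 and the variable names numResponses and netRegression are illustrative; only the net output is requested because the function does not return class names in this case.

```matlab
% Adapt SqueezeNet for a regression task with one response (illustrative value).
numResponses = 1;

% Request only the network output; class names are not returned for regression.
netRegression = imagePretrainedNetwork("squeezenet",NumResponses=numResponses);
```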
Weights
Neural network weights, specified as one of these values:
- "pretrained": Return the neural network with its pretrained weights.
- "none": Return only the uninitialized neural network architecture. In this case, most networks do not require you to download a support package.
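The following sketch shows one way to load only an architecture and check its state. It assumes the default SqueezeNet model; netUntrained and isInitialized are illustrative names.

```matlab
% Load the SqueezeNet architecture without its pretrained weights.
netUntrained = imagePretrainedNetwork("squeezenet",Weights="none");

% The Initialized property reports whether the learnable parameters are set.
isInitialized = netUntrained.Initialized
```

This option is useful when you want to train a known architecture from scratch.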
ClassNamesType
Class name type, specified as one of these values:
- "string": Return class names as a string array.
- "cell": Return class names as a cell array of character vectors. Use this option for code generation.
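For example, the following sketch requests the class names as a cell array of character vectors and checks the returned type. The variable names netCell and classNamesCell are illustrative.

```matlab
% Return the class names as a cell array of character vectors.
[netCell,classNamesCell] = imagePretrainedNetwork("squeezenet",ClassNamesType="cell");

% Check the type of the returned class names.
iscellstr(classNamesCell)
```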
Output Arguments
net
Neural network, returned as a dlnetwork object.
classNames
Class names, returned as a string array or a cell array of character vectors.
The function returns class names only when both the NumClasses and NumResponses values are []. The data type of classNames depends on the ClassNamesType argument.
Data Types: string | cell
Tips
- To create and customize 2-D and 3-D ResNet neural network architectures, use the resnetNetwork and resnet3dNetwork functions, respectively.
References
[1] ImageNet. http://www.image-net.org.
[2] Iandola, Forrest N., Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. “SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size.” Preprint, submitted November 4, 2016. https://arxiv.org/abs/1602.07360.
[3] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going Deeper with Convolutions.” In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9. Boston, MA, USA: IEEE, 2015. https://doi.org/10.1109/CVPR.2015.7298594.
[4] Places. http://places2.csail.mit.edu/
[5] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. “Rethinking the Inception Architecture for Computer Vision.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2818–26. Las Vegas, NV, USA: IEEE, 2016. https://doi.org/10.1109/CVPR.2016.308.
[6] Huang, Gao, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. “Densely Connected Convolutional Networks.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261–69. Honolulu, HI: IEEE, 2017. https://doi.org/10.1109/CVPR.2017.243.
[7] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. “MobileNetV2: Inverted Residuals and Linear Bottlenecks.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4510–20. Salt Lake City, UT: IEEE, 2018. https://doi.org/10.1109/CVPR.2018.00474.
[8] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. Las Vegas, NV, USA: IEEE, 2016. https://doi.org/10.1109/CVPR.2016.90.
[9] Chollet, François. “Xception: Deep Learning with Depthwise Separable Convolutions.” Preprint, submitted in 2016. https://doi.org/10.48550/ARXIV.1610.02357.
[10] Szegedy, Christian, Sergey Ioffe, Vincent Vanhoucke, and Alexander Alemi. “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 31, no. 1 (February 12, 2017). https://doi.org/10.1609/aaai.v31i1.11231.
[11] Zhang, Xiangyu, Xinyu Zhou, Mengxiao Lin, and Jian Sun. “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.” Preprint, submitted July 4, 2017. http://arxiv.org/abs/1707.01083.
[12] Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. “Learning Transferable Architectures for Scalable Image Recognition.” Preprint, submitted in 2017. https://doi.org/10.48550/ARXIV.1707.07012.
[13] Redmon, Joseph. “Darknet: Open Source Neural Networks in C.” https://pjreddie.com/darknet.
[14] Tan, Mingxing, and Quoc V. Le. “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” Preprint, submitted in 2019. https://doi.org/10.48550/ARXIV.1905.11946.
[15] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks.” Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.
[16] Simonyan, Karen, and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” Preprint, submitted in 2014. https://doi.org/10.48550/ARXIV.1409.1556.
Extended Capabilities
C/C++ Code Generation
Usage notes and limitations:
- Code generation only supports setting the name-value argument ClassNamesType to "cell".
- Code generation does not support setting the name-value argument Weights to "none".
GPU Code Generation
Usage notes and limitations:
Refer to the usage notes and limitations in the C/C++ Code Generation section. The same limitations apply to GPU code generation.
Version History
Introduced in R2024a