augmentedImageDatastore - Transform batches to augment image data - MATLAB
Transform batches to augment image data
Description
An augmented image datastore transforms batches of training, validation, test, and prediction data, with optional preprocessing such as resizing, rotation, and reflection. Resize images to make them compatible with the input size of your deep learning network. Augment training image data with randomized preprocessing operations to help prevent the network from overfitting and memorizing the exact details of the training images.
To train a network using augmented images, supply the `augmentedImageDatastore` to the `trainnet` function. For more information, see Preprocess Images for Deep Learning.
- When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.
- An imageInputLayer normalizes images using the mean of the augmented images, not the mean of the original data set. This mean is calculated once for the first augmented epoch. All other epochs use the same mean, so that the average image does not change during training.
- Use an augmented image datastore for efficient preprocessing of images for deep learning, including image resizing. Do not use the `ReadFcn` option of `ImageDatastore` objects. `ImageDatastore` allows batch reading of JPG or PNG image files using prefetching. If you set the `ReadFcn` option to a custom function, then `ImageDatastore` does not prefetch and is usually significantly slower.
By default, an `augmentedImageDatastore` only resizes images to fit the output size. You can configure options for additional image transformations using an `imageDataAugmenter`.
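As a minimal sketch of this workflow (the folder name `"myImageFolder"` and the augmentation ranges are placeholders, not values from this page):

```matlab
% Create a datastore of labeled images (folder path is a placeholder).
imds = imageDatastore("myImageFolder", ...
    IncludeSubfolders=true, LabelSource="foldernames");

% Define random transformations, then wrap the datastore so that every
% mini-batch is resized to 224-by-224 and randomly perturbed on the fly.
augmenter = imageDataAugmenter(RandRotation=[-10 10], RandXReflection=true);
auimds = augmentedImageDatastore([224 224], imds, ...
    DataAugmentation=augmenter);
```

The augmented datastore can then be passed directly to `trainnet` in place of `imds`.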
Creation
Syntax
Description
`auimds = augmentedImageDatastore(outputSize,imds)` creates an augmented image datastore for classification problems using images from the image datastore `imds`. The datastore resizes images to the height and width specified by `outputSize`.

`auimds = augmentedImageDatastore(outputSize,X,Y)` creates an augmented image datastore for classification and regression problems. The array `X` contains the predictor variables and the array `Y` contains the categorical labels or numeric responses.

`auimds = augmentedImageDatastore(outputSize,X)` creates an augmented image datastore for predicting responses of image data in array `X`.

`auimds = augmentedImageDatastore(outputSize,tbl)` creates an augmented image datastore for classification and regression problems. The table `tbl` contains predictors and responses.

`auimds = augmentedImageDatastore(outputSize,tbl,responseNames)` creates an augmented image datastore for classification and regression problems. The table `tbl` contains predictors and responses. The `responseNames` argument specifies the response variables in `tbl`.

`auimds = augmentedImageDatastore(___,Name=Value)` also sets writable properties using name-value arguments. For example, `augmentedImageDatastore([28,28],imds,OutputSizeMode="centercrop")` creates an augmented image datastore that crops images from the center.
Input Arguments
Size of output images, specified as a vector of two positive integers. The first element specifies the height (number of rows) in the output images, and the second element specifies the width (number of columns).
The output images can have a third dimension that represents the color channels. However, if you specify `outputSize` as a three-element vector, then the datastore ignores the third element. Instead, the datastore determines the image size in the third dimension in one of these ways:

- For input grayscale and RGB images, which have 1 or 3 color channels, the number of output color channels depends on the value of ColorPreprocessing. For example, if you specify `outputSize` as `[28 28 1]` but set `ColorPreprocessing` to `"gray2rgb"`, then the output images have size 28-by-28-by-3.
- When the input images do not have 1 or 3 color channels, such as for multispectral or hyperspectral images, the output images have the same number of color channels as the input images.
This argument sets the OutputSize property.
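To illustrate the channel behavior (a sketch using synthetic data, not an example from this page):

```matlab
% 10 synthetic grayscale images, 20-by-20-by-1.
X = rand(20,20,1,10);
labels = categorical(randi(2,10,1));

% Request 28-by-28 output and promote grayscale to RGB.
auimds = augmentedImageDatastore([28 28], X, labels, ...
    ColorPreprocessing="gray2rgb");
batch = read(auimds);
size(batch.input{1})   % 28-by-28-by-3, despite the two-element outputSize
```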
Images, specified as a 4-D numeric array. The first three dimensions are the height, width, and channels, and the last dimension indexes the individual images.
Data Types: `single` | `double` | `uint8` | `int8` | `uint16` | `int16` | `uint32` | `int32`
Responses for classification or regression, specified as one of the following:

- For a classification problem, `Y` is a categorical vector containing the image labels.
- For a regression problem, `Y` can be:
  - An _n_-by-_r_ numeric matrix, where _n_ is the number of observations and _r_ is the number of responses.
  - An _h_-by-_w_-by-_c_-by-_n_ numeric array, where _h_-by-_w_-by-_c_ is the size of a single response and _n_ is the number of observations.

Responses must not contain `NaN` values.
Data Types: `categorical` | `double`
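For example (a sketch with synthetic data), a two-output regression problem pairs a 4-D image array with an _n_-by-2 response matrix:

```matlab
X = rand(28,28,1,100);   % 100 grayscale 28-by-28 images
Y = rand(100,2);         % two numeric responses per image (n-by-r)
auimds = augmentedImageDatastore([28 28], X, Y);
```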
Input data, specified as a table. `tbl` must contain the predictors in the first column as either absolute or relative image paths or images. The type and location of the responses depend on the problem:

- For a classification problem, the response must be a categorical variable containing labels for the images. If the name of the response variable is not specified in the call to `augmentedImageDatastore`, the responses must be in the second column. If the responses are in a different column of `tbl`, then you must specify the response variable name using the `responseNames` argument.
- For a regression problem, the responses must be numeric values in the column or columns after the first column. The responses can be either in multiple columns as scalars or in a single column as numeric vectors or cell arrays containing numeric 3-D arrays. When you do not specify the name of the response variable or variables, `augmentedImageDatastore` accepts the remaining columns of `tbl` as the response variables. You can specify the response variable names using the `responseNames` argument.

Responses must not contain `NaN` values. If there are `NaN` values in the predictor data, they are propagated through training, but in most cases the training fails to converge.
Data Types: table
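A sketch of the table form for a classification problem (the file names below are placeholders; the code assumes those image files exist on the path):

```matlab
% Column 1: image paths (placeholders). Column 2: categorical labels.
files  = {'img001.png'; 'img002.png'; 'img003.png'};
labels = categorical({'cat'; 'dog'; 'cat'});
tbl    = table(files, labels);

% With no responseNames argument, column 2 is taken as the response.
auimds = augmentedImageDatastore([224 224], tbl);
```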
Names of the response variables in the input table, specified as one of the following:

- For classification or regression tasks with a single response, `responseNames` must be a character vector or string scalar containing the response variable in the input table.
- For regression tasks with multiple responses, `responseNames` must be a string array or cell array of character vectors containing the response variables in the input table.

Data Types: `char` | `cell` | `string`
Name-Value Arguments
Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: `auimds = augmentedImageDatastore([28,28],imds,OutputSizeMode="centercrop")` creates an augmented image datastore that crops images from the center.
Preprocessing color operations performed on input grayscale or RGB images, specified as `"none"`, `"gray2rgb"`, or `"rgb2gray"`. When the image datastore contains a mixture of grayscale and RGB images, use `ColorPreprocessing` to ensure that all output images have the number of channels required by imageInputLayer.

Note

The `augmentedImageDatastore` object converts RGB images to grayscale by using the `rgb2gray` function. If an image has three channels that do not correspond to red, green, and blue channels (such as an image in the L*a*b* color space), then using `ColorPreprocessing` can give poor results.

The datastore does not perform color preprocessing when:

- An input image already has the required number of color channels. For example, if you specify the value `"gray2rgb"` and an input image already has three channels, then no color preprocessing occurs.
- The input images do not have 1 or 3 channels, such as for multispectral or hyperspectral images. In this case, all input images must have the same number of channels.

This argument sets the ColorPreprocessing property.

Data Types: `char` | `string`
Preprocessing applied to input images, specified as an imageDataAugmenter object or `"none"`. When `DataAugmentation` is `"none"`, the datastore only resizes images to fit the output size and does not perform additional preprocessing.
This argument sets the DataAugmentation property.
Dispatch observations in the background during training, prediction, or classification, specified as `false` or `true`. To use background dispatching, you must have Parallel Computing Toolbox™.

Augmented image datastores only perform background dispatching when used with the trainnet function and with inference functions such as predict and minibatchpredict. Background dispatching does not occur when you call the `read` function of the datastore directly.
This argument sets the DispatchInBackground property.
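A sketch of enabling background dispatching (assumes Parallel Computing Toolbox is installed and that `imds` is an existing image datastore):

```matlab
% Worker processes prefetch and augment the next mini-batch while the
% current one is being used for training.
auimds = augmentedImageDatastore([224 224], imds, ...
    DispatchInBackground=true);
```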
Method used to resize output images, specified as one of the following.

- `"resize"` — Scale the image using bilinear interpolation to fit the output size.

  Note: `augmentedImageDatastore` uses the bilinear interpolation method of imresize with antialiasing. Bilinear interpolation enables fast image processing while avoiding distortions such as those caused by nearest-neighbor interpolation. In contrast, by default `imresize` uses bicubic interpolation with antialiasing to produce a high-quality resized image at the cost of longer processing time.
- `"centercrop"` — Take a crop from the center of the training image. The crop has the same size as the output size.
- `"randcrop"` — Take a random crop from the training image. The random crop has the same size as the output size.

This argument sets the OutputSizeMode property.

Data Types: `char` | `string`
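A sketch (synthetic data) contrasting the three modes; they differ only in how each image reaches the output size:

```matlab
X = rand(64,64,3,8);               % 8 synthetic 64-by-64 RGB images
Y = categorical(randi(2,8,1));

resized = augmentedImageDatastore([32 32], X, Y);  % default "resize": scale down
cropped = augmentedImageDatastore([32 32], X, Y, ...
    OutputSizeMode="centercrop");                  % fixed center crop, no scaling
random  = augmentedImageDatastore([32 32], X, Y, ...
    OutputSizeMode="randcrop");                    % new random crop on each read
```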
Properties
Preprocessing color operations performed on input grayscale or RGB images, specified as `"none"`, `"gray2rgb"`, or `"rgb2gray"`. When the image datastore contains a mixture of grayscale and RGB images, use `ColorPreprocessing` to ensure that all output images have the number of channels required by imageInputLayer.

Note

The `augmentedImageDatastore` object converts RGB images to grayscale by using the `rgb2gray` function. If an image has three channels that do not correspond to red, green, and blue channels (such as an image in the L*a*b* color space), then using `ColorPreprocessing` can give poor results.

The datastore does not perform color preprocessing when:

- An input image already has the required number of color channels. For example, if you specify the value `"gray2rgb"` and an input image already has three channels, then no color preprocessing occurs.
- The input images do not have 1 or 3 channels, such as for multispectral or hyperspectral images. In this case, all input images must have the same number of channels.

Data Types: `char` | `string`
Preprocessing applied to input images, specified as an imageDataAugmenter object or `"none"`. When `DataAugmentation` is `"none"`, the datastore only resizes images to fit the output size and does not perform additional preprocessing.
Dispatch observations in the background during training, prediction, or classification, specified as `false` or `true`. To use background dispatching, you must have Parallel Computing Toolbox.

Augmented image datastores only perform background dispatching when used with the trainnet function and with inference functions such as predict and minibatchpredict. Background dispatching does not occur when you call the `read` function of the datastore directly.
Number of observations that are returned in each batch. You can change the value of `MiniBatchSize` only after you create the datastore.

Training and prediction functions that specify a mini-batch size, such as trainingOptions, minibatchpredict, and testnet, do not set the `MiniBatchSize` property. For best performance, use the same mini-batch size for your datastore as for your training and prediction functions.
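A sketch of changing the property after creation (synthetic data):

```matlab
X = rand(28,28,1,100);
Y = categorical(randi(3,100,1));
auimds = augmentedImageDatastore([28 28], X, Y);

auimds.MiniBatchSize = 32;   % set after creation, not in the constructor
batch = read(auimds);        % table with one row per observation
size(batch,1)                % first read returns 32 of the 100 observations
```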
This property is read-only.
Total number of observations in the augmented image datastore, returned as a positive integer. The number of observations is the length of one training epoch.
Size of output images, specified as a vector of two positive integers. The first element specifies the height (number of rows) in the output images, and the second element specifies the width (number of columns).
The `OutputSize` property does not indicate the number of color channels of the output images. When you read from the datastore, the output images can have a third dimension that represents the color channels.

- For input grayscale and RGB images, which have 1 or 3 color channels, the number of output channels depends on the value of ColorPreprocessing. For example, when `ColorPreprocessing` is `"gray2rgb"`, the output size in the third dimension is 3. When `ColorPreprocessing` is `"rgb2gray"`, the output images do not have a third dimension.
- When the input images do not have 1 or 3 color channels, such as for multispectral or hyperspectral images, the output size in the third dimension is equal to the number of color channels of the input images.
Method used to resize output images, specified as one of the following.

- `"resize"` — Scale the image using bilinear interpolation to fit the output size.

  Note: `augmentedImageDatastore` uses the bilinear interpolation method of imresize with antialiasing. Bilinear interpolation enables fast image processing while avoiding distortions such as those caused by nearest-neighbor interpolation. In contrast, by default `imresize` uses bicubic interpolation with antialiasing to produce a high-quality resized image at the cost of longer processing time.
- `"centercrop"` — Take a crop from the center of the training image. The crop has the same size as the output size.
- `"randcrop"` — Take a random crop from the training image. The random crop has the same size as the output size.

Data Types: `char` | `string`
Object Functions
| Function | Description |
| --- | --- |
| `combine` | Combine data from multiple datastores |
| `hasdata` | Determine if data is available to read |
| `numpartitions` | Number of datastore partitions |
| `partition` | Partition a datastore |
| `partitionByIndex` | Partition `augmentedImageDatastore` according to indices |
| `preview` | Preview subset of data in datastore |
| `read` | Read data from `augmentedImageDatastore` |
| `readall` | Read all data in datastore |
| `readByIndex` | Read data specified by index from `augmentedImageDatastore` |
| `reset` | Reset datastore to initial state |
| `shuffle` | Shuffle data in `augmentedImageDatastore` |
| `subset` | Create subset of datastore or FileSet |
| `transform` | Transform datastore |
| `isPartitionable` | Determine whether datastore is partitionable |
| `isShuffleable` | Determine whether datastore is shuffleable |
Examples
Train a convolutional neural network using augmented image data. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.
Load the sample data, which consists of synthetic images of handwritten digits. `XTrain` is a 28-by-28-by-1-by-5000 array, where:

- 28 is the height and width of the images.
- 1 is the number of channels.
- 5000 is the number of synthetic images of handwritten digits.

`labelsTrain` is a categorical vector containing the labels for each observation.
Set aside 1000 of the images for network validation.
```matlab
idx = randperm(size(XTrain,4),1000);
XValidation = XTrain(:,:,:,idx);
XTrain(:,:,:,idx) = [];
TValidation = labelsTrain(idx);
labelsTrain(idx) = [];
```
Create an `imageDataAugmenter` object that specifies preprocessing options for image augmentation, such as resizing, rotation, translation, and reflection. Randomly translate the images up to three pixels horizontally and vertically, and rotate the images with an angle up to 20 degrees.
```matlab
imageAugmenter = imageDataAugmenter( ...
    'RandRotation',[-20,20], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3])
```

```
imageAugmenter =
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 0
     RandYReflection: 0
        RandRotation: [-20 20]
           RandScale: [1 1]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [-3 3]
    RandYTranslation: [-3 3]
```
Create an `augmentedImageDatastore` object to use for network training and specify the image output size. During training, the datastore performs image augmentation and resizes the images. The datastore augments the images without saving any images to memory. `trainnet` updates the network parameters and then discards the augmented images.
```matlab
imageSize = [28 28 1];
augimds = augmentedImageDatastore(imageSize,XTrain,labelsTrain, ...
    'DataAugmentation',imageAugmenter);
```
Specify the convolutional neural network architecture.
```matlab
layers = [
    imageInputLayer(imageSize)
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer];
```
Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.
```matlab
opts = trainingOptions('sgdm', ...
    'MaxEpochs',15, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Metrics','accuracy', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,TValidation});
```
Train the neural network using the trainnet function. For classification, use cross-entropy loss. By default, the `trainnet` function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the `trainnet` function uses the CPU. To specify the execution environment, use the `ExecutionEnvironment` training option.
```matlab
net = trainnet(augimds,layers,"crossentropy",opts);
```
Tips
- You can visualize many transformed images in the same figure by using the imtile function. For example, this code displays one mini-batch of transformed images from an augmented image datastore called `auimds`.

  ```matlab
  minibatch = read(auimds);
  imshow(imtile(minibatch.input))
  ```
- By default, resizing is the only image preprocessing operation performed on images. Enable additional preprocessing operations by using the DataAugmentation name-value argument with an imageDataAugmenter object. Each time images are read from the augmented image datastore, a different random combination of preprocessing operations is applied to each image.
Version History
Introduced in R2018a