augmentedImageDatastore - Transform batches to augment image data - MATLAB
Transform batches to augment image data
Description
An augmented image datastore transforms batches of training, validation, test, and prediction data, with optional preprocessing such as resizing, rotation, and reflection. Resize images to make them compatible with the input size of your deep learning network. Augment training image data with randomized preprocessing operations to help prevent the network from overfitting and memorizing the exact details of the training images.
To train a network using augmented images, supply the `augmentedImageDatastore` to the `trainnet` function. For more information, see Preprocess Images for Deep Learning.
- When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.
- An imageInputLayer normalizes images using the mean of the augmented images, not the mean of the original data set. This mean is calculated once for the first augmented epoch. All other epochs use the same mean, so that the average image does not change during training.
- Use an augmented image datastore for efficient preprocessing of images for deep learning, including image resizing. Do not use the `ReadFcn` option of `ImageDatastore` objects. `ImageDatastore` allows batch reading of JPG or PNG image files using prefetching. If you set the `ReadFcn` option to a custom function, then `ImageDatastore` does not prefetch and is usually significantly slower.
By default, an `augmentedImageDatastore` only resizes images to fit the output size. You can configure options for additional image transformations using an `imageDataAugmenter`.
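As a minimal sketch of this workflow (the folder name `"myImageFolder"` and the augmentation ranges are placeholders, not values from this page):

```matlab
% Create a datastore of labeled images (folder path is a placeholder).
imds = imageDatastore("myImageFolder", ...
    IncludeSubfolders=true, LabelSource="foldernames");

% Define random transformations, then wrap the datastore so that every
% mini-batch is resized to 224-by-224 and randomly perturbed on the fly.
augmenter = imageDataAugmenter(RandRotation=[-10 10], RandXReflection=true);
auimds = augmentedImageDatastore([224 224], imds, ...
    DataAugmentation=augmenter);
```

The augmented datastore can then be passed directly to `trainnet` in place of `imds`.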
Creation
Syntax
Description
`auimds = augmentedImageDatastore(outputSize,imds)` creates an augmented image datastore for classification problems using images from the image datastore `imds`. The datastore resizes images to the height and width specified by `outputSize`.

`auimds = augmentedImageDatastore(outputSize,X,Y)` creates an augmented image datastore for classification and regression problems. The array `X` contains the predictor variables and the array `Y` contains the categorical labels or numeric responses.

`auimds = augmentedImageDatastore(outputSize,X)` creates an augmented image datastore for predicting responses of image data in array `X`.

`auimds = augmentedImageDatastore(outputSize,tbl)` creates an augmented image datastore for classification and regression problems. The table `tbl` contains predictors and responses.

`auimds = augmentedImageDatastore(outputSize,tbl,responseNames)` creates an augmented image datastore for classification and regression problems. The table `tbl` contains predictors and responses. The `responseNames` argument specifies the response variables in `tbl`.

`auimds = augmentedImageDatastore(___,Name=Value)` also sets writable properties using name-value arguments. For example, `augmentedImageDatastore([28,28],imds,OutputSizeMode="centercrop")` creates an augmented image datastore that crops images from the center.
Input Arguments
Size of output images, specified as a vector of two positive integers. The first element specifies the height (number of rows) in the output images, and the second element specifies the width (number of columns).
The output images can have a third dimension that represents the color channels. However, if you specify `outputSize` as a three-element vector, then the datastore ignores the third element. Instead, the datastore determines the image size in the third dimension in one of these ways:

- For input grayscale and RGB images, which have 1 or 3 color channels, the number of output color channels depends on the value of ColorPreprocessing. For example, if you specify `outputSize` as `[28 28 1]` but set `ColorPreprocessing` to `"gray2rgb"`, then the output images have size 28-by-28-by-3.
- When the input images do not have 1 or 3 color channels, such as for multispectral or hyperspectral images, the output images have the same number of color channels as the input images.
This argument sets the OutputSize property.
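To illustrate the channel behavior (a sketch using synthetic data, not an example from this page):

```matlab
% 10 synthetic grayscale images, 20-by-20-by-1.
X = rand(20,20,1,10);
labels = categorical(randi(2,10,1));

% Request 28-by-28 output and promote grayscale to RGB.
auimds = augmentedImageDatastore([28 28], X, labels, ...
    ColorPreprocessing="gray2rgb");
batch = read(auimds);
size(batch.input{1})   % 28-by-28-by-3, despite the two-element outputSize
```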
Images, specified as a 4-D numeric array. The first three dimensions are the height, width, and channels, and the last dimension indexes the individual images.
Data Types: `single` | `double` | `uint8` | `int8` | `uint16` | `int16` | `uint32` | `int32`
Responses for classification or regression, specified as one of the following:

- For a classification problem, `Y` is a categorical vector containing the image labels.
- For a regression problem, `Y` can be:
  - An _n_-by-_r_ numeric matrix, where _n_ is the number of observations and _r_ is the number of responses.
  - An _h_-by-_w_-by-_c_-by-_n_ numeric array, where _h_-by-_w_-by-_c_ is the size of a single response and _n_ is the number of observations.

Responses must not contain `NaN` values.
Data Types: `categorical` | `double`
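For example (a sketch with synthetic data), a two-output regression problem pairs a 4-D image array with an _n_-by-2 response matrix:

```matlab
X = rand(28,28,1,100);   % 100 grayscale 28-by-28 images
Y = rand(100,2);         % two numeric responses per image (n-by-r)
auimds = augmentedImageDatastore([28 28], X, Y);
```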
Input data, specified as a table. `tbl` must contain the predictors in the first column as either absolute or relative image paths or images. The type and location of the responses depend on the problem:

- For a classification problem, the response must be a categorical variable containing labels for the images. If the name of the response variable is not specified in the call to `augmentedImageDatastore`, the responses must be in the second column. If the responses are in a different column of `tbl`, then you must specify the response variable name using the `responseNames` argument.
- For a regression problem, the responses must be numeric values in the column or columns after the first column. The responses can be either in multiple columns as scalars or in a single column as numeric vectors or cell arrays containing numeric 3-D arrays. When you do not specify the name of the response variable or variables, `augmentedImageDatastore` accepts the remaining columns of `tbl` as the response variables. You can specify the response variable names using the `responseNames` argument.

Responses must not contain `NaN` values. If there are `NaN` values in the predictor data, they are propagated through training, but in most cases the training fails to converge.
Data Types: table
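A sketch of the table form for a classification problem (the file names below are placeholders; the code assumes those image files exist on the path):

```matlab
% Column 1: image paths (placeholders). Column 2: categorical labels.
files  = {'img001.png'; 'img002.png'; 'img003.png'};
labels = categorical({'cat'; 'dog'; 'cat'});
tbl    = table(files, labels);

% With no responseNames argument, column 2 is taken as the response.
auimds = augmentedImageDatastore([224 224], tbl);
```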
Names of the response variables in the input table, specified as one of the following:

- For classification or regression tasks with a single response, `responseNames` must be a character vector or string scalar containing the response variable in the input table.
- For regression tasks with multiple responses, `responseNames` must be a string array or cell array of character vectors containing the response variables in the input table.

Data Types: `char` | `cell` | `string`
Name-Value Arguments
Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: `auimds = augmentedImageDatastore([28,28],imds,OutputSizeMode="centercrop")` creates an augmented image datastore that crops images from the center.
Preprocessing color operations performed on input grayscale or RGB images, specified as `"none"`, `"gray2rgb"`, or `"rgb2gray"`. When the image datastore contains a mixture of grayscale and RGB images, use `ColorPreprocessing` to ensure that all output images have the number of channels required by imageInputLayer.

Note

The `augmentedImageDatastore` object converts RGB images to grayscale by using the `rgb2gray` function. If an image has three channels that do not correspond to red, green, and blue channels (such as an image in the L*a*b* color space), then using `ColorPreprocessing` can give poor results.

The datastore does not perform color preprocessing when:

- An input image already has the required number of color channels. For example, if you specify the value `"gray2rgb"` and an input image already has three channels, then no color preprocessing occurs.
- The input images do not have 1 or 3 channels, such as for multispectral or hyperspectral images. In this case, all input images must have the same number of channels.

This argument sets the ColorPreprocessing property.

Data Types: `char` | `string`
Preprocessing applied to input images, specified as an imageDataAugmenter object or `"none"`. When `DataAugmentation` is `"none"`, the datastore only resizes images to fit the output size and does not perform additional preprocessing.
This argument sets the DataAugmentation property.
Dispatch observations in the background during training, prediction, or classification, specified as `false` or `true`. To use background dispatching, you must have Parallel Computing Toolbox™.

Augmented image datastores only perform background dispatching when used with the trainnet function and with inference functions such as predict and minibatchpredict. Background dispatching does not occur when you call the `read` function of the datastore directly.
This argument sets the DispatchInBackground property.
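A sketch of enabling background dispatching (assumes Parallel Computing Toolbox is installed and that `imds` is an existing image datastore):

```matlab
% Worker processes prefetch and augment the next mini-batch while the
% current one is being used for training.
auimds = augmentedImageDatastore([224 224], imds, ...
    DispatchInBackground=true);
```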
Method used to resize output images, specified as one of the following.

- `"resize"` — Scale the image using bilinear interpolation to fit the output size.

  Note: `augmentedImageDatastore` uses the bilinear interpolation method of imresize with antialiasing. Bilinear interpolation enables fast image processing while avoiding distortions such as those caused by nearest-neighbor interpolation. In contrast, by default `imresize` uses bicubic interpolation with antialiasing to produce a high-quality resized image at the cost of longer processing time.
- `"centercrop"` — Take a crop from the center of the training image. The crop has the same size as the output size.
- `"randcrop"` — Take a random crop from the training image. The random crop has the same size as the output size.

This argument sets the OutputSizeMode property.

Data Types: `char` | `string`
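A sketch (synthetic data) contrasting the three modes; they differ only in how each image reaches the output size:

```matlab
X = rand(64,64,3,8);               % 8 synthetic 64-by-64 RGB images
Y = categorical(randi(2,8,1));

resized = augmentedImageDatastore([32 32], X, Y);  % default "resize": scale down
cropped = augmentedImageDatastore([32 32], X, Y, ...
    OutputSizeMode="centercrop");                  % fixed center crop, no scaling
random  = augmentedImageDatastore([32 32], X, Y, ...
    OutputSizeMode="randcrop");                    % new random crop on each read
```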
Properties
Preprocessing color operations performed on input grayscale or RGB images, specified as `"none"`, `"gray2rgb"`, or `"rgb2gray"`. When the image datastore contains a mixture of grayscale and RGB images, use `ColorPreprocessing` to ensure that all output images have the number of channels required by imageInputLayer.

Note

The `augmentedImageDatastore` object converts RGB images to grayscale by using the `rgb2gray` function. If an image has three channels that do not correspond to red, green, and blue channels (such as an image in the L*a*b* color space), then using `ColorPreprocessing` can give poor results.

The datastore does not perform color preprocessing when:

- An input image already has the required number of color channels. For example, if you specify the value `"gray2rgb"` and an input image already has three channels, then no color preprocessing occurs.
- The input images do not have 1 or 3 channels, such as for multispectral or hyperspectral images. In this case, all input images must have the same number of channels.

Data Types: `char` | `string`
Preprocessing applied to input images, specified as an imageDataAugmenter object or `"none"`. When `DataAugmentation` is `"none"`, the datastore only resizes images to fit the output size and does not perform additional preprocessing.
Dispatch observations in the background during training, prediction, or classification, specified as `false` or `true`. To use background dispatching, you must have Parallel Computing Toolbox.

Augmented image datastores only perform background dispatching when used with the trainnet function and with inference functions such as predict and minibatchpredict. Background dispatching does not occur when you call the `read` function of the datastore directly.
Number of observations that are returned in each batch. You can change the value of `MiniBatchSize` only after you create the datastore.

Training and prediction functions that specify a mini-batch size, such as trainingOptions, minibatchpredict, and testnet, do not set the `MiniBatchSize` property. For best performance, use the same mini-batch size for your datastore as for your training and prediction functions.
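A sketch of changing the property after creation (synthetic data):

```matlab
X = rand(28,28,1,100);
Y = categorical(randi(3,100,1));
auimds = augmentedImageDatastore([28 28], X, Y);

auimds.MiniBatchSize = 32;   % set after creation, not in the constructor
batch = read(auimds);        % table with one row per observation
size(batch,1)                % first read returns 32 of the 100 observations
```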
This property is read-only.
Total number of observations in the augmented image datastore, returned as a positive integer. The number of observations is the length of one training epoch.
Size of output images, specified as a vector of two positive integers. The first element specifies the height (number of rows) in the output images, and the second element specifies the width (number of columns).
The `OutputSize` property does not indicate the number of color channels of the output images. When you read from the datastore, the output images can have a third dimension that represents the color channels.

- For input grayscale and RGB images, which have 1 or 3 color channels, the number of output channels depends on the value of ColorPreprocessing. For example, when `ColorPreprocessing` is `"gray2rgb"`, the output size in the third dimension is 3. When `ColorPreprocessing` is `"rgb2gray"`, the output images do not have a third dimension.
- When the input images do not have 1 or 3 color channels, such as for multispectral or hyperspectral images, the output size in the third dimension is equal to the number of color channels of the input images.
Method used to resize output images, specified as one of the following.

- `"resize"` — Scale the image using bilinear interpolation to fit the output size.

  Note: `augmentedImageDatastore` uses the bilinear interpolation method of imresize with antialiasing. Bilinear interpolation enables fast image processing while avoiding distortions such as those caused by nearest-neighbor interpolation. In contrast, by default `imresize` uses bicubic interpolation with antialiasing to produce a high-quality resized image at the cost of longer processing time.
- `"centercrop"` — Take a crop from the center of the training image. The crop has the same size as the output size.
- `"randcrop"` — Take a random crop from the training image. The random crop has the same size as the output size.

Data Types: `char` | `string`
Object Functions
| Function | Description |
| --- | --- |
| `combine` | Combine data from multiple datastores |
| `hasdata` | Determine if data is available to read |
| `numpartitions` | Number of datastore partitions |
| `partition` | Partition a datastore |
| `partitionByIndex` | Partition `augmentedImageDatastore` according to indices |
| `preview` | Preview subset of data in datastore |
| `read` | Read data from `augmentedImageDatastore` |
| `readall` | Read all data in datastore |
| `readByIndex` | Read data specified by index from `augmentedImageDatastore` |
| `reset` | Reset datastore to initial state |
| `shuffle` | Shuffle data in `augmentedImageDatastore` |
| `subset` | Create subset of datastore or FileSet |
| `transform` | Transform datastore |
| `isPartitionable` | Determine whether datastore is partitionable |
| `isShuffleable` | Determine whether datastore is shuffleable |
Examples
Train a convolutional neural network using augmented image data. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.
Load the sample data, which consists of synthetic images of handwritten digits. `XTrain` is a 28-by-28-by-1-by-5000 array, where:

- 28 is the height and width of the images.
- 1 is the number of channels.
- 5000 is the number of synthetic images of handwritten digits.

`labelsTrain` is a categorical vector containing the labels for each observation.
Set aside 1000 of the images for network validation.
```matlab
idx = randperm(size(XTrain,4),1000);
XValidation = XTrain(:,:,:,idx);
XTrain(:,:,:,idx) = [];
TValidation = labelsTrain(idx);
labelsTrain(idx) = [];
```
Create an `imageDataAugmenter` object that specifies preprocessing options for image augmentation, such as resizing, rotation, translation, and reflection. Randomly translate the images up to three pixels horizontally and vertically, and rotate the images with an angle up to 20 degrees.
```matlab
imageAugmenter = imageDataAugmenter( ...
    'RandRotation',[-20,20], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3])
```

```
imageAugmenter =
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 0
     RandYReflection: 0
        RandRotation: [-20 20]
           RandScale: [1 1]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [-3 3]
    RandYTranslation: [-3 3]
```
Create an `augmentedImageDatastore` object to use for network training and specify the image output size. During training, the datastore performs image augmentation and resizes the images. The datastore augments the images without saving any images to memory. `trainnet` updates the network parameters and then discards the augmented images.
```matlab
imageSize = [28 28 1];
augimds = augmentedImageDatastore(imageSize,XTrain,labelsTrain, ...
    'DataAugmentation',imageAugmenter);
```
Specify the convolutional neural network architecture.
```matlab
layers = [
    imageInputLayer(imageSize)
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer];
```
Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.
```matlab
opts = trainingOptions('sgdm', ...
    'MaxEpochs',15, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Metrics','accuracy', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,TValidation});
```
Train the neural network using the trainnet function. For classification, use cross-entropy loss. By default, the `trainnet` function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the `trainnet` function uses the CPU. To specify the execution environment, use the `ExecutionEnvironment` training option.
```matlab
net = trainnet(augimds,layers,"crossentropy",opts);
```
Tips
- You can visualize many transformed images in the same figure by using the imtile function. For example, this code displays one mini-batch of transformed images from an augmented image datastore called `auimds`.

  ```matlab
  minibatch = read(auimds);
  imshow(imtile(minibatch.input))
  ```
- By default, resizing is the only image preprocessing operation performed on images. Enable additional preprocessing operations by using the DataAugmentation name-value argument with an imageDataAugmenter object. Each time images are read from the augmented image datastore, a different random combination of preprocessing operations is applied to each image.
Version History
Introduced in R2018a