Preprocess Images for Deep Learning - MATLAB & Simulink

To train a network and make predictions on new data, your images must match the input size of the network. If you need to adjust the size of your images to match the network, then you can rescale or crop your data to the required size.

You can effectively increase the amount of training data by applying randomized augmentation to your data. Augmentation also enables you to train networks to be invariant to distortions in image data. For example, you can add randomized rotations to input images so that a network is invariant to the presence of rotation in input images. An augmentedImageDatastore provides a convenient way to apply a limited set of augmentations to 2-D images for classification problems.

For more advanced preprocessing operations, to preprocess images for regression problems, or to preprocess 3-D volumetric images, you can start with a built-in datastore. You can also preprocess images according to your own pipeline by using the transform and combine functions.

Resize Images Using Rescaling and Cropping

You can store image data as a numeric array, an ImageDatastore object, or a table. An ImageDatastore enables you to import data in batches from image collections that are too large to fit in memory. You can use an augmented image datastore or a resized 4-D array for training, prediction, and classification. You can use a resized 3-D array for prediction and classification only.

There are two ways to resize image data to match the input size of a network.

| Resizing Option | Data Format | Resizing Function | Sample Code |
| --- | --- | --- | --- |
| Rescaling | 3-D array representing a single color or multispectral image; 3-D array representing a stack of grayscale images; 4-D array representing a stack of images | imresize | im = imresize(I,outputSize); outputSize specifies the dimensions of the rescaled image. |
| Rescaling | 4-D array representing a stack of images; ImageDatastore; table | augmentedImageDatastore | auimds = augmentedImageDatastore(outputSize,I); outputSize specifies the dimensions of the rescaled image. |
| Cropping | 3-D array representing a single color or multispectral image | imcrop (Image Processing Toolbox) | im = imcrop(I,rect); rect specifies the size and position of the 2-D cropping window. |
| Cropping | 3-D array representing a stack of grayscale images; 4-D array representing a stack of color or multispectral images | imcrop3 (Image Processing Toolbox) | im = imcrop3(I,cuboid); cuboid specifies the size and position of the 3-D cropping window. |
| Cropping | 4-D array representing a stack of images; ImageDatastore; table | augmentedImageDatastore | auimds = augmentedImageDatastore(outputSize,I,'OutputSizeMode',m); Specify m as "centercrop" to crop from the center of the input image. Specify m as "randcrop" to crop from a random location in the input image. |
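The following is a minimal sketch of the rescaling and cropping options in the table, assuming a synthetic stack of five 480-by-640 RGB images (X) in place of real data and a hypothetical network input size of 224-by-224. The variable names are illustrative only.

```matlab
% Synthetic stack of five 480-by-640 RGB images (stand-in for real data)
X = rand(480,640,3,5,"single");
outputSize = [224 224];   % hypothetical network input size

% Rescale one image with imresize (only the first two dimensions change)
imRescaled = imresize(X(:,:,:,1),outputSize);

% Crop one image with imcrop; rect = [xmin ymin width height] is in spatial
% coordinates, so the result can be one pixel larger than width/height
imCropped = imcrop(X(:,:,:,1),[100 50 223 223]);

% Rescale the whole stack on the fly with an augmented image datastore
auimdsResize = augmentedImageDatastore(outputSize,X);

% Center-crop the whole stack instead of rescaling it
auimdsCrop = augmentedImageDatastore(outputSize,X,"OutputSizeMode","centercrop");

% read returns a table with one row per image in the variable "input"
batch = read(auimdsCrop);
size(batch.input{1})   % 224 224 3
```

Rescaling preserves the full field of view but can distort the aspect ratio, while cropping preserves the aspect ratio but discards the image border.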

Augment Images for Training with Random Geometric Transformations

For image classification problems, you can use an augmentedImageDatastore to augment images with a random combination of resizing, rotation, reflection, shear, and translation transformations.

The diagram shows how trainnet uses an augmented image datastore to transform training data for each epoch. When you use data augmentation, one randomly augmented version of each image is used during each epoch of training. For an example of the workflow, see Retrain Neural Network to Classify New Images.

[Figure: Training with an augmented image datastore. The datastore holds the training images and an image data augmenter. trainnet receives the same images in epoch 1, epoch 2, and so on, but the mini-batches in each epoch are transformed differently.]

  1. Specify training images.
  2. Configure image transformation options, such as the range of rotation angles and whether to apply reflection at random, by creating an imageDataAugmenter.
    Tip
    To preview the transformations applied to sample images, use the augment function.
  3. Create an augmentedImageDatastore. Specify the training images, the size of output images, and the imageDataAugmenter. The size of output images must be compatible with the size of the imageInputLayer of the network.
  4. Train the network, specifying the augmented image datastore as the data source for trainnet. For each iteration of training, the augmented image datastore applies a random combination of transformations to images in the mini-batch of training data.
    When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.
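The four steps above can be sketched as follows. This assumes synthetic 28-by-28 grayscale images with random two-class labels and a deliberately tiny, hypothetical network; substitute your own data, network, and options.

```matlab
% 1. Specify training images: synthetic data and labels (assumption)
numImages = 64;
XTrain = rand(28,28,1,numImages,"single");
YTrain = categorical(randi(2,numImages,1));

% 2. Configure random transformations
augmenter = imageDataAugmenter( ...
    "RandRotation",[-20 20], ...   % rotation angle range, in degrees
    "RandXReflection",true);       % random left-right reflection

% Tip: preview one randomly transformed image
augPreview = augment(augmenter,XTrain(:,:,:,1));

% 3. Create the augmented datastore; the output size matches imageInputLayer
inputSize = [28 28 1];
auimds = augmentedImageDatastore(inputSize(1:2),XTrain,YTrain, ...
    "DataAugmentation",augmenter);

% 4. Train; each epoch draws freshly transformed mini-batches
layers = [
    imageInputLayer(inputSize)
    convolution2dLayer(3,8,"Padding","same")
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer];
options = trainingOptions("sgdm", ...
    "MaxEpochs",2, ...
    "MiniBatchSize",16, ...
    "Verbose",false);
net = trainnet(auimds,layers,"crossentropy",options);
```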

Perform Additional Image Processing Operations Using Built-In Datastores

Some datastores perform specific and limited image preprocessing operations when they read a batch of data. These application-specific datastores are listed in the table. You can use these datastores as a source of training, validation, and test data sets for deep learning applications that use Deep Learning Toolbox™. All of these datastores return image data in a format supported by trainnet.

Apply Custom Image Processing Pipelines Using Combine and Transform

To perform more general and complex image preprocessing operations than offered by the application-specific datastores, you can use the transform and combine functions. For more information, see Datastores for Deep Learning.

Transform Datastores with Image Data

The transform function creates an altered form of a datastore, called a transformed datastore, by transforming the data read by the original datastore (the underlying datastore) according to a transformation function that you define.

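The custom transformation function must accept data in the format returned by the read function of the underlying datastore. For image data in an ImageDatastore, the format depends on the ReadSize property. If ReadSize is 1, the transformation function must accept a single image array. If ReadSize is greater than 1, the transformation function must accept a cell array that contains one image for each observation in the batch.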

The transformation function must return data that matches the input size of the network. The transform function does not support one-to-many observation mappings.
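A minimal sketch of both ReadSize cases, assuming four synthetic PNG files written to a temporary folder so that the example is self-contained, and a hypothetical 224-by-224 network input size. Each call to the transformation function returns exactly as many images as it receives.

```matlab
% Write four synthetic PNG files (assumption: stand-in for a real image collection)
folder = fullfile(tempdir,"transformDemo");
if ~isfolder(folder)
    mkdir(folder);
end
for k = 1:4
    imwrite(uint8(255*rand(300,400,3)),fullfile(folder,sprintf("img%d.png",k)));
end

inputSize = [224 224];   % hypothetical network input size

% ReadSize = 1 (default): the transformation function receives one image array
imds = imageDatastore(folder);
tds = transform(imds,@(I) imresize(I,inputSize));
I = read(tds);

% ReadSize > 1: the transformation function receives a cell array of images,
% so resize each cell and return a cell array of the same length
imdsBatch = imageDatastore(folder,"ReadSize",2);
tdsBatch = transform(imdsBatch, ...
    @(C) cellfun(@(im) imresize(im,inputSize),C,"UniformOutput",false));
C = read(tdsBatch);
```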

Tip

The transform function supports prefetching when the underlying ImageDatastore reads a batch of JPG or PNG image files. For these image types, do not use the ReadFcn argument of ImageDatastore to apply image preprocessing, as this option is usually significantly slower. If you use a custom read function, then ImageDatastore does not prefetch.
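To contrast the two approaches in the tip, here is a minimal sketch that assumes synthetic PNG files in a temporary folder and a hypothetical resize step. Both datastores produce the same image, but only the transform-based one keeps the default read path that supports prefetching.

```matlab
% Write a few synthetic PNG files (assumption)
folder = fullfile(tempdir,"prefetchDemo");
if ~isfolder(folder)
    mkdir(folder);
end
for k = 1:3
    imwrite(uint8(255*rand(120,160,3)),fullfile(folder,sprintf("img%d.png",k)));
end
inputSize = [64 64];   % hypothetical network input size

% Discouraged for JPG/PNG files: a custom ReadFcn disables prefetching
imdsSlow = imageDatastore(folder,"ReadFcn",@(f) imresize(imread(f),inputSize));

% Preferred: keep the default read and preprocess with transform
tdsFast = transform(imageDatastore(folder),@(I) imresize(I,inputSize));

I1 = read(imdsSlow);
I2 = read(tdsFast);
```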

Combine Datastores with Image Data

The combine function concatenates the data read from multiple datastores and maintains parity between the datastores.
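For example, for an image regression problem you can pair each image with a numeric response. This sketch assumes synthetic PNG files and hypothetical response values listed in the same order as the files; because both datastores read in lockstep, each read returns one image together with its matching response.

```matlab
% Write four synthetic PNG files (assumption)
folder = fullfile(tempdir,"combineDemo");
if ~isfolder(folder)
    mkdir(folder);
end
for k = 1:4
    imwrite(uint8(255*rand(100,100,3)),fullfile(folder,sprintf("img%d.png",k)));
end

inputSize = [64 64];   % hypothetical network input size

% Predictors: resized images scaled to [0,1]
imds = imageDatastore(folder);
tdsImages = transform(imds,@(I) imresize(im2double(I),inputSize));

% Hypothetical regression targets, one per image, in file order
responses = [0.1; 0.4; 0.7; 0.9];
adsResponses = arrayDatastore(responses);

% Each read returns a 1-by-2 cell array: {image, response}
cds = combine(tdsImages,adsResponses);
data = read(cds);
```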

See Also

trainnet | trainingOptions | dlnetwork | imresize | transform | combine | ImageDatastore

More About