ImageInputLayer - Image input layer - MATLAB
Description
An image input layer inputs 2-D images to a neural network and applies data normalization.
For 3-D image input, use image3dInputLayer.
Creation
Syntax
Description
layer = imageInputLayer(inputSize) returns an image input layer and specifies the InputSize property.
layer = imageInputLayer(inputSize,Name=Value) sets optional properties using one or more name-value arguments.
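The two syntaxes can be sketched as follows, assuming Deep Learning Toolbox is installed and a release that accepts the Name=Value form (R2021a or later). The sizes and the layer name "input" are illustrative choices, not values required by the function.

```matlab
% Syntax 1: specify only the input size ([height width channels]).
layerA = imageInputLayer([28 28 3]);

% Syntax 2: also set optional properties with name-value arguments.
layerB = imageInputLayer([28 28 3], Name="input", Normalization="none");

disp(layerA.InputSize)   % size passed to the constructor
disp(layerB.Name)        % name stored on the layer
```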
Input Arguments
inputSize
— Size of the input
row vector of integers
Size of the input data, specified as a row vector of integers [h w c], where h, w, and c correspond to the height, width, and number of channels, respectively.
- For grayscale images, specify a vector with c equal to 1.
- For RGB images, specify a vector with c equal to 3.
- For multispectral or hyperspectral images, specify a vector with c equal to the number of channels.
For 3-D image or volume input, use image3dInputLayer.
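One way to obtain inputSize is from a sample image. This is a sketch using placeholder arrays in place of real images; the variable names are hypothetical. Note that size returns only two elements for a grayscale image, so the channel count is appended explicitly.

```matlab
rgbImage  = zeros(32, 48, 3, "uint8");   % placeholder RGB image
grayImage = zeros(32, 48, "uint8");      % placeholder grayscale image

rgbSize  = size(rgbImage);               % three elements: h, w, c
graySize = [size(grayImage) 1];          % size gives [h w], so append c = 1

rgbLayer  = imageInputLayer(rgbSize);
grayLayer = imageInputLayer(graySize);
```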
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: imageInputLayer([28 28 3],Name="input") creates an image input layer with input size [28 28 3] and name 'input'.
Normalization
— Data normalization
"zerocenter" (default) | "zscore" | "rescale-symmetric" | "rescale-zero-one" | "none" | function handle
Data normalization to apply every time data is forward propagated through the input layer, specified as one of the following:
- "zerocenter": Subtract the mean specified by Mean.
- "zscore": Subtract the mean specified by Mean and divide by StandardDeviation.
- "rescale-symmetric": Rescale the input to be in the range [-1, 1] using the minimum and maximum values specified by Min and Max, respectively.
- "rescale-zero-one": Rescale the input to be in the range [0, 1] using the minimum and maximum values specified by Min and Max, respectively.
- "none": Do not normalize the input data.
- function handle: Normalize the data using the specified function. The function must be of the form Y = f(X), where X is the input data and the output Y is the normalized data.
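The arithmetic behind these options can be sketched in plain MATLAB on a toy array. The rescaling formulas here are the conventional linear maps and are an assumption; the layer's internal implementation (for example, how it handles a zero range) is not described in this page. All variable names are hypothetical.

```matlab
X = [0 64; 128 255];            % toy single-channel "image"

mu    = mean(X, "all");         % statistic used by "zerocenter" and "zscore"
sigma = std(X, 0, "all");       % statistic used by "zscore"
xmin  = min(X, [], "all");      % statistics used by the rescale options
xmax  = max(X, [], "all");

zeroCentered      = X - mu;
zScored           = (X - mu) ./ sigma;
rescaledZeroOne   = (X - xmin) ./ (xmax - xmin);            % range [0, 1]
rescaledSymmetric = 2 .* (X - xmin) ./ (xmax - xmin) - 1;   % range [-1, 1]

% A function handle of the form Y = f(X) expresses a custom normalization.
customNorm = @(X) X ./ 255;
```

A handle such as customNorm could then be passed as the Normalization value when creating the layer.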
If the input data is complex-valued and the SplitComplexInputs option is 0 (false), then the Normalization option must be "zerocenter", "zscore", "none", or a function handle. (since R2024a)
Before R2024a: To input complex-valued data into the network, the SplitComplexInputs option must be 1 (true).
Tip
The software, by default, automatically calculates the normalization statistics when you use the trainnet function. To save time when training, specify the required statistics for normalization and set the ResetInputNormalization option in trainingOptions to 0 (false).
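A sketch of this tip, assuming the per-channel statistics are already known. The numeric values, the input size, and the "adam" solver are illustrative placeholders.

```matlab
% Per-channel statistics assumed to be known in advance (illustrative values).
mu    = reshape([0.485 0.456 0.406], 1, 1, 3);
sigma = reshape([0.229 0.224 0.225], 1, 1, 3);

layer = imageInputLayer([224 224 3], ...
    Normalization="zscore", Mean=mu, StandardDeviation=sigma);

% Keep the supplied statistics instead of recalculating them during training.
options = trainingOptions("adam", ResetInputNormalization=false);
```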
The ImageInputLayer object stores the Normalization property as a character vector or a function handle.
NormalizationDimension
— Normalization dimension
"auto" (default) | "channel" | "element" | "all"
Normalization dimension, specified as one of the following:
- "auto": If the ResetInputNormalization training option is 0 (false) and you specify any of the normalization statistics (Mean, StandardDeviation, Min, or Max), then normalize over the dimensions matching the statistics. Otherwise, recalculate the statistics at training time and apply channel-wise normalization.
- "channel": Channel-wise normalization.
- "element": Element-wise normalization.
- "all": Normalize all values using scalar statistics.
The ImageInputLayer object stores the NormalizationDimension property as a character vector.
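The three explicit options correspond to statistics of different shapes. The following plain-MATLAB sketch illustrates those shapes on random data; it is not the software's own computation, and the variable names are hypothetical.

```matlab
X = rand(4, 5, 3, 10);             % 10 images, h-by-w-by-c-by-N

meanChannel = mean(X, [1 2 4]);    % "channel": one value per channel
meanElement = mean(X, 4);          % "element": one value per pixel and channel
meanAll     = mean(X, "all");      % "all": one scalar for all values

size(meanChannel)                  % 1-by-1-by-3
size(meanElement)                  % 4-by-5-by-3
size(meanAll)                      % 1-by-1
```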
Mean
— Mean for zero-center and z-score normalization
[] (default) | 3-D array | numeric scalar
Mean for zero-center and z-score normalization, specified as an h-by-w-by-c array, a 1-by-1-by-c array of means per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the mean, respectively.
To specify the Mean property, the Normalization property must be "zerocenter" or "zscore". If Mean is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the mean using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to 0.
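The initialization behavior can be checked with a small network. This is a sketch assuming Deep Learning Toolbox; the fully connected layer is only there to form a minimal network.

```matlab
layers = [
    imageInputLayer([28 28 1])     % default "zerocenter" normalization, Mean is []
    fullyConnectedLayer(10)];

isempty(layers(1).Mean)            % true before initialization

net = dlnetwork(layers);           % Initialize option defaults to 1 (true)
net.Layers(1).Mean                 % per the text above, set to 0 at initialization
```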
Mean can be complex-valued. (since R2024a) If Mean is complex-valued, then the SplitComplexInputs option must be 0 (false).
Before R2024a: Split the mean into real and imaginary parts, and split the input data into real and imaginary parts by setting the SplitComplexInputs option to 1 (true).
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Complex Number Support: Yes
StandardDeviation
— Standard deviation for z-score normalization
[] (default) | 3-D array | numeric scalar
Standard deviation for z-score normalization, specified as an h-by-w-by-c array, a 1-by-1-by-c array of standard deviations per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the standard deviation, respectively.
To specify the StandardDeviation property, the Normalization property must be "zscore". If StandardDeviation is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the standard deviation using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to 1.
StandardDeviation can be complex-valued. (since R2024a) If StandardDeviation is complex-valued, then the SplitComplexInputs option must be 0 (false).
Before R2024a: Split the standard deviation into real and imaginary parts, and split the input data into real and imaginary parts by setting the SplitComplexInputs option to 1 (true).
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Complex Number Support: Yes
Min
— Minimum value for rescaling
[] (default) | 3-D array | numeric scalar
Minimum value for rescaling, specified as an h-by-w-by-c array, a 1-by-1-by-c array of minima per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the minima, respectively.
To specify the Min property, the Normalization property must be "rescale-symmetric" or "rescale-zero-one". If Min is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the minimum value using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to -1 when Normalization is "rescale-symmetric" and to 0 when Normalization is "rescale-zero-one".
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Max
— Maximum value for rescaling
[] (default) | 3-D array | numeric scalar
Maximum value for rescaling, specified as an h-by-w-by-c array, a 1-by-1-by-c array of maxima per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the maxima, respectively.
To specify the Max property, the Normalization property must be "rescale-symmetric" or "rescale-zero-one". If Max is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the maximum value using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to 1.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
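When the data range is known ahead of time, Min and Max can be supplied directly. A minimal sketch, assuming 8-bit images whose values span 0 to 255; the input size is illustrative.

```matlab
% 8-bit images span 0 to 255, so the range is known without scanning the data.
layer = imageInputLayer([28 28 1], ...
    Normalization="rescale-zero-one", Min=0, Max=255);

layer.Min   % scalar minimum used for rescaling
layer.Max   % scalar maximum used for rescaling
```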
SplitComplexInputs
— Flag to split input data into real and imaginary components
0 (false) (default) | 1 (true)
Flag to split input data into real and imaginary components, specified as one of these values:
- 0 (false): Do not split input data.
- 1 (true): Split data into real and imaginary components.
When SplitComplexInputs is 1, the layer outputs twice as many channels as the input data. For example, if the input data is complex-valued with numChannels channels, then the layer outputs data with 2*numChannels channels, where channels 1 through numChannels contain the real components of the input data and numChannels+1 through 2*numChannels contain the imaginary components of the input data. If the input data is real, then channels numChannels+1 through 2*numChannels are all zero.
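The channel layout described above can be reproduced by hand in plain MATLAB. This sketch only mirrors the documented layout and is not the layer's implementation; the array sizes are illustrative.

```matlab
numChannels = 2;
X = complex(rand(4, 4, numChannels), rand(4, 4, numChannels));   % complex-valued input

% Real parts first, then imaginary parts, along the channel dimension.
Y = cat(3, real(X), imag(X));

size(Y, 3)                                      % 2*numChannels
isequal(Y(:, :, 1:numChannels), real(X))        % channels 1 through numChannels
isequal(Y(:, :, numChannels+1:end), imag(X))    % remaining channels

% Layer configured to perform the split (requires Deep Learning Toolbox).
layer = imageInputLayer([4 4 numChannels], Normalization="none", SplitComplexInputs=true);
```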
If the input data is complex-valued and SplitComplexInputs is 0 (false), then the layer passes the complex-valued data to the next layers. (since R2024a)
Before R2024a: To input complex-valued data into a neural network, the SplitComplexInputs option of the input layer must be 1 (true).
For an example showing how to train a network with complex-valued data, see Train Network with Complex-Valued Data.
Name
— Layer name
"" (default) | character vector | string scalar
Layer name, specified as a character vector or a string scalar. For Layer array input, the trainnet and dlnetwork functions automatically assign names to layers with the name "".
The ImageInputLayer object stores the Name property as a character vector.
Data Types: char | string
Properties
Image Input
InputSize
— Size of the input
row vector of integers
This property is read-only.
Size of the input data, specified as a row vector of integers [h w c], where h, w, and c correspond to the height, width, and number of channels, respectively.
- For grayscale images, specify a vector with c equal to 1.
- For RGB images, specify a vector with c equal to 3.
- For multispectral or hyperspectral images, specify a vector with c equal to the number of channels.
For 3-D image or volume input, use image3dInputLayer.
Normalization
— Data normalization
"zerocenter" (default) | "zscore" | "rescale-symmetric" | "rescale-zero-one" | "none" | function handle
This property is read-only.
Data normalization to apply every time data is forward propagated through the input layer, specified as one of the following:
- "zerocenter": Subtract the mean specified by Mean.
- "zscore": Subtract the mean specified by Mean and divide by StandardDeviation.
- "rescale-symmetric": Rescale the input to be in the range [-1, 1] using the minimum and maximum values specified by Min and Max, respectively.
- "rescale-zero-one": Rescale the input to be in the range [0, 1] using the minimum and maximum values specified by Min and Max, respectively.
- "none": Do not normalize the input data.
- function handle: Normalize the data using the specified function. The function must be of the form Y = f(X), where X is the input data and the output Y is the normalized data.
If the input data is complex-valued and the SplitComplexInputs option is 0 (false), then the Normalization option must be "zerocenter", "zscore", "none", or a function handle. (since R2024a)
Before R2024a: To input complex-valued data into the network, the SplitComplexInputs option must be 1 (true).
Tip
The software, by default, automatically calculates the normalization statistics when you use the trainnet function. To save time when training, specify the required statistics for normalization and set the ResetInputNormalization option in trainingOptions to 0 (false).
The ImageInputLayer object stores this property as a character vector or a function handle.
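The stored type can be inspected after creation. A short sketch, assuming Deep Learning Toolbox; the custom function handle is a hypothetical example.

```matlab
% Specified as a string, stored as a character vector.
layer = imageInputLayer([28 28 3], Normalization="zscore");
class(layer.Normalization)

% Specified as a function handle, stored as a function handle.
layerFcn = imageInputLayer([28 28 3], Normalization=@(X) X ./ 255);
class(layerFcn.Normalization)
```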
NormalizationDimension
— Normalization dimension
"auto" (default) | "channel" | "element" | "all"
Normalization dimension, specified as one of the following:
- "auto": If the ResetInputNormalization training option is 0 (false) and you specify any of the normalization statistics (Mean, StandardDeviation, Min, or Max), then normalize over the dimensions matching the statistics. Otherwise, recalculate the statistics at training time and apply channel-wise normalization.
- "channel": Channel-wise normalization.
- "element": Element-wise normalization.
- "all": Normalize all values using scalar statistics.
The ImageInputLayer object stores this property as a character vector.
Mean
— Mean for zero-center and z-score normalization
[] (default) | 3-D array | numeric scalar
Mean for zero-center and z-score normalization, specified as an h-by-w-by-c array, a 1-by-1-by-c array of means per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the mean, respectively.
To specify the Mean property, the Normalization property must be "zerocenter" or "zscore". If Mean is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the mean using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to 0.
Mean can be complex-valued. (since R2024a) If Mean is complex-valued, then the SplitComplexInputs option must be 0 (false).
Before R2024a: Split the mean into real and imaginary parts, and split the input data into real and imaginary parts by setting the SplitComplexInputs option to 1 (true).
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Complex Number Support: Yes
StandardDeviation
— Standard deviation for z-score normalization
[] (default) | 3-D array | numeric scalar
Standard deviation for z-score normalization, specified as an h-by-w-by-c array, a 1-by-1-by-c array of standard deviations per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the standard deviation, respectively.
To specify the StandardDeviation property, the Normalization property must be "zscore". If StandardDeviation is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the standard deviation using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to 1.
StandardDeviation can be complex-valued. (since R2024a) If StandardDeviation is complex-valued, then the SplitComplexInputs option must be 0 (false).
Before R2024a: Split the standard deviation into real and imaginary parts, and split the input data into real and imaginary parts by setting the SplitComplexInputs option to 1 (true).
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Complex Number Support: Yes
Min
— Minimum value for rescaling
[] (default) | 3-D array | numeric scalar
Minimum value for rescaling, specified as an h-by-w-by-c array, a 1-by-1-by-c array of minima per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the minima, respectively.
To specify the Min property, the Normalization property must be "rescale-symmetric" or "rescale-zero-one". If Min is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the minimum value using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to -1 when Normalization is "rescale-symmetric" and to 0 when Normalization is "rescale-zero-one".
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Max
— Maximum value for rescaling
[] (default) | 3-D array | numeric scalar
Maximum value for rescaling, specified as an h-by-w-by-c array, a 1-by-1-by-c array of maxima per channel, a numeric scalar, or [], where h, w, and c correspond to the height, width, and the number of channels of the maxima, respectively.
To specify the Max property, the Normalization property must be "rescale-symmetric" or "rescale-zero-one". If Max is [], then the software automatically sets the property at training or initialization time:
- The trainnet function calculates the maximum value using the training data and uses the resulting value.
- The initialize function and the dlnetwork function, when the Initialize option is 1 (true), set the property to 1.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
SplitComplexInputs
— Flag to split input data into real and imaginary components
0 (false) (default) | 1 (true)
This property is read-only.
Flag to split input data into real and imaginary components, specified as one of these values:
- 0 (false): Do not split input data.
- 1 (true): Split data into real and imaginary components.
When SplitComplexInputs is 1, the layer outputs twice as many channels as the input data. For example, if the input data is complex-valued with numChannels channels, then the layer outputs data with 2*numChannels channels, where channels 1 through numChannels contain the real components of the input data and numChannels+1 through 2*numChannels contain the imaginary components of the input data. If the input data is real, then channels numChannels+1 through 2*numChannels are all zero.
If the input data is complex-valued and SplitComplexInputs is 0 (false), then the layer passes the complex-valued data to the next layers. (since R2024a)
Before R2024a: To input complex-valued data into a neural network, the SplitComplexInputs option of the input layer must be 1 (true).
For an example showing how to train a network with complex-valued data, see Train Network with Complex-Valued Data.
Layer
Name
— Layer name
"" (default) | character vector | string scalar
Layer name, specified as a character vector or string scalar. For Layer array input, the trainnet and dlnetwork functions automatically assign names to layers with the name "".
The ImageInputLayer object stores this property as a character vector.
Data Types: char | string
NumInputs
— Number of inputs
0 (default)
This property is read-only.
Number of inputs of the layer. The layer has no inputs.
Data Types: double
InputNames
— Input names
{} (default)
This property is read-only.
Input names of the layer. The layer has no inputs.
Data Types: cell
NumOutputs
— Number of outputs
1 (default)
This property is read-only.
Number of outputs from the layer, returned as 1. This layer has a single output only.
Data Types: double
OutputNames
— Output names
{'out'} (default)
This property is read-only.
Output names, returned as {'out'}. This layer has a single output only.
Data Types: cell
Examples
Create Image Input Layer
Create an image input layer for 28-by-28 color images.
inputlayer = imageInputLayer([28 28 3])
inputlayer =
  ImageInputLayer with properties:
                      Name: ''
                 InputSize: [28 28 3]
        SplitComplexInputs: 0
   Hyperparameters
          DataAugmentation: 'none'
             Normalization: 'zerocenter'
    NormalizationDimension: 'auto'
                      Mean: []
Include an image input layer in a Layer array.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    reluLayer
    maxPooling2dLayer(2,Stride=2)
    fullyConnectedLayer(10)
    softmaxLayer]
layers =
  6x1 Layer array with layers:
     1   ''   Image Input       28x28x1 images with 'zerocenter' normalization
     2   ''   2-D Convolution   20 5x5 convolutions with stride [1 1] and padding [0 0 0 0]
     3   ''   ReLU              ReLU
     4   ''   2-D Max Pooling   2x2 max pooling with stride [2 2] and padding [0 0 0 0]
     5   ''   Fully Connected   10 fully connected layer
     6   ''   Softmax           softmax
Algorithms
Layer Output Formats
Layers in a layer array or layer graph pass data to subsequent layers as formatted dlarray objects. The format of a dlarray object is a string of characters in which each character describes the corresponding dimension of the data. The formats consist of one or more of these characters:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
For example, you can describe 2-D image data that is represented as a 4-D array, where the first two dimensions correspond to the spatial dimensions of the images, the third dimension corresponds to the channels of the images, and the fourth dimension corresponds to the batch dimension, as having the format "SSCB" (spatial, spatial, channel, batch).
The input layer of a network specifies the layout of the data that the network expects. If you have data in a different layout, then specify the layout using the InputDataFormats training option.
The layer inputs h-by-w-by-c-by-N arrays into the network, where h, w, and c are the height, width, and number of channels of the images, respectively, and N is the number of images. Data in this layout has the data format "SSCB" (spatial, spatial, channel, batch).
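The "SSCB" layout can be illustrated with a formatted dlarray. This is a sketch assuming Deep Learning Toolbox; the array sizes, the "adam" solver, and the "CSSB" layout in the last line are illustrative assumptions.

```matlab
X = rand(28, 28, 3, 16, "single");   % h-by-w-by-c-by-N numeric array
dlX = dlarray(X, "SSCB");            % label dimensions: spatial, spatial, channel, batch

dims(dlX)                            % format of the dlarray
size(dlX)                            % same size as the numeric array

% If the data were instead stored channel-first (c-by-h-by-w-by-N), the
% layout could be declared through the InputDataFormats training option.
options = trainingOptions("adam", InputDataFormats="CSSB");
```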
Complex Numbers
For complex-valued input to the neural network, when the SplitComplexInputs option is 0 (false), the layer passes complex-valued data to subsequent layers. (since R2024a)
Before R2024a: To input complex-valued data into a neural network, the SplitComplexInputs option of the input layer must be 1 (true).
If the input data is complex-valued and the SplitComplexInputs option is 0 (false), then the Normalization option must be "zerocenter", "zscore", "none", or a function handle. The Mean and StandardDeviation properties of the layer also support complex-valued data for the "zerocenter" and "zscore" normalization options.
For an example showing how to train a network with complex-valued data, see Train Network with Complex-Valued Data.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
- Code generation does not support passing dlarray objects with unspecified (U) dimensions to this layer.
- Code generation does not support Normalization specified using a function handle.
- Code generation does not support complex input and does not support the SplitComplexInputs option.
GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
Refer to the usage notes and limitations in the C/C++ Code Generation section. The same limitations apply to GPU code generation.
Version History
Introduced in R2016a
R2024a: Complex-valued outputs
For complex-valued input to the neural network, when the SplitComplexInputs option is 0 (false), the layer passes complex-valued data to subsequent layers.
If the input data is complex-valued and the SplitComplexInputs option is 0 (false), then the Normalization option must be "zerocenter", "zscore", "none", or a function handle. The Mean and StandardDeviation properties of the layer also support complex-valued data for the "zerocenter" and "zscore" normalization options.
R2019b: AverageImage property will be removed
AverageImage will be removed. Use Mean instead. To update your code, replace all instances of AverageImage with Mean. There are no differences between the properties that require additional updates to your code.
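The update amounts to a rename. A sketch, assuming existing code that reads the property from a layer; the older line is shown as a comment only, and the mean value is illustrative.

```matlab
layer = imageInputLayer([28 28 1], Mean=0.5);   % illustrative mean value

% Older code (property scheduled for removal):
%   mu = layer.AverageImage;

% Updated code:
mu = layer.Mean;
```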
R2019b: imageInputLayer and image3dInputLayer, by default, use channel-wise normalization
Starting in R2019b, imageInputLayer and image3dInputLayer, by default, use channel-wise normalization. In previous versions, these layers use element-wise normalization. To reproduce the earlier behavior, set the NormalizationDimension option of these layers to 'element'.
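A minimal sketch of requesting element-wise normalization explicitly; the input size is illustrative.

```matlab
% One statistic per pixel and channel, matching the behavior before R2019b.
layer = imageInputLayer([28 28 3], NormalizationDimension="element");
```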
R2018a: DataAugmentation is not recommended
The DataAugmentation property is not recommended. To preprocess images with cropping, reflection, and other geometric transformations, use augmentedImageDatastore instead.
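One possible replacement pattern is sketched below, using random placeholder data in place of a real image set. The augmentation settings and variable names are illustrative assumptions.

```matlab
% Placeholder data: 100 grayscale 28-by-28 images with random labels.
X = rand(28, 28, 1, 100);
Y = categorical(randi(10, 100, 1));

% Random horizontal reflection and small translations (illustrative settings).
augmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandXTranslation=[-3 3], ...
    RandYTranslation=[-3 3]);

% Datastore that applies the augmentation and outputs 28-by-28 images.
augimds = augmentedImageDatastore([28 28], X, Y, DataAugmentation=augmenter);
```

The resulting datastore can then serve as the training data source in place of the DataAugmentation property.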