yolov2TransformLayer
Create transform layer for YOLO v2 object detection network
Description
The yolov2TransformLayer function creates a YOLOv2TransformLayer object, which represents the transform layer for a you only look once version 2 (YOLO v2) object detection network. The transform layer in a YOLO v2 object detection network improves the stability of the network by constraining the location predictions. The transform layer extracts activations of the last convolutional layer and transforms the bounding box predictions to fall within the bounds of the ground truth.
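The constraint on location predictions can be illustrated with the box parameterization from the YOLO v2 paper [2]: a sigmoid keeps the predicted box center inside its grid cell, and an exponential scales the anchor box dimensions. The following is a minimal sketch of that arithmetic for a single anchor box. It is not the layer's internal implementation, and all numeric values and variable names (tx, cx, pw, and so on) are hypothetical.

```matlab
% Hypothetical raw network outputs for one anchor box in one grid cell.
tx = 2.0;  ty = -1.5;   % raw center offsets (unbounded)
tw = 0.3;  th = -0.2;   % raw log-scale width and height

% Assumed grid cell offset and anchor box size, in grid-cell units.
cx = 4;    cy = 7;
pw = 3.5;  ph = 2.0;

% Logistic sigmoid maps any real value into the open interval (0,1).
sigmoid = @(t) 1 ./ (1 + exp(-t));

% Constrained box center: always lies inside the cell at (cx,cy).
bx = sigmoid(tx) + cx;
by = sigmoid(ty) + cy;

% Box size: a positive multiple of the anchor box size.
bw = pw * exp(tw);
bh = ph * exp(th);

fprintf('center = (%.3f, %.3f), size = (%.3f, %.3f)\n', bx, by, bw, bh);
```

Because the sigmoid output is strictly between 0 and 1, no raw value of tx or ty can move the predicted center out of its grid cell, which is the sense in which the predictions are constrained.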
Creation
Syntax
Description
`layer = yolov2TransformLayer(numAnchorBoxes)` creates the transform layer for a YOLO v2 object detection network.
`layer = yolov2TransformLayer(numAnchorBoxes,Name,Value)` sets the Name property using a name-value pair. Enclose the property name in single quotes. For example, `yolov2TransformLayer(numAnchorBoxes,'Name','yolo_Transform')` creates a transform layer with the name 'yolo_Transform'.
Input Arguments
numAnchorBoxes
Number of anchor boxes used for training, specified as a positive integer. This input sets the NumAnchorBoxes property of the transform layer.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
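In practice, the number of anchor boxes is often taken from the row count of an anchor box matrix. The sketch below assumes a hypothetical matrix of five anchor boxes (one row per box) and a hypothetical class count. It also computes the number of channels the preceding convolutional layer would need if each anchor box predicts four box coordinates, one objectness score, and one score per class, as in the YOLO v2 paper [2]. The variable names other than numAnchorBoxes are illustrative only.

```matlab
% Hypothetical anchor boxes, one row per box (values are made up).
anchorBoxes = [16 16; 32 16; 16 32; 64 64; 128 96];
numClasses = 3;   % hypothetical number of object classes

% One anchor box per row of the matrix.
numAnchorBoxes = size(anchorBoxes,1);

% Assumed per-anchor predictions: 4 box coordinates + 1 objectness
% score + one score per class.
numPredictionsPerAnchor = 5 + numClasses;
numFilters = numAnchorBoxes * numPredictionsPerAnchor;

% Create the transform layer for this number of anchor boxes.
layer = yolov2TransformLayer(numAnchorBoxes,'Name','yolo_Transform');
```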
Properties
Name
Layer name, specified as a character vector or a string scalar.
Data Types: char | string
NumAnchorBoxes
This property is read-only.
Number of anchor boxes used for training, specified as a positive integer. This property is set by the input numAnchorBoxes.
NumInputs
This property is read-only.
Number of inputs to the layer, stored as 1. This layer accepts a single input only.
Data Types: double
InputNames
This property is read-only.
Input names, stored as {'in'}. This layer accepts a single input only.
Data Types: cell
NumOutputs
This property is read-only.
Number of outputs from the layer, stored as 1. This layer has a single output only.
Data Types: double
OutputNames
This property is read-only.
Output names, stored as {'out'}. This layer has a single output only.
Data Types: cell
Examples
Specify the number of anchor boxes.
numAnchorBoxes = 5;
Create a YOLO v2 transform layer with the name "yolo_Transform".
layer = yolov2TransformLayer(numAnchorBoxes,'Name','yolo_Transform');
Inspect the properties of the YOLO v2 transform layer.
layer
layer =
  YOLOv2TransformLayer with properties:
              Name: 'yolo_Transform'
    NumAnchorBoxes: 5
   Learnable Parameters
    No properties.
   State Parameters
    No properties.
Algorithms
Layers in a layer array or layer graph pass data to subsequent layers as formatted dlarray (Deep Learning Toolbox) objects. The format of a dlarray object is a string of characters in which each character describes the corresponding dimension of the data. The format consists of one or more of these characters:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
For example, you can describe 2-D image data that is represented as a 4-D array, where the first two dimensions correspond to the spatial dimensions of the images, the third dimension corresponds to the channels of the images, and the fourth dimension corresponds to the batch dimension, as having the format "SSCB" (spatial, spatial, channel, batch).
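As a small illustration of the "SSCB" format described above, the following sketch labels a 4-D array as a formatted dlarray and queries its dimensions. The array size (a 13-by-13 grid, 40 channels, and a batch of 2) is a made-up example, and the code assumes Deep Learning Toolbox is installed.

```matlab
% Hypothetical activations: 13-by-13 grid, 40 channels, batch of 2 images.
X = dlarray(rand(13,13,40,2,'single'),"SSCB");

% Query the data format and locate individual dimensions by label.
fmt = dims(X);                        % data format of X
channelDim = finddim(X,"C");          % index of the channel dimension
batchSize = size(X,finddim(X,"B"));   % number of observations in the batch

% Removing the labels gives an unformatted dlarray with the same
% dimension order.
Y = stripdims(X);
```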
You can interact with these dlarray objects in automatic differentiation workflows, such as those for developing a custom layer, using a functionLayer (Deep Learning Toolbox) object, or using the forward (Deep Learning Toolbox) and predict (Deep Learning Toolbox) functions with dlnetwork objects.
This table shows the supported input formats of YOLOv2TransformLayer objects and the corresponding output format. If the software passes the output of the layer to a custom layer that does not inherit from the nnet.layer.Formattable class, or a FunctionLayer object with the Formattable property set to 0 (false), then the layer receives an unformatted dlarray object with dimensions ordered according to the formats in this table. The formats listed here are only a subset. The layer may support additional formats, such as formats with additional "S" (spatial) or "U" (unspecified) dimensions.
Input Format | Output Format |
---|---|
"SSCB" (spatial, spatial, channel, batch) | "SSCB" (spatial, spatial, channel, batch) |
References
[1] Redmon, J., S. K. Divvala, R. B. Girshick, and A. Farhadi. "You Only Look Once: Unified, Real-Time Object Detection." In _Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)_, pp. 779–788. Las Vegas, NV: CVPR, 2016.
[2] Redmon, J., and A. Farhadi. "YOLO9000: Better, Faster, Stronger." In _Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)_, pp. 6517–6525. Honolulu, HI: CVPR, 2017.
Extended Capabilities
To generate CUDA® or C++ code by using GPU Coder™, you must first construct and train a deep neural network. Once the network is trained and evaluated, you can configure the code generator to generate code and deploy the convolutional neural network on platforms that use NVIDIA® or ARM® GPU processors. For more information, see Deep Learning with GPU Coder (GPU Coder).
Version History
Introduced in R2019a
yolov2TransformLayer can accept formatted dlarray data with the format "SSCB" (spatial, spatial, channel, batch). When you pass formatted dlarray data to the layer, the layer returns data of the same format. For more information, see Layer Input and Output Formats.