yolov2TransformLayer

Create transform layer for YOLO v2 object detection network

Description

The yolov2TransformLayer function creates a YOLOv2TransformLayer object, which represents the transform layer for a you only look once version 2 (YOLO v2) object detection network. The transform layer improves the stability of the network by constraining the location predictions. It extracts the activations of the last convolutional layer and transforms the bounding box predictions to fall within the bounds of the ground truth.
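To make "constraining the location predictions" concrete, here is a minimal sketch of the parameterization described in the YOLO v2 paper by Redmon and Farhadi: a sigmoid bounds the objectness score and the box center offsets, and an exponential keeps the width and height multipliers positive. It uses base MATLAB only. The variable names and values are hypothetical, and this is not the toolbox's actual implementation of the layer.

```matlab
% Illustrative sketch only; not the toolbox implementation.
% Raw (unconstrained) predictions for one anchor box in one grid cell.
% All names and values below are hypothetical.
tObj = 0.3;          % raw objectness score
tXY  = [-1.2 0.8];   % raw center offsets (x, y)
tWH  = [0.5 -0.7];   % raw log-scale width and height

% Logistic sigmoid, maps any real value into the open interval (0,1).
sigmoid = @(t) 1 ./ (1 + exp(-t));

objectness = sigmoid(tObj);   % bounded score in (0,1)
centerXY   = sigmoid(tXY);    % offsets stay inside the grid cell
scaleWH    = exp(tWH);        % strictly positive multipliers of the anchor size

disp([objectness centerXY scaleWH])
```

Because the sigmoid output lies strictly between 0 and 1, a predicted box center cannot drift outside the grid cell that predicts it, which is the source of the stability the paper describes.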

Creation

Syntax

Description

`layer` = yolov2TransformLayer(numAnchorBoxes) creates the transform layer for a YOLO v2 object detection network.


`layer` = yolov2TransformLayer(numAnchorBoxes,`Name,Value`) sets the Name property using a name-value pair. Enclose the property name in single quotes. For example, yolov2TransformLayer(numAnchorBoxes,'Name','yolo_Transform') creates a transform layer with the name 'yolo_Transform'.


Input Arguments

numAnchorBoxes - Number of anchor boxes

Number of anchor boxes used for training, specified as a positive integer. This input sets the NumAnchorBoxes property of the transform layer.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
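As a rough illustration of where numAnchorBoxes comes from, the sketch below derives it from a hypothetical matrix of anchor box sizes. It also computes the channel count that the YOLO v2 design implies for the last convolutional layer: five predictions per anchor box (objectness, x, y, width, height) plus one score per class. The anchor sizes and class count are invented for this example, and the layer is created only if the function is found on the path.

```matlab
% Hypothetical anchor boxes: one row per box, two columns for the box size.
anchorBoxes = [16 16; 32 16; 64 64];
numAnchorBoxes = size(anchorBoxes,1);

% Hypothetical number of object classes.
numClasses = 2;

% Each anchor box predicts objectness, x, y, width, height, and class scores.
numPredictionsPerAnchor = 5 + numClasses;
numFilters = numAnchorBoxes * numPredictionsPerAnchor;

fprintf('Anchor boxes: %d, filters in last conv layer: %d\n', ...
    numAnchorBoxes, numFilters);

% Create the transform layer only when the function is available.
if ~isempty(which('yolov2TransformLayer'))
    layer = yolov2TransformLayer(numAnchorBoxes,'Name','yolo_Transform');
end
```

With three anchor boxes and two classes, this gives 3 * (5 + 2) = 21 channels feeding the transform layer.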

Properties

Name - Layer name

Layer name, specified as a character vector or a string scalar.

Data Types: char | string

NumAnchorBoxes - Number of anchor boxes

This property is read-only.

Number of anchor boxes used for training, specified as a positive integer. This property is set by the input numAnchorBoxes.

NumInputs - Number of inputs

This property is read-only.

Number of inputs to the layer, stored as 1. This layer accepts a single input only.

Data Types: double

InputNames - Input names

This property is read-only.

Input names, stored as {'in'}. This layer accepts a single input only.

Data Types: cell

NumOutputs - Number of outputs

This property is read-only.

Number of outputs from the layer, stored as 1. This layer has a single output only.

Data Types: double

OutputNames - Output names

This property is read-only.

Output names, stored as {'out'}. This layer has a single output only.

Data Types: cell

Examples


Specify the number of anchor boxes.

numAnchorBoxes = 5;

Create a YOLO v2 transform layer with the name "yolo_Transform".

layer = yolov2TransformLayer(numAnchorBoxes,'Name','yolo_Transform');

Inspect the properties of the YOLO v2 transform layer.

layer

layer = 
  YOLOv2TransformLayer with properties:

              Name: 'yolo_Transform'
    NumAnchorBoxes: 5

   Learnable Parameters
    No properties.

   State Parameters
    No properties.

Algorithms

Layer Input and Output Formats

Layers in a layer array or layer graph pass data to subsequent layers as formatted dlarray (Deep Learning Toolbox) objects. The format of a dlarray object is a string of characters in which each character describes the corresponding dimension of the data. The format consists of one or more of these characters:

"S": Spatial
"C": Channel
"B": Batch
"T": Time
"U": Unspecified

For example, you can describe 2-D image data that is represented as a 4-D array, where the first two dimensions correspond to the spatial dimensions of the images, the third dimension corresponds to the channels of the images, and the fourth dimension corresponds to the batch dimension, as having the format "SSCB" (spatial, spatial, channel, batch).

You can interact with these dlarray objects in automatic differentiation workflows, such as those for developing a custom layer, using a functionLayer (Deep Learning Toolbox) object, or using the forward (Deep Learning Toolbox) and predict (Deep Learning Toolbox) functions with dlnetwork objects.

This table shows the supported input formats of yolov2TransformLayer objects and the corresponding output format. If the software passes the output of the layer to a custom layer that does not inherit from the nnet.layer.Formattable class, or a FunctionLayer object with the Formattable property set to 0 (false), then the layer receives an unformatted dlarray object with dimensions ordered according to the formats in this table. The formats listed here are only a subset. The layer may support additional formats, such as formats with additional "S" (spatial) or "U" (unspecified) dimensions.

Input Format                                 Output Format
"SSCB" (spatial, spatial, channel, batch)    "SSCB" (spatial, spatial, channel, batch)
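As a small sketch of what the "SSCB" ordering means in practice, the code below builds a 4-D array with a hypothetical activation size and reads its dimensions in format order. If Deep Learning Toolbox is on the path, it also attaches the format labels with dlarray and displays them with dims. The array size is invented for illustration and is not taken from any particular network.

```matlab
% Hypothetical activation size: 13-by-13 grid, 35 channels, batch of 4.
X = rand(13,13,35,4,'single');

% Dimension order matching the "SSCB" format:
% spatial (height), spatial (width), channel, batch.
[gridH, gridW, numChannels, batchSize] = size(X);
fprintf('Grid: %d-by-%d, channels: %d, batch size: %d\n', ...
    gridH, gridW, numChannels, batchSize);

% With Deep Learning Toolbox available, attach the format labels.
if ~isempty(which('dlarray'))
    dlX = dlarray(X,"SSCB");
    disp(dims(dlX))
end
```

A layer that receives unformatted data relies on this positional ordering alone, so the third dimension must be the channel dimension and the fourth the batch dimension.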


Extended Capabilities


To generate CUDA® or C++ code by using GPU Coder™, you must first construct and train a deep neural network. Once the network is trained and evaluated, you can configure the code generator to generate code and deploy the convolutional neural network on platforms that use NVIDIA® or ARM® GPU processors. For more information, see Deep Learning with GPU Coder (GPU Coder).

Version History

Introduced in R2019a


yolov2TransformLayer can accept formatted dlarray data with the format "SSCB" (spatial, spatial, channel, batch). When you pass formatted dlarray data to the layer, the layer returns data of the same format. For more information, see Layer Input and Output Formats.