yolov2OutputLayer - (To be removed) Create output layer for YOLO v2 object detection network - MATLAB ([original](https://in.mathworks.com/help/vision/ref/nnet.cnn.layer.yolov2outputlayer.html))

yolov2OutputLayer will be removed in a future release. Create a YOLO v2 object detection network using the yolov2ObjectDetector object instead. For more information, see Version History.
Description
The yolov2OutputLayer function creates a YOLOv2OutputLayer object, which represents the output layer for a you only look once version 2 (YOLO v2) object detection network. The output layer provides the refined bounding box locations of the target objects.
Creation
Syntax
Description
`layer` = yolov2OutputLayer(anchorBoxes) creates a YOLOv2OutputLayer object, layer, which represents the output layer for a YOLO v2 object detection network. The layer outputs the refined bounding box locations that are predicted using a predefined set of anchor boxes specified at the input.
`layer` = yolov2OutputLayer(anchorBoxes,Name,Value) sets additional properties using name-value pairs and the input from the preceding syntax. Enclose each property name in single quotes. For example, yolov2OutputLayer('Name','yolo_Out') creates an output layer with the name 'yolo_Out'.
Input Arguments
Set of anchor boxes, specified as an M-by-2 matrix, where each row is of the form [height width]. The matrix defines the height and width of M anchor boxes. This input sets the AnchorBoxes property of the output layer. You can use a clustering approach to estimate anchor boxes from the training data. For more information, see Estimate Anchor Boxes from Training Data; a short sketch follows the list of data types below.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
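As a brief sketch of the clustering approach mentioned above: assuming a boxLabelDatastore named blds containing the training box labels already exists (the variable names here are illustrative), estimateAnchorBoxes clusters the ground truth boxes into a set of anchors.

% Estimate numAnchors anchor boxes by clustering the training boxes.
numAnchors = 2;
[anchorBoxes,meanIoU] = estimateAnchorBoxes(blds,numAnchors);
layer = yolov2OutputLayer(anchorBoxes);   % use the estimated anchors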
Properties
Layer name, specified as a character vector or a string scalar.
Data Types: char | string
This property is read-only.
Set of anchor boxes used for training, specified as an M-by-2 matrix defining the height and width of M anchor boxes. This property is set by the input anchorBoxes.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
This property is read-only.
Weights in the loss function, specified as a 1-by-4 vector of the form [K1 K2 K3 K4]. The weights increase the stability of the network model by penalizing incorrect bounding box predictions and false classifications. For more information about the weights in the loss function, see Loss Function for Bounding Box Refinement.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or 'auto'. Use this name-value pair to specify the names of the object classes in the input training data.
If the value is 'auto', then the software automatically sets the classes at training time. If you specify a string array or cell array of character vectors str, then the software sets the classes of the output layer to categorical(str). The default value is 'auto'.
Data Types: char | string | cell | categorical
This property is read-only.
Number of inputs to the layer, stored as 1. This layer accepts a single input only.
Data Types: double
This property is read-only.
Input names, stored as {'in'}. This layer accepts a single input only.
Data Types: cell
Examples
Create a YOLO v2 output layer with two anchor boxes.
Define the height and the width of the anchor boxes.
anchorBoxes = [16 16;32 32];
Specify the names of the object classes in the training data.
classNames = {'Vehicle','Person'};
Generate a YOLO v2 output layer with the name 'yolo_Out'.
layer = yolov2OutputLayer(anchorBoxes,'Name','yolo_Out','Classes',classNames);
Inspect the properties of the YOLO v2 output layer.
layer = 
  YOLOv2OutputLayer with properties:

            Name: 'yolo_Out'

   Hyperparameters
         Classes: [2x1 categorical]
    LossFunction: 'mean-squared-error'
     AnchorBoxes: [2x2 double]
     LossFactors: [5 1 1 1]
Read the value of the Classes property by using dot notation. The function stores the class names as a categorical array.

layer.Classes

ans = 2x1 categorical
     Vehicle 
     Person 
More About
During training, the output layer of the YOLO v2 network predicts refined bounding box locations by optimizing the mean squared error loss between predicted bounding boxes and the ground truth. The loss function is defined as
$$
\begin{aligned}
&K_1\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\left[\left(x_i-\hat{x}_i\right)^2+\left(y_i-\hat{y}_i\right)^2\right]
+ K_1\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\left[\left(w_i-\hat{w}_i\right)^2+\left(h_i-\hat{h}_i\right)^2\right] \\
&+ K_2\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\left(C_i-\hat{C}_i\right)^2
+ K_3\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{noobj}}\left(C_i-\hat{C}_i\right)^2 \\
&+ K_4\sum_{i=0}^{S^2}\mathbb{1}_{i}^{\mathrm{obj}}\sum_{c\in\mathrm{classes}}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$
where:
- $S^2$ is the number of grid cells.
- $B$ is the number of bounding boxes in each grid cell.
- $\mathbb{1}_{ij}^{\mathrm{obj}}$ is 1 if the $j$th bounding box in grid cell $i$ is responsible for detecting the object. Otherwise it is set to 0. A grid cell $i$ is responsible for detecting the object if the overlap between the ground truth and a bounding box in that grid cell is greater than or equal to 0.6.
- $\mathbb{1}_{ij}^{\mathrm{noobj}}$ is 1 if the $j$th bounding box in grid cell $i$ does not contain any object. Otherwise it is set to 0.
- $\mathbb{1}_{i}^{\mathrm{obj}}$ is 1 if an object is detected in grid cell $i$. Otherwise it is set to 0.
- $K_1$, $K_2$, $K_3$, and $K_4$ are the weights. To adjust the weights, modify the LossFactors property.
The loss function can be split into three parts (a minimal numeric sketch of the full computation follows this list):
- Localization loss

  The first and second terms in the loss function comprise the localization loss, which measures the error between the predicted bounding box and the ground truth. The parameters for computing the localization loss include the position and size of the predicted bounding box and the ground truth. The parameters are defined as follows:

  - $(x_i, y_i)$ is the center of the $j$th bounding box relative to grid cell $i$.
  - $(\hat{x}_i, \hat{y}_i)$ is the center of the ground truth relative to grid cell $i$.
  - $w_i$ and $h_i$ are the width and height, respectively, of the $j$th bounding box in grid cell $i$. The size of the predicted bounding box is specified relative to the input image size.
  - $\hat{w}_i$ and $\hat{h}_i$ are the width and height, respectively, of the ground truth in grid cell $i$.
  - $K_1$ is the weight for the localization loss. Increase this value to increase the weight given to bounding box prediction errors.
- Confidence loss

  The third and fourth terms in the loss function comprise the confidence loss. The third term measures the objectness (confidence score) error when an object is detected in the $j$th bounding box of grid cell $i$. The fourth term measures the objectness error when no object is detected in the $j$th bounding box of grid cell $i$. The parameters for computing the confidence loss are defined as follows:

  - $C_i$ is the confidence score of the $j$th bounding box in grid cell $i$.
  - $\hat{C}_i$ is the confidence score of the ground truth in grid cell $i$.
  - $K_2$ is the weight for the objectness error when an object is detected in the predicted bounding box. You can adjust the value of $K_2$ to weigh confidence scores from grid cells that contain objects.
  - $K_3$ is the weight for the objectness error when an object is not detected in the predicted bounding box. You can adjust the value of $K_3$ to weigh confidence scores from grid cells that do not contain objects.

  The confidence loss can cause the training to diverge when the number of grid cells that do not contain objects is greater than the number of grid cells that contain objects. To remedy this, increase the value of $K_2$ and decrease the value of $K_3$.
- Classification loss

  The fifth term in the loss function comprises the classification loss. For example, suppose that an object is detected in the predicted bounding box contained in grid cell $i$. Then, the classification loss measures the squared error between the class conditional probabilities for each class in grid cell $i$. The parameters for computing the classification loss are defined as follows:

  - $p_i(c)$ is the estimated conditional class probability for object class $c$ in grid cell $i$.
  - $\hat{p}_i(c)$ is the actual conditional class probability for object class $c$ in grid cell $i$.
  - $K_4$ is the weight for the classification error when an object is detected in the grid cell. Increase this value to increase the weight given to the classification loss.
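The following is a minimal numeric sketch of the loss computation described above, not the layer's internal implementation. All variable names and the random placeholder values are illustrative assumptions; in practice the predictions, targets, and object masks come from the network output and the ground truth boxes.

% Minimal sketch of the YOLO v2 loss above (illustrative data, not the
% layer's internal code). S^2 grid cells, B boxes per cell.
S = 7; B = 2; numClasses = 2;
K = [5 1 1 1];                          % [K1 K2 K3 K4], see LossFactors

% Placeholder predictions and ground truth, one value per cell and box.
predX = rand(S^2,B); predY = rand(S^2,B);
predW = rand(S^2,B); predH = rand(S^2,B);
predC = rand(S^2,B);                    % confidence scores
predP = rand(S^2,numClasses);           % class probabilities per cell
gtX = rand(S^2,B); gtY = rand(S^2,B);
gtW = rand(S^2,B); gtH = rand(S^2,B);
gtC = rand(S^2,B); gtP = rand(S^2,numClasses);

objMask = rand(S^2,B) > 0.5;            % 1 if box j in cell i is responsible
noobjMask = ~objMask;                   % 1 if box j in cell i has no object
cellObj = any(objMask,2);               % 1 if an object is detected in cell i

% Localization loss (first and second terms).
locLoss = K(1)*sum(objMask.*((predX-gtX).^2 + (predY-gtY).^2),'all') + ...
          K(1)*sum(objMask.*((predW-gtW).^2 + (predH-gtH).^2),'all');
% Confidence loss (third and fourth terms).
confLoss = K(2)*sum(objMask.*(predC-gtC).^2,'all') + ...
           K(3)*sum(noobjMask.*(predC-gtC).^2,'all');
% Classification loss (fifth term).
clsLoss = K(4)*sum(cellObj.*sum((predP-gtP).^2,2));

loss = locLoss + confLoss + clsLoss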
Tips
To improve prediction accuracy, you can:

- Train the network with a larger number of images. You can expand the training dataset through data augmentation. For information on how to apply data augmentation to the training dataset, see Preprocess Images for Deep Learning (Deep Learning Toolbox).
- Perform multiscale training by using the trainYOLOv2ObjectDetector function. To do so, specify the TrainingImageSize argument of the trainYOLOv2ObjectDetector function when training the network (see the sketch after this list).
- Choose anchor boxes appropriate to the dataset for training the network. You can use the estimateAnchorBoxes function to compute anchor boxes directly from the training data.
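A hedged sketch of multiscale training: the variables trainingData (a table of images and box labels), lgraph (the YOLO v2 layer graph), and opts (the output of trainingOptions) are assumed to exist already and are illustrative names.

% Train at several image sizes so the detector generalizes across scales.
% The sizes must be compatible with the network input layer.
trainingSizes = [224 224; 288 288; 352 352];
[detector,info] = trainYOLOv2ObjectDetector(trainingData,lgraph,opts, ...
    'TrainingImageSize',trainingSizes);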
References
[1] Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You Only Look Once: Unified, Real-Time Object Detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788. Las Vegas, NV, 2016.
[2] Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517–6525. Honolulu, HI, 2017.
Extended Capabilities
To generate CUDA® or C++ code by using GPU Coder™, you must first construct and train a deep neural network. Once the network is trained and evaluated, you can configure the code generator to generate code and deploy the convolutional neural network on platforms that use NVIDIA® or ARM® GPU processors. For more information, see Deep Learning with GPU Coder (GPU Coder).
For this layer, you can generate code that takes advantage of the NVIDIA CUDA deep neural network library (cuDNN), the NVIDIA TensorRT™ high performance inference library, or the ARM Compute Library for Mali GPU.
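A hedged sketch of one such configuration, assuming GPU Coder is installed and detectObjectsEntryPoint.m is a hypothetical entry-point function that runs the trained detector:

% Configure MEX code generation to use the cuDNN deep learning library.
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
% Generate code for a single 224-by-224 RGB input (illustrative size).
codegen -config cfg detectObjectsEntryPoint -args {ones(224,224,3,'single')}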
Version History
Introduced in R2019a
The yolov2OutputLayer object will be removed in a future release. When you call the yolov2OutputLayer object, it issues a warning that it will be removed. Create a YOLO v2 object detection network by using the yolov2ObjectDetector object instead, using these steps:

- Define your network as a dlnetwork (Deep Learning Toolbox) object. You can load a pretrained feature extraction network by using the imagePretrainedNetwork (Deep Learning Toolbox) function. Alternatively, you can use functions such as addLayers (Deep Learning Toolbox) and connectLayers (Deep Learning Toolbox) to build the network. Do not include output layers in the network.
- Create a yolov2ObjectDetector object using the dlnetwork as the custom network. You can specify the anchor boxes, class names, and loss factors using the AnchorBoxes, ClassNames, and LossFactors name-value arguments, respectively. (A migration sketch follows this list.)
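A minimal migration sketch following these steps, assuming a dlnetwork named net (with no output layers) already exists; the name-value arguments are those described above:

% Build the detector directly from the custom dlnetwork.
anchorBoxes = [16 16; 32 32];
classNames = {'Vehicle','Person'};
detector = yolov2ObjectDetector(net, ...
    'AnchorBoxes',anchorBoxes, ...
    'ClassNames',classNames, ...
    'LossFactors',[5 1 1 1]);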