Get Started with SOLOv2 for Instance Segmentation
Perform instance segmentation using the Computer Vision Toolbox™ Model for SOLOv2 Instance Segmentation support package. To learn more about instance segmentation, see Get Started with Instance Segmentation Using Deep Learning. Use the Computer Vision Toolbox Model for SOLOv2 Instance Segmentation support package for the tasks in these sections.
- To segment object instances in an image using a pretrained SOLOv2 network, or to perform inference on a test image using a trained SOLOv2 network, see the Segment Image with Pretrained SOLOv2 Network section.
- To configure and train a SOLOv2 network to perform transfer learning on your own data, see the Perform Transfer Learning with SOLOv2 section.
The Segmenting Objects by LOcations version 2 (SOLOv2) model for instance segmentation offers a lightweight, scalable, and memory-efficient architecture [1]. SOLOv2 achieved state-of-the-art performance on the COCO instance segmentation benchmark, outperforming previous models. The model can process inputs of various resolutions due to its multiscale feature pyramid network (FPN), enabling it to capture object details across a wide range of object sizes. SOLOv2 does not require an external region proposal network; it directly estimates object centers and their associated masks through anchor point localization and mask segmentation modeling.
Install Support Package
You can install the Computer Vision Toolbox Model for SOLOv2 Instance Segmentation from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. The support package also requires Deep Learning Toolbox™ and Computer Vision Toolbox. Processing image data on a GPU requires a supported GPU device and Parallel Computing Toolbox™.
Segment Image with Pretrained SOLOv2 Network
Use the process in this section to segment a test image using a pretrained SOLOv2 network with default settings, or to perform inference using a trained SOLOv2 network.
At inference, a fully convolutional network (FCN) backbone of the SOLOv2 network extracts a set of feature maps at various spatial resolutions, or levels, from the input image. The network feeds the extracted feature maps into parallel category and mask branches to generate the final predictions: semantic categories (classes) and instance masks. You can overlay the predicted instance masks on the image to visualize each object instance together with its corresponding class label.
You can perform inference on a test image with default network options using a pretrained SOLOv2 network.
- Load the image or image datastore that you want to segment into the workspace. The SOLOv2 model supports RGB and grayscale images.
- Create a solov2 object to configure a pretrained SOLOv2 network with a ResNet-50 or ResNet-18 backbone as the feature extractor. To increase inference speed, at the possible cost of detecting fewer objects, specify the lightweight ResNet-18 backbone with a reduced number of features, "light-resnet18-coco".

    model = solov2("light-resnet18-coco");

- Perform instance segmentation by using the segmentObjects object function on the pretrained network, specifying that the function return the object masks, labels, and detection scores.

    [masks,labels,scores] = segmentObjects(model,I);

- Visualize the results by using the insertObjectMask function.

    maskedImage = insertObjectMask(I,masks);
    imshow(maskedImage)
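Putting these steps together, this minimal sketch runs end to end. The file name "testImage.jpg" is a placeholder assumption; substitute any RGB or grayscale test image of your own.

    % End-to-end inference sketch; "testImage.jpg" is a placeholder image file
    I = imread("testImage.jpg");
    model = solov2("light-resnet18-coco");
    [masks,labels,scores] = segmentObjects(model,I);
    maskedImage = insertObjectMask(I,masks);
    imshow(maskedImage)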
Perform Transfer Learning with SOLOv2
To modify a network to detect additional classes, or to adjust other network parameters, you can perform transfer learning. This section shows how to prepare your training data, configure the SOLOv2 model, and train the network to perform transfer learning.
Configure Training Data
To train a SOLOv2 detector, specify your labeled ground truth training data as a datastore using the trainingData input argument of the trainSOLOV2 function. You must set up your data so that calling the read and readall functions on the datastore returns a cell array with four columns. This list describes the required format of each column.
- RGB or grayscale image: Images that serve as network inputs, specified as _H_-by-_W_-by-3 or _H_-by-_W_ numeric arrays, respectively. For example, a sample modified RGB image from the CamVid data set [2] contains objects of interest such as vehicles, traffic lights, and pedestrians.

- Ground truth bounding boxes: Bounding boxes for objects in the images, specified as an _M_-by-4 matrix with rows of the form [_x_ _y_ _w_ _h_], where _M_ is the number of object instances in the image. For example, the bboxes variable shows the bounding boxes of nine objects in the sample RGB image.

    bboxes =
         1   178    94   133
       178   173   115   126
        63   181    54    68
       320   169    15    42
       383   173    12    39
       359   167    14    41
       141   131    12    30
        55    86    75   117
       146   167    14    43

- Instance labels: Label of each instance, specified as a _NumObjects_-by-1 vector of strings or a _NumObjects_-by-1 cell array of character vectors, where _NumObjects_ is the number of labeled objects in the image. For example, the labels variable shows the label names of the nine labeled objects in the sample RGB image.

    labels = 9×1 categorical array
      car
      car
      car
      person
      person
      person
      traffic light
      bus
      person

- Instance masks: Masks for instances of objects, in either of these formats:
  - Binary masks, specified as a logical array of size _H_-by-_W_-by-_NumObjects_. Each mask is the segmentation of one instance in the image.
  - Polygon coordinates, specified as a _NumObjects_-by-2 cell array. Each row of the array contains the (_x_, _y_) coordinates of a polygon along the boundary of one instance in the image.

  The SOLOv2 network requires binary masks, not polygon coordinates. If your mask data is in polygon coordinates, use the poly2mask function to convert the polygon coordinates to binary masks of size _H_-by-_W_-by-_NumObjects_. For example, if the variable masks_polygon contains polygon coordinates, you can use this code to convert them to binary masks.

    denseMasks = false([h w numObjects]);
    for i = 1:numObjects
        denseMasks(:,:,i) = poly2mask(masks_polygon{i}(:,1),masks_polygon{i}(:,2),h,w);
    end

  To display the instance mask data over a sample training image im, use the insertObjectMask function. You can specify a colormap so that each object instance appears in a different color. For example, if the variable masks contains the corresponding instance masks, overlay the masks on the image using the lines colormap.

    imOverlay = insertObjectMask(im,masks,Color=lines(numObjects));
    imshow(imOverlay)
The datastore must return your data as a 1-by-4 cell array of the form {RGB image, Bounding boxes, Labels, Masks}. You can create a datastore in the required format using these steps (a combined sketch follows the list):
- Create an ImageDatastore that returns RGB or grayscale image data.
- Create a boxLabelDatastore that returns bounding box data and instance labels as a two-element cell array.
- Create an ImageDatastore and specify a custom read function that returns mask data as a binary matrix.
- Combine the three datastores using the combine function.
For more information, see Datastores for Deep Learning (Deep Learning Toolbox).
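For example, this minimal sketch assembles the four-column training datastore. The folder names, the boxLabelTable variable, and the use of binary image files for masks are illustrative assumptions; adapt them to how your own ground truth is stored.

    % Images: RGB or grayscale training images
    imds = imageDatastore("trainingImages/");

    % Boxes and labels: a two-column table whose first variable contains
    % M-by-4 boxes and whose second variable contains categorical labels
    blds = boxLabelDatastore(boxLabelTable);

    % Masks: binary masks stored as image files, read as logical arrays
    maskds = imageDatastore("trainingMasks/",ReadFcn=@(f) logical(imread(f)));

    % Combine into a datastore that returns {image, boxes, labels, masks}
    trainingData = combine(imds,blds,maskds);

    % Sanity check: one read should return a 1-by-4 cell array
    data = read(trainingData);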
Train the SOLOv2 Network
To configure a SOLOv2 network for training, specify the class names when you create a solov2 object. You can optionally specify additional network properties, such as the network input size to use during training and inference. For example, specify a SOLOv2 network that uses ResNet-50 as the base network to detect the classes in ClassNames during training.

    ClassNames = ["person","traffic light","car","bus"];
    Network = solov2("resnet50-coco",ClassNames);
Specify the network training options using the trainingOptions (Deep Learning Toolbox) function. To learn more about using trainingOptions to fine-tune network parameters for training, see Set Up Parameters and Train Convolutional Neural Network (Deep Learning Toolbox).
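For example, this sketch sets a typical starting configuration. The solver choice and values are illustrative starting points, not tuned recommendations.

    % Illustrative training options; adjust the values for your data set
    options = trainingOptions("sgdm", ...
        InitialLearnRate=0.001, ...
        MiniBatchSize=2, ...
        MaxEpochs=10, ...
        VerboseFrequency=10);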
To train the network, pass your training data, the configured solov2 object, and the trainingOptions function output to the trainSOLOV2 function. The function returns a trained SOLOv2 network.
trainedNetwork = trainSOLOV2(trainingData,Network,options);
To perform inference on a test image I using the trained network, pass the trained network as input to the segmentObjects object function. For more details, see the Segment Image with Pretrained SOLOv2 Network section.
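For example, assuming I is a test image in the workspace:

    % Run the fine-tuned network on a test image and display the masks
    [masks,labels,scores] = segmentObjects(trainedNetwork,I);
    imshow(insertObjectMask(I,masks))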
For a detailed example of a custom training workflow, see the Perform Instance Segmentation Using SOLOv2 example.
Evaluate Instance Segmentation Results
Evaluate the quality of the instance segmentation results using the evaluateInstanceSegmentation function. Ensure that your ground truth datastore is set up so that calling the read function on the datastore returns a cell array with at least two elements, in the format {masks labels}.
To calculate the prediction metrics, specify the output of the segmentObjects function and your ground truth data as inputs to the evaluateInstanceSegmentation function. The function calculates metrics such as the confusion matrix and average precision, and stores them in an instanceSegmentationMetrics object.
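For example, this sketch evaluates predictions over a test image datastore. It assumes that segmentObjects accepts a datastore input and returns a datastore of results, and that dsTruth reads {masks labels}; check the function reference pages for the exact signatures in your release.

    % dsTest: datastore of test images; dsTruth: datastore returning {masks,labels}
    dsResults = segmentObjects(trainedNetwork,dsTest);
    metrics = evaluateInstanceSegmentation(dsResults,dsTruth);
    metrics.DatasetMetrics   % aggregate metrics such as average precision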
References
[1] Wang, Xinlong, Rufeng Zhang, Tao Kong, Lei Li, and Chunhua Shen. "SOLOv2: Dynamic and Fast Instance Segmentation." ArXiv, October 23, 2020. https://doi.org/10.48550/arXiv.2003.10152.
[2] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic Object Classes in Video: A High-Definition Ground Truth Database." Pattern Recognition Letters 30, no. 2 (January 2009): 88–97. https://doi.org/10.1016/j.patrec.2008.04.005.
See Also
Topics
- Perform Instance Segmentation Using SOLOv2
- Get Started with Instance Segmentation Using Deep Learning
- Get Started with Image Preprocessing and Augmentation for Deep Learning
- Deep Learning in MATLAB (Deep Learning Toolbox)
- Datastores for Deep Learning (Deep Learning Toolbox)
- Data Sets for Deep Learning (Deep Learning Toolbox)