slowFastVideoClassifier
SlowFast video classifier. Requires Computer Vision Toolbox Model for SlowFast Video Classification
Since R2021b
Description
The slowFastVideoClassifier object is a SlowFast video classifier pretrained on the Kinetics-400 data set with a ResNet-50 3-D convolutional neural network (CNN). You can use the pretrained video classifier to classify 400 human actions such as running, walking, and shaking hands.
Creation
Syntax
Description
`sf = slowFastVideoClassifier` returns a SlowFast video classifier pretrained on the Kinetics-400 data set.
`sf = slowFastVideoClassifier("resnet50-3d",classes)` configures the pretrained SlowFast video classifier for transfer learning on a new set of classes, `classes`.
`sf = slowFastVideoClassifier(___,Name=Value)` sets properties using name-value arguments in addition to the input arguments from the previous syntax. For example, `sf = slowFastVideoClassifier("resnet50-3d",classes,InputSize=[256,256,3,32])` sets the input size of the network. You can specify multiple name-value arguments.
Note
This function requires the Computer Vision Toolbox™ Model for SlowFast Video Classification. You can install the Computer Vision Toolbox Model for SlowFast Video Classification from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. To use this object, you must also have a license for the Deep Learning Toolbox™.
Properties
Configure Classifier Properties
This property is read-only.
Size of the video classifier network, specified as a four-element row vector in the form [H,W,C,T], where H and W represent the height and width, respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork.
Typical values for the number of frames are 8, 16, 32, or 64. Increase the number of frames to capture the temporal nature of activities when training the classifier.
This property is read-only.
Normalization statistics for the video data, specified as a structure with field names Min, Max, Mean, and StandardDeviation. The Min and Max field values define the minimum and maximum values for rescaling the video data. The Mean and StandardDeviation field values define the mean and standard deviation for input normalization. Each field value must be a row vector with one element per channel of the video input data.
The default structure contains the fields Min, Max, Mean, and StandardDeviation with values [0,0,0], [255,255,255], [0.45,0.45,0.45], and [0.225,0.225,0.225], respectively. You must calculate the statistics values from the data set on which you are training the video classifier. To rescale the data using minimum and maximum values precomputed from your data set, specify both Min and Max. Otherwise, the minimum and maximum values are calculated from each input sequence when using updateSequence or classifyVideoFile.
Note
The object normalizes the data by first rescaling it between 0 and 1, and then standardizing the rescaled data by subtracting the mean and dividing by the standard deviation. The rescaled data is standardized only if the Mean and StandardDeviation fields are non-empty. The input is automatically normalized when using the updateSequence or classifyVideoFile object functions. The data must be manually normalized when using the forward or predict object functions.
Name of the trained video classifier, specified as a string scalar.
This property is read-only.
Classes that the video classifier is configured to train or classify, specified as a vector of strings or a cell array of character vectors. For example:
classes = ["kiss","laugh","pick","pour","pushup"];
Training Properties
Learnable parameters for the SlowFast video classifier, specified as a table with three columns.
- Layer: Layer name, specified as a string scalar.
- Parameter: Parameter name, specified as a string scalar.
- Value: Parameter value, specified as a dlarray (Deep Learning Toolbox) object.
The network learnable parameters contain the features learned by the network, for example, the weights of convolution and fully connected layers.
State of the nonlearnable parameters of the SlowFast video classifier, specified as a table with three columns.
- Layer: Layer name, specified as a string scalar.
- Parameter: Parameter name, specified as a string scalar.
- Value: Parameter value, specified as a dlarray (Deep Learning Toolbox) object.
The network state contains information remembered by the network between iterations, for example, the state of long short-term memory (LSTM) and batch normalization layers. During training or inference, you can update the network state using the output of the forward and predict object functions.
Streaming Video Classification Properties
This property is read-only.
Video sequence used to update and classify sequences for streaming classification, specified as a 4-D numeric array of the form [H,W,C,T], where H and W represent the height and width, respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork. The updateSequence and classifySequence object functions use the video sequence specified by the VideoSequence property.
Object Functions
Function | Description |
---|---|
forward | Compute video classifier outputs for training |
predict | Compute video classifier predictions |
Examples
This example requires the Computer Vision Toolbox™ Model for SlowFast Video Classification. You can install the Computer Vision Toolbox Model for SlowFast Video Classification from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.
Load a SlowFast video classifier pretrained on the Kinetics-400 data set.
sf = slowFastVideoClassifier;
Specify the file name of the video to classify.
videoFilename = "washingHands.avi";
For video classification, set the number of randomly selected video sequences to 15.
numSequences = 15;
Classify the video using the classifyVideoFile function.
[label,score] = classifyVideoFile(sf,videoFilename,NumSequences=numSequences)
label = categorical washing hands
Display the classified label using a vision.VideoPlayer.
player = vision.VideoPlayer('Name','Washing Hands');
reader = VideoReader(videoFilename);
while hasFrame(reader)
    frame = readFrame(reader);
    % Resize the frame by 1.5 times for display
    frame = imresize(frame,1.5);
    % Overlay the predicted label in the top-left corner of the frame
    frame = insertText(frame,[2,2],string(label),'FontSize',18);
    step(player,frame);
end
Version History
Introduced in R2021b
See Also
Objects
- dlnetwork (Deep Learning Toolbox) | inflated3dVideoClassifier | r2plus1dVideoClassifier