onehotencode - Encode data labels into one-hot vectors - MATLAB
Encode data labels into one-hot vectors
Since R2020b
Syntax
Description
B = onehotencode(A,featureDim) encodes data labels in categorical array A into a one-hot encoded array B. The function replaces each element of A with a numeric vector of length equal to the number of unique classes in A along the dimension specified by featureDim. The vector contains a 1 in the position corresponding to the class of the label in A, and a 0 in every other position. Any <undefined> values are encoded to NaN values.
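As a minimal sketch of the <undefined> behavior described above, the following encodes a categorical vector that contains a missing entry. The label values are made up for illustration.

```matlab
% Hypothetical labels; the missing entry becomes <undefined> in the categorical array.
labels = categorical(["red"; "blue"; missing; "red"]);

% The categories are ordered alphabetically: blue, red.
B = onehotencode(labels,2);

% The row for the <undefined> label contains only NaN values.
undefinedRow = B(3,:);
```

Because NaN cannot be stored in integer or logical arrays, this sketch relies on the default 'double' output type.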
tblB = onehotencode(tblA) encodes categorical data labels in table tblA into a table of one-hot encoded numeric values. The function replaces the single variable of tblA with as many variables as the number of unique classes in tblA. Each row in tblB contains a 1 in the variable corresponding to the class of the label in tblA, and a 0 in all other variables.
___ = onehotencode(___,typename) encodes the labels into numeric values of data type typename. Use this syntax with any of the input and output arguments in previous syntaxes.
___ = onehotencode(___,'ClassNames',classes) also specifies the names of the classes to use for encoding. Use this syntax when A or tblA does not contain categorical values, when you want to exclude any class labels from being encoded, or when you want to encode the vector elements in a specific order. Any label in A or tblA of a class that does not exist in classes is encoded to a vector of NaN values.
Examples
One-Hot Encode a Vector of Labels
Encode a categorical vector of class labels into one-hot vectors representing the labels.
Create a column vector of labels, where each row of the vector represents a single observation. Convert the labels to a categorical array.
labels = ["red"; "blue"; "red"; "green"; "yellow"; "blue"];
labels = categorical(labels);
View the order of the categories.
categories(labels)
ans = 4x1 cell
    {'blue'  }
    {'green' }
    {'red'   }
    {'yellow'}
Encode the labels into one-hot vectors. Expand the labels into vectors in the second dimension to encode the classes.
labels = onehotencode(labels,2)
labels = 6×4
0 0 1 0
1 0 0 0
0 0 1 0
0 1 0 0
0 0 0 1
1 0 0 0
Each observation in labels is now a row vector with a 1 in the position corresponding to the category of the class label and 0 in all other positions. The function encodes the labels in the same order as the categories, such that a 1 in position 1 represents the first category in the list, in this case, 'blue'.
One-Hot Encode Table
One-hot encode a table of categorical values.
Create a table of categorical data labels. Each row in the table holds a single observation.
color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"];
color = categorical(color);
color = table(color);
One-hot encode the table of class labels.
color = onehotencode(color)
color=6×4 table
    blue    green    red    yellow
    ____    _____    ___    ______
     1        0       0       0
     0        0       1       0
     1        0       0       0
     0        1       0       0
     0        0       0       1
     0        0       1       0
Each column of the table represents a class. The function encodes the data labels with a 1 in the column of the corresponding class, and 0 everywhere else.
One-Hot Encode Subset of Classes
If not all classes in the data are relevant, encode the data labels using only a subset of the classes.
Create a row vector of data labels, where each column of the vector represents a single observation.
pets = ["dog" "fish" "cat" "dog" "cat" "bird"];
Define the list of classes to encode. These classes are a subset of those present in the observations.
animalClasses = ["bird"; "cat"; "dog"];
One-hot encode the observations into the first dimension. Specify the classes to encode.
encPets = onehotencode(pets,1,"ClassNames",animalClasses)
encPets = 3×6
0 NaN 0 0 0 1
0 NaN 1 0 1 0
1 NaN 0 1 0 0
Observations of a class not present in the list of classes to encode are encoded to a vector of NaN values.
One-Hot Encode Image for Semantic Segmentation
Use onehotencode to encode a matrix of class labels, such as a semantic segmentation of an image.
Define a simple 15-by-15 pixel segmentation matrix of class labels.
A = "blue"; B = "green"; C = "black";
A = repmat(A,8,15); B = repmat(B,7,5); C = repmat(C,7,5);
seg = [A;B C B];
Convert the segmentation matrix into a categorical array.
seg = categorical(seg);
One-hot encode the segmentation matrix into an array of type single. Expand the encoded labels into the third dimension.
encSeg = onehotencode(seg,3,"single");
Check the size of the encoded segmentation.
size(encSeg)
ans = 1×3
    15    15     3
The three possible classes of the pixels in the segmentation matrix are encoded as vectors in the third dimension.
One-Hot Encode Table with Several Variables
If your data is a table that contains several types of class variables, you can encode each variable separately.
Create a table of observations of several types of categorical data.
color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"]; color = categorical(color);
pets = ["dog"; "fish"; "cat"; "dog"; "cat"; "bird"]; pets = categorical(pets);
location = ["USA"; "CAN"; "CAN"; "USA"; "AUS"; "USA"]; location = categorical(location);
data = table(color,pets,location)
data=6×3 table
    color     pets    location
    ______    ____    ________
    blue      dog       USA
    red       fish      CAN
    blue      cat       CAN
    green     dog       USA
    yellow    cat       AUS
    red       bird      USA
Use a for-loop to one-hot encode each table variable and append it to a new table containing the encoded data.
encData = table();
for i = 1:width(data)
    % Encode one table variable at a time and append the resulting columns.
    encData = [encData onehotencode(data(:,i))];
end
encData
encData=6×11 table
    blue    green    red    yellow    bird    cat    dog    fish    AUS    CAN    USA
    ____    _____    ___    ______    ____    ___    ___    ____    ___    ___    ___
     1        0       0       0        0       0      1      0       0      0      1
     0        0       1       0        0       0      0      1       0      1      0
     1        0       0       0        0       1      0      0       0      1      0
     0        1       0       0        0       0      1      0       0      0      1
     0        0       0       1        0       1      0      0       1      0      0
     0        0       1       0        1       0      0      0       0      0      1
Each row of encData encodes the three different categorical classes for each observation.
Input Arguments
A - Array of data labels
categorical array | numeric array | string array
Array of data labels to encode, specified as a categorical array, a numeric array, or a string array.
- If A is a categorical array, the elements of the one-hot encoded vectors are in the same order as categories(A).
- If A is not a categorical array, you must specify the classes to encode using the 'ClassNames' name-value argument. The function encodes the vectors in the order that the classes appear in classes.
- If A contains undefined values or values not present in classes, the function encodes those values as a vector of NaN values. In this case, typename must be 'double' or 'single'.
Data Types: categorical | numeric | string
tblA - Table of data labels
table
Table of data labels to encode, specified as a table. The table must contain a single variable and one row for each observation. Each entry must contain a categorical scalar, a numeric scalar, or a string scalar.
- If tblA contains categorical values, the elements of the one-hot encoded vectors match the order of the categories; for example, the same order as categories(tbl(1,n)).
- If tblA does not contain categorical values, you must specify the classes to encode using the 'ClassNames' name-value argument. The function encodes the vectors in the order that the classes appear in classes.
- If tblA contains undefined values or values not present in classes, the function encodes those values as NaN values. In this case, typename must be 'double' or 'single'.
Data Types: table
featureDim - Dimension to expand
positive integer
Dimension to expand to encode the labels, specified as a positive integer.
featureDim must specify a singleton dimension of A, or be larger than n, where n is the number of dimensions of A.
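The following sketch illustrates the featureDim requirement with a small, made-up column vector of labels. For a 4-by-1 input, dimension 2 is a singleton dimension and dimension 3 is larger than the number of dimensions of the input, so both are valid choices.

```matlab
% Hypothetical 4-by-1 categorical vector with three classes: a, b, c.
labels = categorical(["a"; "b"; "c"; "a"]);

% Expand along dimension 2 (singleton for a column vector): result is 4-by-3.
B2 = onehotencode(labels,2);

% Expand along dimension 3 (beyond the two dimensions of labels): result is 4-by-1-by-3.
B3 = onehotencode(labels,3);

sizeB2 = size(B2);
sizeB3 = size(B3);
```

Dimension 1 has length 4 in this input, so it is not a singleton dimension and does not satisfy the requirement stated above.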
typename - Data type of encoded labels
'double' (default) | character vector | string scalar
Data type of the encoded labels, specified as a character vector or a string scalar.
- If the classification label input is a categorical array, a numeric array, or a string array, then the encoded labels are returned as an array of data type typename.
- If the classification label input is a table, then the encoded labels are returned as a table where each entry has data type typename.
Valid values of typename are floating point, signed and unsigned integer, and logical types.
Example: 'int64'
Data Types: char | string
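As a brief illustration of the typename argument, the sketch below encodes the same made-up labels as logical and as int8 values. It assumes the labels contain no undefined entries, since NaN cannot be represented in these types.

```matlab
% Hypothetical labels with two classes: cat, dog.
labels = categorical(["cat"; "dog"; "cat"]);

% Encode as logical values.
Blogical = onehotencode(labels,2,"logical");

% Encode as 8-bit signed integers.
Bint8 = onehotencode(labels,2,"int8");

typeLogical = class(Blogical);
typeInt8 = class(Bint8);
```

A compact type such as logical or int8 can reduce memory use for large label arrays, at the cost of not being able to represent NaN.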
classes - Classes to encode
cell array | string vector | numeric vector | character array
Classes to encode, specified as a cell array of character vectors, a string vector, a numeric vector, or a two-dimensional character array.
- If the input A or tblA does not contain categorical values, then you must specify classes. You can also use the classes argument to exclude any class labels from being encoded, or to encode the vector elements in a specific order.
- If A or tblA contains undefined values or values not present in classes, the function encodes those values to a vector of NaN values. In this case, typename must be 'double' or 'single'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | string | cell
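The sketch below shows how the order of classes controls the order of the encoded vector elements, using made-up numeric labels. Because the labels are not categorical, the 'ClassNames' argument is required.

```matlab
% Hypothetical numeric labels, one observation per column.
labels = [3 1 2 3];

% Encode in the order 3, 2, 1 rather than in ascending order.
classOrder = [3 2 1];

% Expand along dimension 1: row k of B corresponds to classOrder(k).
B = onehotencode(labels,1,"ClassNames",classOrder);
```

Here the first row of B marks the observations with label 3, the second row those with label 2, and the third row those with label 1.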
Output Arguments
B - Encoded labels
numeric array
Encoded labels, returned as a numeric array.
tblB - Encoded labels
table
Encoded labels, returned as a table.
Each row of tblB contains the one-hot encoded label for a single observation, in the same order as in tblA. Each row contains a 1 in the variable corresponding to the class of the label in tblA, and a 0 in all other variables.
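To make the structure of tblB concrete, the following sketch encodes a small, made-up table and inspects the resulting variable names, which correspond to the classes of the input.

```matlab
% Hypothetical single-variable table of categorical labels.
tblA = table(categorical(["blue"; "red"; "blue"]),'VariableNames',{'color'});

% Each class becomes one variable of the output table.
tblB = onehotencode(tblA);

% Variable names of the encoded table, one per class.
classVars = tblB.Properties.VariableNames;

% Column of indicator values for the class blue.
blueColumn = tblB.blue;
```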
Version History
Introduced in R2020b