onehotencode - Encode data labels into one-hot vectors - MATLAB

Encode data labels into one-hot vectors

Since R2020b

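Syntax

B = onehotencode(A,featureDim)
tblB = onehotencode(tblA)
___ = onehotencode(___,typename)
___ = onehotencode(___,'ClassNames',classes)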

Description

B = onehotencode(A,featureDim) encodes data labels in categorical array A into a one-hot encoded array B. The function replaces each element of A with a numeric vector of length equal to the number of unique classes in A along the dimension specified by featureDim. The vector contains a 1 in the position corresponding to the class of the label in A, and a 0 in every other position. Any <undefined> values are encoded to NaN values.

tblB = onehotencode(tblA) encodes categorical data labels in table tblA into a table of one-hot encoded numeric values. The function replaces the single variable of tblA with as many variables as the number of unique classes in tblA. Each row in tblB contains a 1 in the variable corresponding to the class of the label in tblA, and a 0 in all other variables.

___ = onehotencode(___,typename) encodes the labels into numeric values of data type typename. Use this syntax with any of the input and output arguments in previous syntaxes.

___ = onehotencode(___,'ClassNames',classes) also specifies the names of the classes to use for encoding. Use this syntax when A or tblA does not contain categorical values, when you want to exclude any class labels from being encoded, or when you want to encode the vector elements in a specific order. Any label in A or tblA of a class that does not exist in classes is encoded to a vector of NaN values.
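The NaN behavior described for the first syntax can be checked with a short sketch. It assumes a string array containing a missing value, which categorical converts to an <undefined> element; the label values and variable names are illustrative only.

```matlab
% Sketch: an <undefined> categorical element is encoded as NaN values.
% The labels "cat" and "dog" are illustrative assumptions.
A = categorical(["cat"; "dog"; missing; "cat"]);  % third element becomes <undefined>

undefinedRows = isundefined(A);   % logical column marking <undefined> labels

% Expand along the second dimension: one row per observation,
% one column per category (cat, dog).
B = onehotencode(A,2);

% Rows for defined labels hold a single 1; the <undefined> row holds NaN.
nanRows = all(isnan(B),2);
```

Comparing undefinedRows with nanRows is one way to confirm which observations carried no usable label before passing B to later processing.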

Examples

One-Hot Encode a Vector of Labels

Encode a categorical vector of class labels into one-hot vectors representing the labels.

Create a column vector of labels, where each row of the vector represents a single observation. Convert the labels to a categorical array.

labels = ["red"; "blue"; "red"; "green"; "yellow"; "blue"];
labels = categorical(labels);

View the order of the categories.

categories(labels)

ans = 4x1 cell
    {'blue'  }
    {'green' }
    {'red'   }
    {'yellow'}

Encode the labels into one-hot vectors. Expand the labels into vectors in the second dimension to encode the classes.

labels = onehotencode(labels,2)

labels = 6×4

 0     0     1     0
 1     0     0     0
 0     0     1     0
 0     1     0     0
 0     0     0     1
 1     0     0     0

Each observation in labels is now a row vector with a 1 in the position corresponding to the category of the class label and 0 in all other positions. The function encodes the labels in the same order as the categories, such that a 1 in position 1 represents the first category in the list, in this case, 'blue'.
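To make the mapping between category order and column position concrete, the following sketch rebuilds the same encoding by hand and compares it with the function output. It is an illustration under the assumption that the labels contain no <undefined> values; it is not how onehotencode is implemented.

```matlab
% Sketch: reproduce the encoding manually by indexing an identity matrix.
labels = categorical(["red"; "blue"; "red"; "green"; "yellow"; "blue"]);

numClasses = numel(categories(labels));  % number of categories (4 here)
idx = double(labels);                    % category index of each observation
I = eye(numClasses);

% Row k of the identity matrix is the one-hot vector for category k.
manual = I(idx,:);

builtin = onehotencode(labels,2);
sameResult = isequal(manual,builtin);
```

Because 'blue' is the first category, every observation labeled blue selects the first row of the identity matrix, which places the 1 in column 1.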

One-Hot Encode Table

One-hot encode a table of categorical values.

Create a table of categorical data labels. Each row in the table holds a single observation.

color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"];
color = categorical(color);
color = table(color);

One-hot encode the table of class labels.

color = onehotencode(color)

color=6×4 table
    blue    green    red    yellow
    ____    _____    ___    ______

     1        0       0       0   
     0        0       1       0   
     1        0       0       0   
     0        1       0       0   
     0        0       0       1   
     0        0       1       0   

Each column of the table represents a class. The function encodes the data labels with a 1 in the column of the corresponding class, and 0 everywhere else.

One-Hot Encode Subset of Classes

If not all classes in the data are relevant, encode the data labels using only a subset of the classes.

Create a row vector of data labels, where each column of the vector represents a single observation.

pets = ["dog" "fish" "cat" "dog" "cat" "bird"];

Define the list of classes to encode. These classes are a subset of those present in the observations.

animalClasses = ["bird"; "cat"; "dog"];

One-hot encode the observations into the first dimension. Specify the classes to encode.

encPets = onehotencode(pets,1,"ClassNames",animalClasses)

encPets = 3×6

 0   NaN     0     0     0     1
 0   NaN     1     0     1     0
 1   NaN     0     1     0     0

Observations of a class not present in the list of classes to encode are encoded to a vector of NaN values.
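The 'ClassNames' argument can also control the order of the encoded positions, as the syntax description notes. The sketch below contrasts the default category order with a custom order; the label values are illustrative assumptions.

```matlab
% Sketch: use 'ClassNames' to choose the order of the encoded columns.
labels = categorical(["red"; "blue"; "green"]);

% Default: columns follow the category order (blue, green, red).
defaultOrder = onehotencode(labels,2);

% Custom: columns follow the order given in classes (red, green, blue).
classes = ["red" "green" "blue"];
customOrder = onehotencode(labels,2,"ClassNames",classes);
```

With the default order, "red" is encoded in the third column. With the custom list, "red" moves to the first column, which can help when the encoding has to match a class order fixed elsewhere.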

One-Hot Encode Image for Semantic Segmentation

Use onehotencode to encode a matrix of class labels, such as a semantic segmentation of an image.

Define a simple 15-by-15 pixel segmentation matrix of class labels.

A = "blue";
B = "green";
C = "black";

A = repmat(A,8,15);
B = repmat(B,7,5);
C = repmat(C,7,5);

seg = [A;B C B];

Convert the segmentation matrix into a categorical array.

seg = categorical(seg);

One-hot encode the segmentation matrix into an array of type single. Expand the encoded labels into the third dimension.

encSeg = onehotencode(seg,3,"single");

Check the size of the encoded segmentation.

size(encSeg)

ans = 1×3

    15    15     3

The three possible classes of the pixels in the segmentation matrix are encoded as vectors in the third dimension.
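As a sanity check on an encoding like this one, each pixel should have exactly one active channel. The sketch below rebuilds the segmentation matrix so that it runs on its own; the variable names pixelSums, oneClassPerPixel, and blueMask are illustrative.

```matlab
% Sketch: verify that each pixel is assigned to exactly one class channel.
seg = [repmat("blue",8,15); ...
       repmat("green",7,5) repmat("black",7,5) repmat("green",7,5)];
seg = categorical(seg);                 % categories: black, blue, green

encSeg = onehotencode(seg,3,"single");  % 15-by-15-by-3 array of type single

% Summing over the class dimension should give 1 at every pixel.
pixelSums = sum(encSeg,3);
oneClassPerPixel = all(pixelSums == 1,"all");

% Channel 2 corresponds to the second category, 'blue':
% ones in the top 8 rows and zeros elsewhere.
blueMask = encSeg(:,:,2);
```

Each page of encSeg along the third dimension acts as a binary mask for one class, in the same order as the categories of seg.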

One-Hot Encode Table with Several Variables

If your data is a table that contains several types of class variables, you can encode each variable separately.

Create a table of observations of several types of categorical data.

color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"];
color = categorical(color);

pets = ["dog"; "fish"; "cat"; "dog"; "cat"; "bird"];
pets = categorical(pets);

location = ["USA"; "CAN"; "CAN"; "USA"; "AUS"; "USA"];
location = categorical(location);

data = table(color,pets,location)

data=6×3 table
    color     pets    location
    ______    ____    ________

    blue      dog       USA   
    red       fish      CAN   
    blue      cat       CAN   
    green     dog       USA   
    yellow    cat       AUS   
    red       bird      USA   

Use a for-loop to one-hot encode each table variable and append it to a new table containing the encoded data.

encData = table();

for i = 1:width(data)
    encData = [encData onehotencode(data(:,i))];
end

encData

encData=6×11 table
    blue    green    red    yellow    bird    cat    dog    fish    AUS    CAN    USA
    ____    _____    ___    ______    ____    ___    ___    ____    ___    ___    ___

     1        0       0       0        0       0      1      0       0      0      1 
     0        0       1       0        0       0      0      1       0      1      0 
     1        0       0       0        0       1      0      0       0      1      0 
     0        1       0       0        0       0      1      0       0      0      1 
     0        0       0       1        0       1      0      0       1      0      0 
     0        0       1       0        1       0      0      0       0      0      1 

Each row of encData encodes the three different categorical classes for each observation.

Input Arguments

A — Array of data labels

categorical array | numeric array | string array

Array of data labels to encode, specified as a categorical array, a numeric array, or a string array.

Data Types: categorical | numeric | string

tblA — Table of data labels

table

Table of data labels to encode, specified as a table. The table must contain a single variable and one row for each observation. Each entry must contain a categorical scalar, a numeric scalar, or a string scalar.

Data Types: table

featureDim — Dimension to expand

positive integer

Dimension to expand to encode the labels, specified as a positive integer.

featureDim must specify a singleton dimension of A, or be larger than n where n is the number of dimensions of A.
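The constraint on featureDim can be illustrated with a small column vector. In this sketch, dimension 2 is a singleton dimension and dimension 3 is beyond the dimensions of A, so both are accepted. Dimension 1 is not a singleton, so that call is expected to fail; the try/catch only records whether it did and makes no assumption about the error message.

```matlab
% Sketch: which values of featureDim are accepted for a 4-by-1 input.
A = categorical(["a"; "b"; "c"; "a"]);   % size 4-by-1, three categories

sz2 = size(onehotencode(A,2));   % expand the singleton second dimension
sz3 = size(onehotencode(A,3));   % expand a dimension beyond ndims(A)

% Dimension 1 has length 4, so it is not a singleton dimension.
% Per the requirement above, this call is expected to throw an error.
threwError = false;
try
    onehotencode(A,1);
catch
    threwError = true;
end
```

Here sz2 is [4 3] and sz3 is [4 1 3]: the observations keep their original positions, and the class vectors occupy the expanded dimension.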

typename — Data type of encoded labels

'double' (default) | character vector | string scalar

Data type of the encoded labels, specified as a character vector or a string scalar.

Valid values of typename are floating point, signed and unsigned integer, and logical types.

Example: 'int64'

Data Types: char | string
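A brief sketch of the typename argument follows, using an integer type and the logical type. The labels are illustrative and contain no <undefined> values, so the question of how NaN would be represented in these types does not arise here.

```matlab
% Sketch: request a specific data type for the encoded labels.
labels = categorical(["x"; "y"; "x"]);

encInt8 = onehotencode(labels,2,"int8");        % signed 8-bit integers
encLogical = onehotencode(labels,2,"logical");  % logical values

typeInt8 = class(encInt8);
typeLogical = class(encLogical);
```

A compact type such as int8 or logical can reduce memory use for large label arrays, while a floating-point type such as 'single' or the default 'double' can represent the NaN values used for undefined or excluded labels.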

classes — Classes to encode

cell array | string vector | numeric vector | character array

Classes to encode, specified as a cell array of character vectors, a string vector, a numeric vector, or a two-dimensional character array.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | string | cell
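When the labels are numeric rather than categorical, the classes are supplied through 'ClassNames'. The sketch below assumes integer class identifiers 1 to 3; the values are illustrative.

```matlab
% Sketch: encode numeric labels by listing the classes explicitly.
A = [1 2 3 2];        % row vector, one observation per column
classes = [1 2 3];    % numeric class identifiers, in encoding order

% Expand along the first dimension: one column per observation,
% one row per class.
B = onehotencode(A,1,"ClassNames",classes);
```

The result is a 3-by-4 array in which row k marks the observations whose label equals classes(k).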

Output Arguments

B — Encoded labels

numeric array

Encoded labels, returned as a numeric array.

tblB — Encoded labels

table

Encoded labels, returned as a table.

Each row of tblB contains the one-hot encoded label for a single observation, in the same order as in tblA. Each row contains a 1 in the variable corresponding to the class of the label in tblA, and a 0 in all other variables.

Version History

Introduced in R2020b