categorical - Array that contains values assigned to categories - MATLAB
Array that contains values assigned to categories
Description
categorical is a data type that assigns values to a finite set of discrete categories, such as High, Med, and Low. These categories can have a mathematical ordering that you specify, such as High > Med > Low, but it is not required. A categorical array provides efficient storage and convenient manipulation of nonnumeric data, while also maintaining meaningful names for the values. A common use of categorical arrays is to define groups of rows in a table.
Creation
To create a categorical array:
- Use the categorical function as described below.
- Bin continuous data using the discretize function. Return the bins as a categorical array.
- Multiply two categorical arrays. The product is a categorical array whose categories are all possible combinations of the categories of the two operands.
Syntax
Description
B = categorical(A)
creates a categorical array from the input array A. The categories of the output array are the sorted unique values from the input array.
B = categorical(A,valueset)
creates one category for each value in valueset. The categories of B are in the same order as the values of valueset.
You can use valueset to include categories for values not present in A. Conversely, if A contains any values not present in valueset, then the corresponding elements of B are undefined.
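As a minimal sketch of both directions described above (the color values are illustrative and not taken from this page), the following creates a category that no element uses and leaves one element undefined because its value is absent from valueset:

```matlab
% "green" is in valueset but not in A; "purple" is in A but not in valueset.
A = ["red" "blue" "purple"];
valueset = ["blue" "red" "green"];
B = categorical(A,valueset);

cats = categories(B);   % categories follow the order of valueset
tf = isundefined(B);    % true only for the element that was "purple"
```

Here cats lists blue, red, and green in that order, and only the third element of tf is true.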
B = categorical(A,valueset,catnames)
names categories by matching the category values in valueset to the corresponding names in catnames.
B = categorical(A,___,Name=Value)
specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, to indicate that the categories have a mathematical ordering, set Ordinal to true.
Input Arguments
Input array, specified as a numeric array, logical array, categorical array, datetime array, duration array, string array, or cell array of character vectors.
The categorical function removes leading and trailing spaces from input values that are strings or character vectors.
If the input A contains missing values, then the corresponding element of the output array is undefined and displays as <undefined>. The categorical function converts the following values to undefined categorical values:
- NaN in numeric and duration arrays
- The missing string (<missing>) or the empty string ("") in string arrays
- The empty character vector ('') in cell arrays of character vectors
- NaT in datetime arrays
- Undefined values (<undefined>) in categorical arrays
The output array does not have a category for undefined values. To create an explicit category for missing or undefined values, you must include the desired category name in catnames, and a missing value as the corresponding value in valueset.
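The conversions listed above can be checked with isundefined. This is a small sketch with made-up inputs; it shows that missing values become undefined elements and that no category is created for them:

```matlab
% Missing values in the input become undefined elements of the output.
N = categorical([1 NaN 2 NaN]);              % numeric input with NaN
S = categorical(["a" "" missing]);           % empty and missing strings
D = categorical([datetime(2020,1,1) NaT]);   % datetime input with NaT

tfN = isundefined(N);   % true where the input was NaN
tfS = isundefined(S);   % true for "" and for <missing>
tfD = isundefined(D);   % true where the input was NaT

catsN = categories(N);  % only the defined values get categories
```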
The input A also can be an array of objects with the following class methods:
- unique
- eq
Categories, specified as a vector of unique values. The data type of valueset and the data type of the input array must be the same, except when the input is a string array. In that case, valueset can be either a string array or a cell array of character vectors.
The categorical function removes leading and trailing spaces from elements of valueset that are strings or character vectors.
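A short sketch of the two points above, using hypothetical values: a string array input paired with a cell array valueset, and surrounding spaces that are removed before values are matched.

```matlab
% String array input with a cell array of character vectors as valueset.
% The stray spaces are intentional; per the description above they are
% trimmed before the values are matched.
A = [" x" "y " "x"];
valueset = {'y','x '};
B = categorical(A,valueset);

cats = categories(B);        % names after trimming, in valueset order
anyUndef = any(isundefined(B));
```

If trimming behaves as described, every element of A matches a category and anyUndef is false.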
Category names, specified as a string array or a cell array of character vectors. If you do not specify the catnames input argument, then categorical uses the values in valueset as category names.
The category names cannot include a missing string (<missing>), an empty string (""), or an empty character vector ('').
To merge multiple distinct values from the input array into a single category in the output array, include duplicate names corresponding to those values.
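For instance, the following sketch (the values and names are illustrative) maps four numeric values onto two categories by repeating names in catnames:

```matlab
% Values 1 and 2 share the name "low"; values 3 and 4 share "high".
A = [1 2 3 4 2 1];
B = categorical(A,[1 2 3 4],["low" "low" "high" "high"]);

cats = categories(B);   % two categories remain after merging
isLow = (B == "low");   % elements that were 1 or 2 in A
```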
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: categorical(A,Ordinal=true) specifies that the categories have a mathematical ordering.
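The two spellings of a name-value argument can be compared directly. In this sketch, which assumes R2021a or later so that both forms are accepted, the input values are arbitrary:

```matlab
A = [1 2 3 2];
B1 = categorical(A,Ordinal=true);      % Name=Value form (R2021a and later)
B2 = categorical(A,'Ordinal',true);    % comma-separated form with quoted name

sameResult = isequal(B1,B2);           % both calls build the same array
bothOrdinal = isordinal(B1) && isordinal(B2);
```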
Ordinal variable flag, specified as a numeric or logical 0 (false) or 1 (true).
| Value | Description |
|---|---|
| 0 (false) | categorical creates a categorical array that is not ordinal, which is the default behavior. The categories of the output array have no mathematical ordering. Therefore, you can compare the values in the output for equality only. You cannot compare the values using any other relational operator. |
| 1 (true) | categorical creates an ordinal categorical array. The categories of the output array have a mathematical ordering, such that the first category specified is the smallest and the last category is the largest. You can compare the values in the output using relational operators, such as less than and greater than, in addition to comparing the values for equality. You also can use the min and max functions on an ordinal categorical array. |
For more information, see Ordinal Categorical Arrays.
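As a brief sketch of what the ordering enables (the category names here are illustrative), an ordinal array supports relational operators as well as min and max:

```matlab
% Category order is low < med < high because of the order of valueset.
levels = categorical(["med" "low" "high" "med"], ...
    ["low" "med" "high"],Ordinal=true);

atLeastMed = (levels >= "med");   % relational comparison against a category
smallest = min(levels);           % smallest category present in the data
largest = max(levels);            % largest category present in the data
```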
Protected categories flag, specified as a numeric or logical 0 (false) or 1 (true).
The categories of ordinal categorical arrays are always protected. If you set Ordinal to true, then the default value of Protected is also true. Otherwise, the default value of Protected is false.
| Value | Description |
|---|---|
| 0 (false) | When you assign new values to the output array, the categories update automatically. Therefore, you can combine (nonordinal) categorical arrays that have different categories. The categories can update accordingly to include the categories from both arrays. |
| 1 (true) | When you assign new values to the output array, the values must belong to one of the existing categories. Therefore, you can only combine arrays that have the same categories. To add new categories to the output, you must use the function addcats. |
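The difference between the two settings can be sketched as follows. The variable names and values are hypothetical, and the try/catch block is only there to show that the assignment is rejected:

```matlab
% Unprotected array: assigning a new value adds a category automatically.
U = categorical(["a" "b"]);
U(3) = "c";
catsU = categories(U);

% Protected array: the same assignment raises an error.
P = categorical(["a" "b"],Protected=true);
try
    P(3) = "c";
    assigned = true;
catch
    assigned = false;   % "c" is not an existing category of P
end

% Add the category explicitly, after which the assignment succeeds.
P = addcats(P,"c");
P(3) = "c";
catsP = categories(P);
```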
Examples
Create a categorical array from a list of weather station codes. Then add it to a table of temperature readings. Use the categorical array to help you analyze the data in the table by category.
First, create an array of weather station codes.
Stations = ["S1" "S2" "S1" "S3" "S2"]
Stations = 1×5 string
    "S1"    "S2"    "S1"    "S3"    "S2"
To create a categorical array from the weather station codes, use the categorical function.
Stations = categorical(Stations)
Stations = 1×5 categorical
     S1      S2      S1      S3      S2
Display the categories. The three station codes are the categories.
categories(Stations)
ans = 3×1 cell
    {'S1'}
    {'S2'}
    {'S3'}
Now create a table that contains weather data. The table has temperatures, dates, and station codes.
Temperatures = [58;72;56;90;76];
Dates = datetime(["2017-04-17";"2017-04-18";"2017-04-30";"2017-05-01";"2017-04-27"]);
Stations = Stations';
tempReadings = table(Temperatures,Dates,Stations)
tempReadings=5×3 table
    Temperatures       Dates       Stations
    ____________    ___________    ________
         58         17-Apr-2017       S1
         72         18-Apr-2017       S2
         56         30-Apr-2017       S1
         90         01-May-2017       S3
         76         27-Apr-2017       S2
Categorize the data in the table by weather station. For example, return table rows that have data for station S2. Index into the table using an array of logical indices where Stations equals S2.
TF = (tempReadings.Stations == "S2")
TF = 5×1 logical array
   0
   1
   0
   0
   1
tempReadings(TF,:)
ans=2×3 table
    Temperatures       Dates       Stations
    ____________    ___________    ________
         72         18-Apr-2017       S2
         76         27-Apr-2017       S2
To find patterns in the data associated with weather stations, make a scatter plot of temperature readings by station.
scatter(tempReadings,"Stations","Temperatures","filled")
Convert a string array to a categorical array. Specify that the categorical array has a set of categories that includes a value that is not present in the original array.
First, create a string array that has a set of repeated values.
A = ["red" "blue" "blue" "blue" "blue" "red"]
A = 1×6 string
    "red"    "blue"    "blue"    "blue"    "blue"    "red"
Convert the string array to a categorical array. Specify its categories. Include green as a category.
valueset = ["blue" "red" "green"];
B = categorical(A,valueset)
B = 1×6 categorical
     red      blue      blue      blue      blue      red
Display the categories of the categorical array. It has a category that did not come from the input string array.
categories(B)
ans = 3×1 cell
    {'blue' }
    {'red'  }
    {'green'}
Create a numeric array.
A = [1 3 2; 2 1 3; 3 1 2]
A = 3×3
1 3 2
2 1 3
3 1 2
Convert the numeric array to a categorical array. Specify the values and the names for the categories.
B = categorical(A,[1 2 3],["red" "green" "blue"])
B = 3×3 categorical
red blue green
green red blue
blue red green
Display the categories.
categories(B)
ans = 3×1 cell
    {'red'  }
    {'green'}
    {'blue' }
B is not an ordinal categorical array. Therefore, you can compare the values in B only using the equality operators, == and ~=.
Find the elements that belong to the category red. Access those elements using logical indexing.
TF = (B == "red")
TF = 3×3 logical array
   1   0   0
   0   1   0
   0   1   0
B(TF)
ans = 3×1 categorical
     red
     red
     red
By default, the categorical function converts missing values (such as NaNs, NaTs, empty strings, and missing strings) into undefined categorical values. However, when you call categorical, you can specify a category for missing values to belong to.
For example, create a string array that includes an empty string and a missing string.
A = ["hi" "lo" missing "" "lo" "lo" "hi"]
A = 1×7 string
    "hi"    "lo"    <missing>    ""    "lo"    "lo"    "hi"
First, convert the string array to a categorical array with undefined elements.
C = categorical(A)
C = 1×7 categorical
     hi      lo      <undefined>      <undefined>      lo      lo      hi
categories(C)
ans = 2×1 cell
    {'hi'}
    {'lo'}
Then, convert it again. But this time specify INDEF as the category for missing strings.
C = categorical(A,["lo" "hi" missing],["lo" "hi" "INDEF"])
C = 1×7 categorical
     hi      lo      INDEF      <undefined>      lo      lo      hi
categories(C)
ans = 3×1 cell
    {'lo'   }
    {'hi'   }
    {'INDEF'}
Specify INDEF as the category for both missing and empty strings.
C = categorical(A,["lo" "hi" missing ""],["lo" "hi" "INDEF" "INDEF"])
C = 1×7 categorical
     hi      lo      INDEF      INDEF      lo      lo      hi
categories(C)
ans = 3×1 cell
    {'lo'   }
    {'hi'   }
    {'INDEF'}
Create a 5-by-2 numeric array.
A = [3 2;3 3;3 2;2 1;3 2]
A = 5×2
3 2
3 3
3 2
2 1
3 2
Convert A to an ordinal categorical array where 1, 2, and 3 represent the categories child, adult, and senior, respectively.
valueset = [1 2 3];
catnames = ["child" "adult" "senior"];
B = categorical(A,valueset,catnames,Ordinal=true)
B = 5×2 categorical
senior adult
senior senior
senior adult
adult child
senior adult
Because B is ordinal, the categories of B have a mathematical ordering, child < adult < senior. You can use all relational operators with ordinal categorical values. For example, return the elements that have a value greater than adult.
TF = (B > "adult")
TF = 5×2 logical array
   1   0
   1   1
   1   0
   0   0
   1   0
B(TF)
ans = 5×1 categorical
     senior
     senior
     senior
     senior
     senior
You can preallocate a categorical array of any size by creating an array of NaNs and converting it to a categorical array. After you preallocate the array, you can initialize its categories by specifying category names and adding the categories to the array.
First create an array of NaNs. You can create an array having any size. For example, create a 2-by-4 array of NaNs.
A = NaN(2,4)
A = 2×4
   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN
Then preallocate a categorical array by converting the array of NaNs. The categorical function converts NaNs to undefined categorical values. Just as a NaN represents "not a number", <undefined> represents a categorical value that does not belong to a category.
A = categorical(A)
A = 2×4 categorical
     <undefined>      <undefined>      <undefined>      <undefined>
     <undefined>      <undefined>      <undefined>      <undefined>
In fact, at this point A has no categories.
categories(A)
ans =
  0×0 empty cell array
To initialize the categories of A, specify category names and add them to A by using the addcats function. For example, add small, medium, and large as three categories of A.
A = addcats(A,["small" "medium" "large"])
A = 2×4 categorical
     <undefined>      <undefined>      <undefined>      <undefined>
     <undefined>      <undefined>      <undefined>      <undefined>
While the elements of A are undefined values, the categories have been initialized by addcats.
categories(A)
ans = 3×1 cell
    {'small' }
    {'medium'}
    {'large' }
Now that A has categories, you can assign defined categorical values as elements of A.
A(1) = "medium";
A(8) = "small";
A(3:5) = "large"
A = 2×4 categorical
     medium           large      large            <undefined>
     <undefined>      large      <undefined>      small
The discretize function is recommended for creating categories out of continuous data, particularly when there are input values that are closely spaced. Two values are closely spaced when the difference between them is less than about 5e-5. When values are closely spaced, the categorical function cannot create unique category names from the values.
Create a numeric array with 100 random numbers.
X = rand(100,1)
X = 100×1
0.8147
0.9058
0.1270
0.9134
0.6324
0.0975
0.2785
0.5469
0.9575
0.9649
0.1576
0.9706
0.9572
0.4854
0.8003
⋮
To bin the numbers into three categories, use discretize. Specify bin boundaries and category names for the bins.
C = discretize(X,[0 .25 .75 1],"categorical",["small" "medium" "large"])
C = 100×1 categorical
     large
     large
     small
     large
     medium
     small
     medium
     medium
     large
     large
     small
     large
     large
     medium
     large
     small
     medium
     large
     large
     large
     medium
     small
     large
     large
     medium
     large
     medium
     medium
     medium
     small
      ⋮
Plot a histogram of the three categories of data.
histogram(C)
When you multiply two categorical arrays, the result is a categorical array with a set of new categories. The new categories are all the ordered pairs created from the categories of the two original categorical arrays. This set of all possible combinations of categories is also known as the Cartesian product of the two original sets of categories.
For example, create two categorical arrays. These arrays list blood groups and Rh factors for six patients.
bloodGroups = categorical(["A" "AB" "O" "O" "A" "A"], ...
                          ["A" "B" "AB" "O"])
bloodGroups = 1×6 categorical
     A      AB      O      O      A      A
Rhfactors = categorical(["+" "+" "-" "-" "+" "+"])
Rhfactors = 1×6 categorical
     +      +      -      -      +      +
Display the categories of the two arrays. While the two categorical arrays have the same numbers of elements, they can have different numbers of categories.
categories(bloodGroups)
ans = 4×1 cell
    {'A' }
    {'B' }
    {'AB'}
    {'O' }
categories(Rhfactors)
ans = 2×1 cell
    {'+'}
    {'-'}
Multiply the two categorical arrays. The elements of the product come from combinations of the corresponding elements from the input arrays.
bloodTypes = bloodGroups .* Rhfactors
bloodTypes = 1×6 categorical
     A +      AB +      O -      O -      A +      A +
However, the categories of the product are all the ordered pairs that can be created from the categories of the two arrays. So, it is possible that some categories are not represented by any element of the output array.
categories(bloodTypes)
ans = 8×1 cell
    {'A +' }
    {'A -' }
    {'B +' }
    {'B -' }
    {'AB +'}
    {'AB -'}
    {'O +' }
    {'O -' }
Limitations
- If the input array is a numeric, datetime, or duration array, and you create category names from the values in the input, then categorical rounds them off to five significant figures.
  For example, categorical([1 1.23456789]) creates category names 1 and 1.2346 from these two values. To create categories from continuous numeric, duration, or datetime data, use the discretize function.
- If the input array has numeric, datetime, or duration values that are too closely spaced, then categorical cannot create category names from those values. In general, such values are too closely spaced if the difference between any two values in the input is less than about 5e-5.
  For example, categorical([1 1.00001]) cannot create category names from the two numeric values because the difference between them is too small. To create categories from continuous numeric, duration, or datetime data, use the discretize function.
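Both limitations can be sketched in a few lines. The bin edges and the names "low" and "high" below are assumptions chosen for illustration; the first call reproduces the rounding example from the list above, and the second shows discretize placing two closely spaced values in the same named bin:

```matlab
% Category names derived from numeric values are rounded.
B = categorical([1 1.23456789]);
names = categories(B);          % names created from the rounded values

% Closely spaced values: bin them with explicit edges instead of
% creating one category per value.
X = [1 1.00001 2.5];
C = discretize(X,[0 2 4],"categorical",["low" "high"]);
binNames = string(C);           % bin name assigned to each element of X
```

With explicit bins, the category names no longer depend on how the numeric values are formatted.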
Tips
- For a list of functions that accept or return categorical arrays, see Categorical Arrays.
Extended Capabilities
The categorical function supports tall arrays with the following usage notes and limitations:
- If the list of categories is known, a best practice is to provide the categories when you create the tall categorical array using categorical(A,valueset). If the categories are not provided, then many calculations require MATLAB® to perform an extra pass through the data to determine the categories.
For more information, see Tall Arrays.
Usage notes and limitations:
- For the one input syntax B = categorical(A), the order of the categories is undefined. To enforce the order, use valueset and catnames.
For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).
Version History
Introduced in R2013b