categorical - Array that contains values assigned to categories - MATLAB

Array that contains values assigned to categories

Description

categorical is a data type that assigns values to a finite set of discrete categories, such as High, Med, and Low. These categories can have a mathematical ordering that you specify, such as High > Med > Low, but it is not required. A categorical array provides efficient storage and convenient manipulation of nonnumeric data, while also maintaining meaningful names for the values. A common use of categorical arrays is to define groups of rows in a table.

Creation

To create a categorical array:

Syntax

Description

B = categorical(A) creates a categorical array from the input array A. The categories of the output array are the sorted unique values from the input array.

B = categorical(A,valueset) creates one category for each value in valueset. The categories of B are in the same order as the values of valueset.

You can use valueset to include categories for values not present in A. Conversely, if A contains any values not present in valueset, then the corresponding elements of B are undefined.
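The effect of valueset on unmatched values can be sketched as follows. This is a minimal illustration with made-up color data; the variable names are only for the example.

```matlab
% "green" is in valueset but not in A; "yellow" is in A but not in valueset
A = ["red" "blue" "yellow"];
valueset = ["blue" "red" "green"];
B = categorical(A,valueset);

cats = categories(B);   % categories follow the order of valueset
tf = isundefined(B);    % true where a value of A has no matching category
```

Here cats contains blue, red, and green, and only the third element of B is undefined.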

B = categorical(A,valueset,catnames) names categories by matching the category values in valueset to the corresponding names in catnames.

B = categorical(A,___,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, to indicate that the categories have a mathematical ordering, set Ordinal to true.

Input Arguments

A: Input array, specified as a numeric array, logical array, categorical array, datetime array, duration array, string array, or cell array of character vectors.

The categorical function removes leading and trailing spaces from input values that are strings or character vectors.
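As a small sketch of this trimming behavior, the following input uses hypothetical station codes that differ only in surrounding spaces.

```matlab
% The first two elements differ only by leading and trailing spaces
A = ["  S1" "S1  " "S2"];
B = categorical(A);

cats = categories(B);     % the padded values collapse into one category
sameCat = B(1) == B(2);   % both elements belong to category S1
```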

If the input A contains missing values, then the corresponding elements of the output array are undefined and display as <undefined>. The categorical function converts missing values, such as NaNs, NaTs, empty strings, and missing strings, to undefined categorical values.

The output array does not have a category for undefined values. To create an explicit category for missing or undefined values, you must include the desired category name in catnames, and a missing value as the corresponding value in valueset.

The input A also can be an array of objects with the following class methods:

valueset: Categories, specified as a vector of unique values. The data type of valueset and the data type of the input array must be the same, except when the input is a string array. In that case, valueset can be either a string array or a cell array of character vectors.

The categorical function removes leading and trailing spaces from elements of valueset that are strings or character vectors.

catnames: Category names, specified as a string array or a cell array of character vectors. If you do not specify the catnames input argument, then categorical uses the values in valueset as category names.

The category names cannot include a missing string (<missing>), an empty string (""), or an empty character vector ('').

To merge multiple distinct values from the input array into a single category in the output array, include duplicate names corresponding to those values.
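A minimal sketch of merging, assuming numeric codes where 2 and 3 should share one label (the codes and names are illustrative only):

```matlab
% Values 2 and 3 both map to the name "high", so they share one category
A = [1 2 3 2 1];
B = categorical(A,[1 2 3],["low" "high" "high"]);

cats = categories(B);    % two categories: low and high
merged = B(2) == B(3);   % elements that came from 2 and 3 compare equal
```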

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
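The two spellings can be compared side by side. This sketch assumes R2021a or later so that both lines run; in an earlier release, only the quoted form is accepted.

```matlab
A = ["lo" "hi" "lo"];

% Name=Value syntax (R2021a and later)
B1 = categorical(A,["lo" "hi"],Ordinal=true);

% Comma-separated syntax with the name in quotes
B2 = categorical(A,["lo" "hi"],'Ordinal',true);

same = isequal(B1,B2);   % both calls build the same ordinal array
```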

Example: categorical(A,Ordinal=true) specifies that the categories have a mathematical ordering.

Ordinal: Ordinal variable flag, specified as a numeric or logical 0 (false) or 1 (true).

0 (false): categorical creates a categorical array that is not ordinal, which is the default behavior. The categories of the output array have no mathematical ordering. Therefore, you can compare the values in the output for equality only. You cannot compare the values using any other relational operator.

1 (true): categorical creates an ordinal categorical array. The categories of the output array have a mathematical ordering, such that the first category specified is the smallest and the last category is the largest. You can compare the values in the output using relational operators, such as less than and greater than, in addition to comparing the values for equality. You also can use the min and max functions on an ordinal categorical array.

For more information, see Ordinal Categorical Arrays.
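As a brief sketch of what the ordering enables, the following uses hypothetical clothing sizes with the order S < M < L.

```matlab
% The order of valueset defines the ordering: S < M < L
sizes = categorical(["M" "S" "L" "S"],["S" "M" "L"],Ordinal=true);

isBigger = sizes > "S";   % relational operators work on ordinal arrays
smallest = min(sizes);    % smallest category present in the array
largest = max(sizes);     % largest category present in the array
```

Without Ordinal=true, the comparison with > and the calls to min and max would produce errors.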

Protected: Protected categories flag, specified as a numeric or logical 0 (false) or 1 (true).

The categories of ordinal categorical arrays are always protected. If you set Ordinal to true, then the default value of Protected is also true. Otherwise, the default value of Protected is false.

0 (false): When you assign new values to the output array, the categories update automatically. Therefore, you can combine (nonordinal) categorical arrays that have different categories. The categories can update accordingly to include the categories from both arrays.

1 (true): When you assign new values to the output array, the values must belong to one of the existing categories. Therefore, you can only combine arrays that have the same categories. To add new categories to the output, you must use the function addcats.
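The difference can be sketched with a small protected array. The try/catch block and the flag variable are only scaffolding for this illustration.

```matlab
C = categorical(["a" "b" "a"],Protected=true);

% Assigning a value outside the existing categories is an error
try
    C(2) = "c";
    assignedDirectly = true;
catch
    assignedDirectly = false;
end

% Add the category explicitly, then the assignment succeeds
C = addcats(C,"c");
C(2) = "c";
cats = categories(C);
```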

Examples

Create a categorical array from a list of weather station codes. Then add it to a table of temperature readings. Use the categorical array to help you analyze the data in the table by category.

First, create an array of weather station codes.

Stations = ["S1" "S2" "S1" "S3" "S2"]

Stations = 1×5 string "S1" "S2" "S1" "S3" "S2"

To create a categorical array from the weather station codes, use the categorical function.

Stations = categorical(Stations)

Stations = 1×5 categorical S1 S2 S1 S3 S2

Display the categories. The three station codes are the categories.

categories(Stations)

ans = 3×1 cell
    {'S1'}
    {'S2'}
    {'S3'}

Now create a table that contains weather data. The table has temperatures, dates, and station codes.

Temperatures = [58;72;56;90;76];
Dates = datetime(["2017-04-17";"2017-04-18";"2017-04-30";"2017-05-01";"2017-04-27"]);
Stations = Stations';
tempReadings = table(Temperatures,Dates,Stations)

tempReadings=5×3 table
Temperatures       Dates       Stations
____________    ___________    ________

     58         17-Apr-2017       S1   
     72         18-Apr-2017       S2   
     56         30-Apr-2017       S1   
     90         01-May-2017       S3   
     76         27-Apr-2017       S2   

Categorize the data in the table by weather station. For example, return table rows that have data for station S2. Index into the table using an array of logical indices where Stations equals S2.

TF = (tempReadings.Stations == "S2")

TF = 5×1 logical array

   0
   1
   0
   0
   1

tempReadings(TF,:)

ans=2×3 table
Temperatures       Dates       Stations
____________    ___________    ________

     72         18-Apr-2017       S2   
     76         27-Apr-2017       S2   

To find patterns in the data associated with weather stations, make a scatter plot of temperature readings by station.

scatter(tempReadings,"Stations","Temperatures","filled")

Figure contains an axes object. The axes object with xlabel Stations, ylabel Temperatures contains an object of type scatter.
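Grouped statistics are another way to analyze the table by category. The following sketch rebuilds a reduced version of the table (without the dates) and uses groupsummary, which is not part of the original example, to compute the mean temperature for each station.

```matlab
Temperatures = [58;72;56;90;76];
Stations = categorical(["S1";"S2";"S1";"S3";"S2"]);
tempReadings = table(Temperatures,Stations);

% One row per station, with the group count and the mean temperature
stats = groupsummary(tempReadings,"Stations","mean","Temperatures");
```

The result has one row for each of the three categories, with the means stored in the variable mean_Temperatures.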

Convert a string array to a categorical array. Specify that the categorical array has a set of categories that includes a value that is not present in the original array.

First, create a string array that has a set of repeated values.

A = ["red" "blue" "blue" "blue" "blue" "red"]

A = 1×6 string "red" "blue" "blue" "blue" "blue" "red"

Convert the string array to a categorical array. Specify its categories. Include green as a category.

valueset = ["blue" "red" "green"];
B = categorical(A,valueset)

B = 1×6 categorical red blue blue blue blue red

Display the categories of the categorical array. It has a category that did not come from the input string array.

categories(B)

ans = 3×1 cell
    {'blue' }
    {'red'  }
    {'green'}
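One way to confirm that a category has no elements is to count the elements in each category. This sketch repeats the conversion and uses countcats, which is not part of the original example.

```matlab
A = ["red" "blue" "blue" "blue" "blue" "red"];
B = categorical(A,["blue" "red" "green"]);

% Counts are returned in category order: blue, red, green
counts = countcats(B);
```

The count for green is zero because no element of A has that value.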

Create a numeric array.

A = [1 3 2; 2 1 3; 3 1 2]

A = 3×3

 1     3     2
 2     1     3
 3     1     2

Convert the numeric array to a categorical array. Specify the values and the names for the categories.

B = categorical(A,[1 2 3],["red" "green" "blue"])

B = 3×3 categorical
     red        blue       green 
     green      red        blue  
     blue       red        green 

Display the categories.

categories(B)

ans = 3×1 cell
    {'red'  }
    {'green'}
    {'blue' }

B is not an ordinal categorical array. Therefore, you can compare the values in B only using the equality operators, == and ~=.

Find the elements that belong to the category red. Access those elements using logical indexing.

TF = (B == "red")

TF = 3×3 logical array

   1   0   0
   0   1   0
   0   1   0

B(TF)

ans = 3×1 categorical
     red 
     red 
     red 

By default, the categorical function converts missing values (such as NaNs, NaTs, empty strings, and missing strings) into undefined categorical values. However, when you call categorical you can specify a category for missing values to belong to.

For example, create a string array that includes an empty string and a missing string.

A = ["hi" "lo" missing "" "lo" "lo" "hi"]

A = 1×7 string
    "hi"    "lo"    <missing>    ""    "lo"    "lo"    "hi"

First, convert the string array to a categorical array with undefined elements.

C = categorical(A)

C = 1×7 categorical
     hi      lo      <undefined>      <undefined>      lo      lo      hi 

categories(C)

ans = 2×1 cell
    {'hi'}
    {'lo'}

Then, convert it again. But this time specify INDEF as the category for missing strings.

C = categorical(A,["lo" "hi" missing],["lo" "hi" "INDEF"])

C = 1×7 categorical
     hi      lo      INDEF      <undefined>      lo      lo      hi 

categories(C)

ans = 3×1 cell
    {'lo'   }
    {'hi'   }
    {'INDEF'}

Specify INDEF as the category for both missing and empty strings.

C = categorical(A,["lo" "hi" missing ""],["lo" "hi" "INDEF" "INDEF"])

C = 1×7 categorical hi lo INDEF INDEF lo lo hi

categories(C)

ans = 3×1 cell
    {'lo'   }
    {'hi'   }
    {'INDEF'}

Create a 5-by-2 numeric array.

A = [3 2;3 3;3 2;2 1;3 2]

A = 5×2

 3     2
 3     3
 3     2
 2     1
 3     2

Convert A to an ordinal categorical array where 1, 2, and 3 represent the categories child, adult, and senior respectively.

valueset = [1 2 3];
catnames = ["child" "adult" "senior"];
B = categorical(A,valueset,catnames,Ordinal=true)

B = 5×2 categorical
     senior      adult  
     senior      senior 
     senior      adult  
     adult       child  
     senior      adult  

Because B is ordinal, the categories of B have a mathematical ordering, child < adult < senior. You can use all relational operators with ordinal categorical values. For example, return the elements that have a value greater than adult.

TF = (B > "adult")

TF = 5×2 logical array

   1   0
   1   1
   1   0
   0   0
   1   0

B(TF)

ans = 5×1 categorical
     senior 
     senior 
     senior 
     senior 
     senior 

You can preallocate a categorical array of any size by creating an array of NaNs and converting it to a categorical array. After you preallocate the array, you can initialize its categories by specifying category names and adding the categories to the array.

First create an array of NaNs. You can create an array having any size. For example, create a 2-by-4 array of NaNs.

A = NaN(2,4)

A = 2×4

   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN

Then preallocate a categorical array by converting the array of NaNs. The categorical function converts NaNs to undefined categorical values. Just as a NaN represents "not a number", <undefined> represents a categorical value that does not belong to a category.

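A = categorical(A)

A = 2×4 categorical
     <undefined>      <undefined>      <undefined>      <undefined> 
     <undefined>      <undefined>      <undefined>      <undefined> 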

In fact, at this point A has no categories.

categories(A)

ans =

0×0 empty cell array

To initialize the categories of A, specify category names and add them to A by using the addcats function. For example, add small, medium, and large as three categories of A.

A = addcats(A,["small" "medium" "large"])

A = 2×4 categorical
     <undefined>      <undefined>      <undefined>      <undefined> 
     <undefined>      <undefined>      <undefined>      <undefined> 

While the elements of A are undefined values, the categories have been initialized by addcats.

categories(A)

ans = 3×1 cell
    {'small' }
    {'medium'}
    {'large' }

Now that A has categories, you can assign defined categorical values as elements of A.

A(1) = "medium"; A(8) = "small"; A(3:5) = "large"

A = 2×4 categorical
     medium           large      large            <undefined> 
     <undefined>      large      <undefined>      small       

The discretize function is recommended for creating categories out of continuous data, particularly when there are input values that are closely spaced. Two values are closely spaced when the difference between them is less than about 5e-5. When values are closely spaced, the categorical function cannot create unique category names from the values.

Create a numeric array with 100 random numbers.

rng("default")
X = rand(100,1)

X = 100×1

0.8147
0.9058
0.1270
0.9134
0.6324
0.0975
0.2785
0.5469
0.9575
0.9649
0.1576
0.9706
0.9572
0.4854
0.8003
  ⋮

To bin the numbers into three categories, use discretize. Specify bin boundaries and category names for the bins.

C = discretize(X,[0 .25 .75 1],"categorical",["small" "medium" "large"])

C = 100×1 categorical large large small large medium small medium medium large large small large large medium large small medium large large large medium small large large medium large medium medium medium small ⋮

Plot a histogram of the three categories of data.

histogram(C)

Figure contains an axes object. The axes object contains an object of type categoricalhistogram.

When you multiply two categorical arrays, the result is a categorical array with a set of new categories. The new categories are all the ordered pairs created from the categories of the two original categorical arrays. This set of all possible combinations of categories is also known as the Cartesian product of the two original sets of categories.

For example, create two categorical arrays. These arrays list blood groups and Rh factors for six patients.

bloodGroups = categorical(["A" "AB" "O" "O" "A" "A"], ...
                          ["A" "B" "AB" "O"])

bloodGroups = 1×6 categorical A AB O O A A

Rhfactors = categorical(["+" "+" "-" "-" "+" "+"])

Rhfactors = 1×6 categorical + + - - + +

Display the categories of the two arrays. While the two categorical arrays have the same numbers of elements, they can have different numbers of categories.

categories(bloodGroups)

ans = 4×1 cell
    {'A' }
    {'B' }
    {'AB'}
    {'O' }

categories(Rhfactors)

ans = 2×1 cell
    {'+'}
    {'-'}

Multiply the two categorical arrays. The elements of the product come from combinations of the corresponding elements from the input arrays.

bloodTypes = bloodGroups .* Rhfactors

bloodTypes = 1×6 categorical A + AB + O - O - A + A +

However, the categories of the product are all the ordered pairs that can be created from the categories of the two arrays. So, it is possible that some categories are not represented by any element of the output array.

categories(bloodTypes)

ans = 8×1 cell
    {'A +' }
    {'A -' }
    {'B +' }
    {'B -' }
    {'AB +'}
    {'AB -'}
    {'O +' }
    {'O -' }
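If only the combinations that actually occur are of interest, one option is to drop the unused categories afterward. This sketch repeats the multiplication and uses removecats, which is not part of the original example.

```matlab
bloodGroups = categorical(["A" "AB" "O" "O" "A" "A"],["A" "B" "AB" "O"]);
Rhfactors = categorical(["+" "+" "-" "-" "+" "+"]);
bloodTypes = bloodGroups .* Rhfactors;

numAll = numel(categories(bloodTypes));   % all ordered pairs of categories

% Called with one input, removecats drops categories that no element uses
usedTypes = removecats(bloodTypes);
usedCats = categories(usedTypes);
```

Of the eight categories in the product, only the three that appear as elements remain in usedTypes.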

Limitations

Tips

Extended Capabilities

The categorical function supports tall arrays with the following usage notes and limitations:

For more information, see Tall Arrays.

Usage notes and limitations:

For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).

Version History

Introduced in R2013b