Create Categorical Arrays - MATLAB & Simulink

This example shows how to create categorical arrays from various types of input data and modify their elements. The categorical data type stores values from a finite set of discrete categories. You can create a categorical array from a numeric array, logical array, string array, or cell array of character vectors. The unique values from the input array become the categories of the categorical array. A categorical array provides efficient storage and convenient manipulation of data while also maintaining meaningful names for the values.

By default, the categories of a categorical array do not have a mathematical ordering. For example, the discrete set of pet categories ["dog" "cat" "bird"] has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering ["bird" "cat" "dog"]. But you can also create ordinal categorical arrays, in which the categories do have meaningful mathematical orderings. For example, the discrete set of size categories ["small" "medium" "large"] can have the mathematical ordering of small < medium < large. Ordinal categorical arrays enable you to make comparisons between their elements.

Create Categorical Array from Input Array

To create a categorical array from an input array, use the categorical function.

For example, create a string array whose elements are all states from New England. Notice that some of the strings have leading and trailing spaces.

statesNE = ["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]

statesNE = 1×11 string "MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "

Convert the string array to a categorical array. When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed.

statesNE = categorical(statesNE)

statesNE = 1×11 categorical MA ME CT VT ME NH VT MA NH CT RI

List the categories of statesNE by using the categories function. Every element of statesNE belongs to one of these categories. Because statesNE has six unique states, there are six categories. The categories are listed in alphabetical order because the state abbreviations have no mathematical ordering.
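A minimal sketch of this step, assuming statesNE is built from the same strings as above. The variable name cats is only illustrative.

```matlab
% Recreate the categorical array so that this snippet runs on its own.
statesNE = categorical(["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]);

% List the categories. The result is a cell array of character vectors.
cats = categories(statesNE)
```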

ans = 6×1 cell {'CT'} {'MA'} {'ME'} {'NH'} {'RI'} {'VT'}

Add and Modify Elements

To add one element to a categorical array, you can assign text that represents a category name. For example, add a state to statesNE.
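One way to do this, assuming the 11-element statesNE from the previous step, is to assign a string to the position just past the end. The value "ME" and the index 12 are inferred from the output shown next, not taken from a documented command.

```matlab
% Recreate the 11-element categorical array from the previous step.
statesNE = categorical(["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]);

% Assign text to index 12, one past the current end, to grow the array.
statesNE(12) = "ME"
```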

statesNE = 1×12 categorical MA ME CT VT ME NH VT MA NH CT RI ME

To add or modify multiple elements, you must assign a categorical array.

statesNE(1:3) = categorical(["RI" "VT" "MA"])

statesNE = 1×12 categorical RI VT MA VT ME NH VT MA NH CT RI ME

Add Missing Values as Undefined Elements

You can assign missing values as undefined elements of a categorical array. An undefined categorical value does not belong to any category, similar to NaN (Not-a-Number) in numeric arrays.

To assign missing values, use the missing function. For example, modify the first element of the categorical array to be a missing value.
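A short sketch of this assignment, assuming statesNE holds the 12 values from the previous step. The isundefined check is an addition for illustration.

```matlab
% Recreate the 12-element array as it stands after the previous edits.
statesNE = categorical(["RI" "VT" "MA" "VT" "ME" "NH" "VT" "MA" "NH" "CT" "RI" "ME"]);

% Replace the first element with a missing value.
statesNE(1) = missing

% isundefined returns logical 1 (true) for elements that belong to no category.
tf = isundefined(statesNE(1))
```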

statesNE = 1×12 categorical <undefined> VT MA VT ME NH VT MA NH CT RI ME

Assign two missing values at the end of the categorical array.

statesNE(12:13) = [missing missing]

statesNE = 1×13 categorical <undefined> VT MA VT ME NH VT MA NH CT RI <undefined> <undefined>

If you convert a string array to a categorical array, then missing strings and empty strings become undefined elements in the categorical array. If you convert a numeric array, then NaNs become undefined elements. Therefore, assigning missing strings, "", '', or NaNs to elements of a categorical array converts them to undefined categorical values.
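The conversion rules above can be sketched as follows. The arrays c and n are hypothetical examples, not data from this page.

```matlab
% Missing strings and empty strings become undefined elements on conversion.
c = categorical(["MA" "" missing "VT"]);
undefBefore = isundefined(c)

% NaN values in numeric input are treated the same way.
n = categorical([1 NaN 3]);
undefNumeric = isundefined(n)

% Assigning an empty string to an existing element makes it undefined.
c(1) = "";
undefAfter = isundefined(c)
```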

statesNE = 1×13 categorical <undefined> <undefined> MA VT ME NH VT MA NH CT RI <undefined> <undefined>

Create Ordinal Categorical Array from String Array

In an ordinal categorical array, the order of the categories defines a mathematical order that enables comparisons. Because of this mathematical order, you can compare elements of an ordinal categorical array using relational operators such as < and >. For categorical arrays that are not ordinal, you can test elements only for equality (== and ~=).

For example, create a string array that contains the sizes of eight objects.

AllSizes = ["medium" "large" "small" "small" "medium" ...
            "large" "medium" "small"];

The string array has three unique values: "large", "medium", and "small". A string array has no convenient way to indicate that small < medium < large.

Convert the string array to an ordinal categorical array. Define the categories as small, medium, and large, in that order. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.

valueset = ["small" "medium" "large"];
sizeOrd = categorical(AllSizes,valueset,"Ordinal",true)

sizeOrd = 1×8 categorical medium large small small medium large medium small

The order of the values in the categorical array, sizeOrd, remains unchanged.

List the discrete categories in sizeOrd. The order of the categories matches their mathematical ordering small < medium < large.

ans = 3×1 cell {'small' } {'medium'} {'large' }
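As a sketch of how the category order is used, the following lists the categories of sizeOrd and then applies relational operators. The names cats, isAboveSmall, and largest are illustrative only.

```matlab
% Recreate the ordinal categorical array from the steps above.
AllSizes = ["medium" "large" "small" "small" "medium" "large" "medium" "small"];
valueset = ["small" "medium" "large"];
sizeOrd = categorical(AllSizes,valueset,"Ordinal",true);

% Categories are listed in their mathematical order, not alphabetically.
cats = categories(sizeOrd)

% Compare every element with a category name.
isAboveSmall = sizeOrd > "small"

% Functions such as max respect the category order.
largest = max(sizeOrd)
```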

Create Ordinal Categorical Array by Binning Numeric Data

If you have an array with continuous numeric data, specifying numeric ranges as categories can be useful. In such cases, bin the data using the discretize function. Assign category names to the bins.

For example, create a vector of 100 random numbers between 0 and 50.
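A possible way to generate such a vector is shown below. Resetting the random number generator is an assumption made here so that the values are repeatable.

```matlab
% Assumption: use the default generator settings so results are repeatable.
rng("default")

% rand returns values in the open interval (0,1); scale them to (0,50).
x = rand(100,1)*50
```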

x = 100×1

40.7362 45.2896 6.3493 45.6688 31.6180 4.8770 13.9249 27.3441 47.8753 48.2444 7.8807 48.5296 47.8583 24.2688 40.0140 ⋮

Use discretize to create a categorical array by binning the values of x. Put all the values between 0 and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes the left endpoint but does not include the right endpoint, except the last bin, which includes both endpoints.

catnames = ["small" "medium" "large"];
binnedData = discretize(x,[0 15 35 50],"categorical",catnames)

binnedData = 100×1 categorical large large small large medium small small medium large large small large large medium large small medium large large large medium small large large medium large large medium medium small ⋮

binnedData is an ordinal categorical array with three categories, such that small < medium < large.

To display the number of elements in each category, use the summary function.
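A minimal sketch of this step, assuming x and binnedData are created as described above with the default random seed. The call to countcats is an addition that returns the same counts as a numeric vector.

```matlab
% Recreate the binned data (the seed is an assumption for repeatability).
rng("default")
x = rand(100,1)*50;
binnedData = discretize(x,[0 15 35 50],"categorical",["small" "medium" "large"]);

% Print a summary of the categorical array, including per-category counts.
summary(binnedData)

% countcats returns the counts in category order.
counts = countcats(binnedData)
```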

binnedData: 100×1 ordinal categorical

 small            30 
 medium           35 
 large            35 
 <undefined>       0 

Additional statistics:

Min         small   
Median      medium  
Max         large   

You can make various kinds of charts of the binned data. For example, make a pie chart of binnedData.
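For instance, assuming binnedData is built as in the earlier steps, the pie function accepts the categorical array directly and draws one slice per category.

```matlab
% Recreate the binned data (the seed is an assumption for repeatability).
rng("default")
x = rand(100,1)*50;
binnedData = discretize(x,[0 15 35 50],"categorical",["small" "medium" "large"]);

% Draw a pie chart whose slices are sized by the element count per category.
pie(binnedData)
```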

[Figure: pie chart of binnedData with slices for small, medium, and large.]

Preallocate Categorical Array

You can preallocate a categorical array of any size by creating an array of NaNs and converting it to a categorical array. After you preallocate the array, you can initialize its categories by adding the category names to the array.

For example, create a 2-by-4 array of NaNs.
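One way to create this array is with the NaN function, sketched here.

```matlab
% Create a 2-by-4 numeric array in which every element is NaN.
A = NaN(2,4)
```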

A = 2×4

NaN NaN NaN NaN
NaN NaN NaN NaN

Then convert the array of NaNs to a categorical array of undefined categorical values.
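A short sketch of the conversion follows. The two checks after it are additions for illustration and use hypothetical variable names.

```matlab
% Start from a 2-by-4 array of NaNs.
A = NaN(2,4);

% Convert it. Every NaN becomes an undefined categorical value.
A = categorical(A)

% Confirm that all elements are undefined and that no categories exist yet.
allUndefined = all(isundefined(A(:)))
noCategories = isempty(categories(A))
```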

A = 2×4 categorical

<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>

At this point, A has no categories.

ans =

0×0 empty cell array

Add small, medium, and large categories to A by using the addcats function.

A = addcats(A,["small" "medium" "large"])

A = 2×4 categorical

<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>

While the elements of A are still undefined values, the categories of A are defined.
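To check this, list the categories after calling addcats. The sketch below assumes A was preallocated as in the previous steps.

```matlab
% Preallocate and add the three categories.
A = categorical(NaN(2,4));
A = addcats(A,["small" "medium" "large"]);

% The categories now exist even though no element uses them yet.
cats = categories(A)
stillUndefined = all(isundefined(A(:)))
```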

ans = 3×1 cell {'small' } {'medium'} {'large' }

Now that A has categories, you can assign defined categorical values as elements of A.

A(1) = "medium";
A(8) = "small";
A(3:5) = "large"

A = 2×4 categorical

medium <undefined> large large <undefined> large <undefined> small

See Also

categorical | categories | discretize | summary | addcats | missing
