Advantages of Using Categorical Arrays - MATLAB & Simulink

Natural Representation of Categorical Data

categorical is a data type for storing data with values from a finite set of discrete categories. One common alternative to categorical arrays is string arrays. However, while string arrays store text, you cannot use them to define categories. The other common alternative is to store categorical data as integers in numeric arrays. Numeric arrays lose all the useful descriptive information from the category names, and also tend to suggest that the integer values have their usual numeric meaning, which, for categorical data, they do not.
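As a minimal sketch of the contrast described above, the following stores the same data three ways. The variable names (sizesStr, sizesNum, sizesCat) and the integer coding 1/2/3 are assumptions chosen for illustration only.

```matlab
% Hypothetical sample data: the same four observations stored three ways.
sizesStr = ["small" "large" "medium" "small"];   % text only, no defined category set
sizesNum = [1 3 2 1];                            % assumed integer codes; names are lost
sizesCat = categorical(sizesStr);                % values drawn from a finite set

% The categorical array carries its own set of category names.
cats = categories(sizesCat);     % cell array of names, sorted by default

% Per-category counts are reported in the same order as cats.
counts = countcats(sizesCat);

% Arithmetic on the integer codes runs without complaint,
% but the result has no meaning for categorical data.
meanOfCodes = mean(sizesNum);
```

Here the numeric version invites operations such as mean, while the categorical version keeps the descriptive names attached to the data.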

Mathematical Ordering for Categories

Categorical arrays are convenient and memory-efficient containers for nonnumeric data with values from a finite set of discrete categories. They are especially useful when the categories have a meaningful mathematical ordering, such as an array with entries from the discrete set of categories ["small" "medium" "large"], where small < medium < large.

The only ordering that string arrays provide is alphanumeric order. If you use a categorical array, then you can specify any ordering that makes sense for your set of categories. You can use relational operations to test for equality and perform elementwise comparisons that have a meaningful mathematical ordering.
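The ordering described above might be set up as in the following sketch. The sample values in sizes are hypothetical; the category set ["small" "medium" "large"] is the one used in the text.

```matlab
% Hypothetical sample data using the category set from the text.
sizes = ["medium" "large" "small" "small" "medium"];
valueset = ["small" "medium" "large"];

% With 'Ordinal' set to true, the order of valueset defines
% the mathematical order: small < medium < large.
sizeCats = categorical(sizes, valueset, 'Ordinal', true);

% Relational operators respect the specified category order.
isSmallLess   = sizeCats(3) < sizeCats(1);   % compares small with medium
atLeastMedium = sizeCats >= "medium";        % elementwise logical result
largest       = max(sizeCats);               % highest category present

% Sorting the string array gives alphanumeric order instead.
stringOrder = sort(valueset);
```

Sorting the string array places "large" before "medium" and "small", which is why text alone cannot express the intended size order.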

Reduce Memory Requirements

This example shows how to compare the memory required to store data as a string array to the memory required for a categorical array. String arrays must store each element even when they have many repeated values. Categorical arrays store only one copy of each category name, often reducing the amount of memory required to store an array when it has many repeated values.

Create a sample string array.

state = [repmat("MA",25,1);repmat("NY",25,1); ...
    repmat("CA",50,1); ...
    repmat("MA",25,1);repmat("NY",25,1)];

Display information about the variable state.

whos state

  Name         Size            Bytes  Class     Attributes

  state      150x1              8212  string

Convert state to a categorical array.

stateCats = categorical(state);

Display the discrete categories in the variable stateCats.

categories(stateCats)

ans = 3×1 cell
    {'CA'}
    {'MA'}
    {'NY'}

stateCats contains 150 elements, but only three distinct categories.

Display information about the two variables. There is a significant reduction in the memory required to store the categorical array.

whos state stateCats

  Name             Size            Bytes  Class          Attributes

  state          150x1              8212  string
  stateCats      150x1               524  categorical
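The same comparison can also be made programmatically. The sketch below assumes you want the byte counts as numbers rather than as displayed text; the names infoStr, infoCat, and ratio are hypothetical, and exact byte counts may differ between MATLAB releases and platforms.

```matlab
% Rebuild the sample data from the example above.
state = [repmat("MA",25,1); repmat("NY",25,1); ...
    repmat("CA",50,1); ...
    repmat("MA",25,1); repmat("NY",25,1)];
stateCats = categorical(state);

% Called with an output argument, whos returns a structure
% whose bytes field holds the storage size of the variable.
infoStr = whos('state');
infoCat = whos('stateCats');

% Ratio of string storage to categorical storage.
ratio = infoStr.bytes / infoCat.bytes;
fprintf('string: %d bytes, categorical: %d bytes (about %.1fx)\n', ...
    infoStr.bytes, infoCat.bytes, ratio);
```

The savings grow with the number of repeated values, because the categorical array stores each category name only once.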

See Also

categorical | categories
