grouptransform - Transform by group - MATLAB (original) (raw)

Syntax

Description

Table Data

[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform([T](#d126e704109),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[method](#d126e704333)) returns transformed data in the place of the nongrouping variables in table or timetable T. The group-wise computations inmethod are applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = grouptransform(T,"HealthStatus","norm") normalizes the data inT by health status using the 2-norm.

You can use grouptransform functionality interactively by adding the Compute by Group task to a live script.

[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform([T](#d126e704109),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[groupbins](#mw%5F9fc4f383-c4a3-4f4b-9982-dd49e8a5942a),[method](#d126e704333)) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping and appends the bins to the output table as additional variables. For example, G = grouptransform(T,"SaleDate","year","rescale") bins by sale year and scales the data in T to the range [0, 1].

example

[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform(___,[datavars](#d126e704697)) specifies the table variables to apply the method to for either of the previous syntaxes.

[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform(___,[Name,Value](#namevaluepairarguments)) specifies additional properties using one or more name-value arguments for any of the previous syntaxes. For example, G = grouptransform(T,"Temp","linearfill","ReplaceValues",false) appends the filled data as an additional variable of T instead of replacing the nongrouping variables.

example

Array Data

[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf) = grouptransform([A](#mw%5F62fed2ce-b5ab-4fba-8490-b79a144690a8),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[method](#d126e704333)) returns transformed data in the place of column vectors in the input vector or matrix A. The group-wise computations inmethod are applied to all column vectors inA. Groups are defined by rows in the column vectors ingroupvars that have the same unique combination of values.

You can use grouptransform functionality interactively by adding the Compute by Group task to a live script.

[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf) = grouptransform([A](#mw%5F62fed2ce-b5ab-4fba-8490-b79a144690a8),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[groupbins](#mw%5F9fc4f383-c4a3-4f4b-9982-dd49e8a5942a),[method](#d126e704333)) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping.

example

[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf) = grouptransform(___,[Name,Value](#namevaluepairarguments)) specifies additional properties using one or more name-value arguments for either of the previous syntaxes for an input array.

[[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf),[BG](#mw%5F4512538e-ac1c-4351-b63a-6ddeb685d720)] = grouptransform([A](#mw%5F62fed2ce-b5ab-4fba-8490-b79a144690a8),___) also returns the values of the grouping vectors or binned grouping vectors corresponding to the rows in B.

example

Examples

collapse all

Create a timetable that contains a progress status for three teams.

timeStamp = days([1 1 1 2 2 2 3 3 3]'); teamNumber = [1 2 3 1 2 3 1 2 3]'; percentComplete = [14.2 28.1 11.5 NaN NaN 19.3 46.1 51.2 30.3]'; T = timetable(timeStamp,teamNumber,percentComplete)

T=9×2 timetable timeStamp teamNumber percentComplete _________ __________ _______________

1 day            1              14.2      
1 day            2              28.1      
1 day            3              11.5      
2 days           1               NaN      
2 days           2               NaN      
2 days           3              19.3      
3 days           1              46.1      
3 days           2              51.2      
3 days           3              30.3      

Fill missing status percentages, indicated with NaN, for each team using linear interpolation.

G = grouptransform(T,"teamNumber","linearfill","percentComplete")

G=9×2 timetable timeStamp teamNumber percentComplete _________ __________ _______________

1 day            1               14.2     
1 day            2               28.1     
1 day            3               11.5     
2 days           1              30.15     
2 days           2              39.65     
2 days           3               19.3     
3 days           1               46.1     
3 days           2               51.2     
3 days           3               30.3     

To append the filled data to the original table instead of replacing the percentComplete variable, use ReplaceValues.

Gappend = grouptransform(T,"teamNumber","linearfill","percentComplete","ReplaceValues",false)

Gappend=9×3 timetable timeStamp teamNumber percentComplete linearfill_percentComplete _________ __________ _______________ __________________________

1 day            1              14.2                     14.2           
1 day            2              28.1                     28.1           
1 day            3              11.5                     11.5           
2 days           1               NaN                    30.15           
2 days           2               NaN                    39.65           
2 days           3              19.3                     19.3           
3 days           1              46.1                     46.1           
3 days           2              51.2                     51.2           
3 days           3              30.3                     30.3           

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]'; T = table(timeStamps,profit)

T=10×2 table timeStamps profit ___________ ______

04-Mar-2017     2032 
02-Mar-2017     3071 
15-Mar-2017     1185 
10-Mar-2017     2587 
14-Mar-2017     1998 
31-Mar-2017     2899 
25-Mar-2017     3112 
29-Mar-2017      909 
21-Mar-2017     2619 
18-Mar-2017     3085 

Binning by day name, normalize the profits using the 2-norm.

G = grouptransform(T,"timeStamps","dayname","norm")

G=10×3 table timeStamps profit dayname_timeStamps ___________ _______ __________________

04-Mar-2017    0.42069        Saturday      
02-Mar-2017          1        Thursday      
15-Mar-2017    0.79344        Wednesday     
10-Mar-2017    0.66582        Friday        
14-Mar-2017    0.60654        Tuesday       
31-Mar-2017    0.74612        Friday        
25-Mar-2017    0.64428        Saturday      
29-Mar-2017    0.60864        Wednesday     
21-Mar-2017    0.79506        Tuesday       
18-Mar-2017    0.63869        Saturday      

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Binning by day name, normalize the profits using the 2-norm. Display the transformed data and which group it corresponds to.

[normDailyProfit,dayName] = grouptransform(profit,timeStamps,"dayname","norm")

normDailyProfit = 10×1

0.4207
1.0000
0.7934
0.6658
0.6065
0.7461
0.6443
0.6086
0.7951
0.6387

dayName = 10×1 categorical Saturday Thursday Wednesday Friday Tuesday Friday Saturday Wednesday Tuesday Saturday

Input Arguments

collapse all

Input table, specified as a table or timetable.

Input array, specified as a column vector or a group of column vectors stored as a matrix.

Grouping variables or vectors, specified as one of these options:

Example: grouptransform(T,"Var3",method)

Transformation method, specified as one of these values:

Method Description
"zscore" Normalize data to have mean 0 and standard deviation 1
"norm" Normalize data by 2-norm
"meancenter" Normalize data to have mean 0
"rescale" Rescale range to [0,1]
"meanfill" Fill missing values with the mean of the group data
"linearfill" Fill missing values by linear interpolation of nonmissing group data

You can also specify a function handle that returns one array whose first dimension has length 1 or has the same number of rows as the input data. If the function returns an array with first dimension length equal to 1, thengrouptransform repeats that value so that the output has the same number of rows as the input.

Data Types: char | string | function_handle

Binning scheme for grouping variables or vectors, specified as one or more of the following binning methods. Grouping variables or vectors and binning scheme arguments must be the same size, or one of them can be scalar.

Table variables to operate on, specified as one of the options in this table. datavars indicates which variables of the input table or timetable to apply the methods to. Other variables not specified bydatavars pass through to the output without being operated on. When datavars is not specified,grouptransform operates on each nongrouping variable.

Indexing Scheme Values to Specify Examples
Variable name A string scalar or character vectorA string array or cell array of character vectorsA pattern object "A" or 'A' — A variable named A["A" "B"] or {'A','B'} — Two variables named A andB"Var"+digitsPattern(1) — Variables named"Var" followed by a single digit
Variable index An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing0 (false) values. 3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable
Function handle A function handle that takes a table variable as input and returns a logical scalar @isnumeric — All the variables containing numeric values
Variable type A vartype subscript that selects variables of a specified type vartype("numeric") — All the variables containing numeric values

Example: grouptransform(T,groupvars,method,["Var1" "Var2" "Var4"])

Name-Value Arguments

collapse all

Specify optional pairs of arguments asName1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: G = grouptransform(T,groupvars,groupbins,"zscore",IncludedEdge="right")

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: G = grouptransform(T,groupvars,groupbins,"zscore","IncludedEdge","right")

Included bin edge for binning scheme, specified as either "left" or"right", indicating which end of the bin interval is inclusive.

You can specify IncludedEdge only if you also specifygroupbins, and the value applies to all binning methods for all grouping variables or vectors.

Option to replace values, specified as one of these values:

Output Arguments

collapse all

Output table for table or timetable input data, returned as a table or timetable. G contains the transformed data for each group.

Output array for array input data, returned as a vector or matrix.B contains the transformed data in the place of the nongrouping vectors.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows inB.

Tips

Alternative Functionality

Live Editor Task

You can use grouptransform functionality interactively by adding the Compute by Group task to a live script.

Compute by Group task in the Live Editor

Extended Capabilities

expand all

Thegrouptransform function supports tall arrays with the following usage notes and limitations:

For more information, see Tall Arrays.

Usage notes and limitations:

Version History

Introduced in R2018b

expand all

Apply multiple binning methods to one grouping variable or vector by specifying a cell array of binning methods.

Generate C or C++ code for the grouptransform function. For usage notes and limitations, see C/C++ Code Generation.

The grouptransform function shows improved performance, especially when the data count in each group is small.

For example, this code transforms by group a matrix with 500 groups with a count of 10 each. The code is about 6.20x faster than in the previous release.

function timingGrouptransform data = (1:5000)'; groups = repelem(1:length(data)/10,10)'; p = randperm(length(data)); data = data(p); groups = groups(p);

tic for k = 1:290 G = grouptransform(data,groups,"norm"); end toc end

The approximate execution times are:

R2021b: 6.26 s

R2022a: 1.01 s

The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system by calling thetimingGrouptransform function.