grouptransform - Transform by group - MATLAB (original) (raw)
Syntax
Description
Table Data
[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform([T](#d126e704109),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[method](#d126e704333))
returns transformed data in the place of the nongrouping variables in table or timetable T
. The group-wise computations inmethod
are applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars
that have the same unique combination of values. For example, G = grouptransform(T,"HealthStatus","norm")
normalizes the data inT
by health status using the 2-norm.
You can use grouptransform
functionality interactively by adding the Compute by Group task to a live script.
[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform([T](#d126e704109),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[groupbins](#mw%5F9fc4f383-c4a3-4f4b-9982-dd49e8a5942a),[method](#d126e704333))
specifies to bin rows in groupvars
according to binning scheme groupbins
prior to grouping and appends the bins to the output table as additional variables. For example, G = grouptransform(T,"SaleDate","year","rescale")
bins by sale year and scales the data in T
to the range [0, 1].
[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform(___,[datavars](#d126e704697))
specifies the table variables to apply the method to for either of the previous syntaxes.
[G](#mw%5Fd3140eec-a825-4ecc-a133-9e7a36c3aa14) = grouptransform(___,[Name,Value](#namevaluepairarguments))
specifies additional properties using one or more name-value arguments for any of the previous syntaxes. For example, G = grouptransform(T,"Temp","linearfill","ReplaceValues",false)
appends the filled data as an additional variable of T
instead of replacing the nongrouping variables.
Array Data
[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf) = grouptransform([A](#mw%5F62fed2ce-b5ab-4fba-8490-b79a144690a8),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[method](#d126e704333))
returns transformed data in the place of column vectors in the input vector or matrix A
. The group-wise computations inmethod
are applied to all column vectors inA
. Groups are defined by rows in the column vectors ingroupvars
that have the same unique combination of values.
You can use grouptransform
functionality interactively by adding the Compute by Group task to a live script.
[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf) = grouptransform([A](#mw%5F62fed2ce-b5ab-4fba-8490-b79a144690a8),[groupvars](#mw%5F3a9726a1-e378-41dd-80fe-7ed79b07e1d7),[groupbins](#mw%5F9fc4f383-c4a3-4f4b-9982-dd49e8a5942a),[method](#d126e704333))
specifies to bin rows in groupvars
according to binning scheme groupbins
prior to grouping.
[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf) = grouptransform(___,[Name,Value](#namevaluepairarguments))
specifies additional properties using one or more name-value arguments for either of the previous syntaxes for an input array.
[[B](#mw%5Fa6fee2b7-69df-43cb-9a49-ff4ed5c3ceaf),[BG](#mw%5F4512538e-ac1c-4351-b63a-6ddeb685d720)] = grouptransform([A](#mw%5F62fed2ce-b5ab-4fba-8490-b79a144690a8),___)
also returns the values of the grouping vectors or binned grouping vectors corresponding to the rows in B
.
Examples
Create a timetable that contains a progress status for three teams.
timeStamp = days([1 1 1 2 2 2 3 3 3]'); teamNumber = [1 2 3 1 2 3 1 2 3]'; percentComplete = [14.2 28.1 11.5 NaN NaN 19.3 46.1 51.2 30.3]'; T = timetable(timeStamp,teamNumber,percentComplete)
T=9×2 timetable timeStamp teamNumber percentComplete _________ __________ _______________
1 day 1 14.2
1 day 2 28.1
1 day 3 11.5
2 days 1 NaN
2 days 2 NaN
2 days 3 19.3
3 days 1 46.1
3 days 2 51.2
3 days 3 30.3
Fill missing status percentages, indicated with NaN
, for each team using linear interpolation.
G = grouptransform(T,"teamNumber","linearfill","percentComplete")
G=9×2 timetable timeStamp teamNumber percentComplete _________ __________ _______________
1 day 1 14.2
1 day 2 28.1
1 day 3 11.5
2 days 1 30.15
2 days 2 39.65
2 days 3 19.3
3 days 1 46.1
3 days 2 51.2
3 days 3 30.3
To append the filled data to the original table instead of replacing the percentComplete
variable, use ReplaceValues
.
Gappend = grouptransform(T,"teamNumber","linearfill","percentComplete","ReplaceValues",false)
Gappend=9×3 timetable timeStamp teamNumber percentComplete linearfill_percentComplete _________ __________ _______________ __________________________
1 day 1 14.2 14.2
1 day 2 28.1 28.1
1 day 3 11.5 11.5
2 days 1 NaN 30.15
2 days 2 NaN 39.65
2 days 3 19.3 19.3
3 days 1 46.1 46.1
3 days 2 51.2 51.2
3 days 3 30.3 30.3
Create a table of dates and corresponding profits.
timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]'; T = table(timeStamps,profit)
T=10×2 table timeStamps profit ___________ ______
04-Mar-2017 2032
02-Mar-2017 3071
15-Mar-2017 1185
10-Mar-2017 2587
14-Mar-2017 1998
31-Mar-2017 2899
25-Mar-2017 3112
29-Mar-2017 909
21-Mar-2017 2619
18-Mar-2017 3085
Binning by day name, normalize the profits using the 2-norm.
G = grouptransform(T,"timeStamps","dayname","norm")
G=10×3 table timeStamps profit dayname_timeStamps ___________ _______ __________________
04-Mar-2017 0.42069 Saturday
02-Mar-2017 1 Thursday
15-Mar-2017 0.79344 Wednesday
10-Mar-2017 0.66582 Friday
14-Mar-2017 0.60654 Tuesday
31-Mar-2017 0.74612 Friday
25-Mar-2017 0.64428 Saturday
29-Mar-2017 0.60864 Wednesday
21-Mar-2017 0.79506 Tuesday
18-Mar-2017 0.63869 Saturday
Create a vector of dates and a vector of corresponding profit values.
timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
Binning by day name, normalize the profits using the 2-norm. Display the transformed data and which group it corresponds to.
[normDailyProfit,dayName] = grouptransform(profit,timeStamps,"dayname","norm")
normDailyProfit = 10×1
0.4207
1.0000
0.7934
0.6658
0.6065
0.7461
0.6443
0.6086
0.7951
0.6387
dayName = 10×1 categorical Saturday Thursday Wednesday Friday Tuesday Friday Saturday Wednesday Tuesday Saturday
Input Arguments
Input table, specified as a table or timetable.
Input array, specified as a column vector or a group of column vectors stored as a matrix.
Grouping variables or vectors, specified as one of these options:
- For array input data,
groupvars
can be either a column vector with the same number of rows asA
or a group of column vectors arranged in a matrix or cell array. - For table or timetable input data,
groupvars
indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.Indexing Scheme Values to Specify Examples Variable name A string scalar or character vectorA string array or cell array of character vectorsA pattern object "A" or 'A' — A variable named A["A" "B"] or {'A','B'} — Two variables named A andB"Var"+digitsPattern(1) — Variables named"Var" followed by a single digit Variable index An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing0 (false) values. 3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable Function handle A function handle that takes a table variable as input and returns a logical scalar @isnumeric — All the variables containing numeric values Variable type A vartype subscript that selects variables of a specified type vartype("numeric") — All the variables containing numeric values
Example: grouptransform(T,"Var3",method)
Transformation method, specified as one of these values:
Method | Description |
---|---|
"zscore" | Normalize data to have mean 0 and standard deviation 1 |
"norm" | Normalize data by 2-norm |
"meancenter" | Normalize data to have mean 0 |
"rescale" | Rescale range to [0,1] |
"meanfill" | Fill missing values with the mean of the group data |
"linearfill" | Fill missing values by linear interpolation of nonmissing group data |
You can also specify a function handle that returns one array whose first dimension has length 1 or has the same number of rows as the input data. If the function returns an array with first dimension length equal to 1, thengrouptransform
repeats that value so that the output has the same number of rows as the input.
Data Types: char
| string
| function_handle
Binning scheme for grouping variables or vectors, specified as one or more of the following binning methods. Grouping variables or vectors and binning scheme arguments must be the same size, or one of them can be scalar.
"none"
— No binning.- Vector of bin edges — The bin edges define the bins. You can specify the edges as numeric values or as
datetime
values fordatetime
grouping variables or vectors. - Number of bins — The number determines how many equally spaced bins to create. You can specify the number of bins as a positive integer scalar.
- Length of time (bin width) — The length of time determines the width of each bin. You can specify the bin width as a
duration
orcalendarDuration
scalar fordatetime
orduration
grouping variables or vectors. - Name of time unit (bin width) — The name of the time unit determines the width of each bin. You can specify the bin width as one of the options in this table for
datetime
orduration
grouping variables or vectors.Value Description Data Type "second" Each bin is 1 second. datetime and duration "minute" Each bin is 1 minute. datetime and duration "hour" Each bin is 1 hour. datetime and duration "day" Each bin is 1 calendar day. This value accounts for daylight saving time shifts. datetime and duration "week" Each bin is 1 calendar week. datetime only "month" Each bin is 1 calendar month. datetime only "quarter" Each bin is 1 calendar quarter. datetime only "year" Each bin is 1 calendar year. This value accounts for leap days. datetime and duration "decade" Each bin is 1 decade (10 calendar years). datetime only "century" Each bin is 1 century (100 calendar years). datetime only "secondofminute" Bins are seconds from 0 to 59. datetime only "minuteofhour" Bins are minutes from 0 to 59. datetime only "hourofday" Bins are hours from 0 to 23. datetime only "dayofweek" Bins are days from 1 to 7. The first day of the week is Sunday. datetime only "dayname" Bins are full day names, such as "Sunday". datetime only "dayofmonth" Bins are days from 1 to 31. datetime only "dayofyear" Bins are days from 1 to 366. datetime only "weekofmonth" Bins are weeks from 1 to 6. datetime only "weekofyear" Bins are weeks from 1 to 54. datetime only "monthname" Bins are full month names, such as "January". datetime only "monthofyear" Bins are months from 1 to 12. datetime only "quarterofyear" Bins are quarters from 1 to 4. datetime only
Table variables to operate on, specified as one of the options in this table. datavars
indicates which variables of the input table or timetable to apply the methods to. Other variables not specified bydatavars
pass through to the output without being operated on. When datavars
is not specified,grouptransform
operates on each nongrouping variable.
Indexing Scheme | Values to Specify | Examples |
---|---|---|
Variable name | A string scalar or character vectorA string array or cell array of character vectorsA pattern object | "A" or 'A' — A variable named A["A" "B"] or {'A','B'} — Two variables named A andB"Var"+digitsPattern(1) — Variables named"Var" followed by a single digit |
Variable index | An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing0 (false) values. | 3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable |
Function handle | A function handle that takes a table variable as input and returns a logical scalar | @isnumeric — All the variables containing numeric values |
Variable type | A vartype subscript that selects variables of a specified type | vartype("numeric") — All the variables containing numeric values |
Example: grouptransform(T,groupvars,method,["Var1" "Var2" "Var4"])
Name-Value Arguments
Specify optional pairs of arguments asName1=Value1,...,NameN=ValueN
, where Name
is the argument name and Value
is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: G = grouptransform(T,groupvars,groupbins,"zscore",IncludedEdge="right")
Before R2021a, use commas to separate each name and value, and enclose Name
in quotes.
Example: G = grouptransform(T,groupvars,groupbins,"zscore","IncludedEdge","right")
Included bin edge for binning scheme, specified as either "left"
or"right"
, indicating which end of the bin interval is inclusive.
You can specify IncludedEdge
only if you also specifygroupbins
, and the value applies to all binning methods for all grouping variables or vectors.
Option to replace values, specified as one of these values:
true
or1
— Replace nongrouping table variables or column vectors in the input data with table variables or column vectors containing transformed data.false
or0
— Append the input data with the table variables or column vectors containing transformed data.
Output Arguments
Output table for table or timetable input data, returned as a table or timetable. G
contains the transformed data for each group.
Output array for array input data, returned as a vector or matrix.B
contains the transformed data in the place of the nongrouping vectors.
Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG
contains the unique grouping vector or binned grouping vector combinations that correspond to the rows inB
.
Tips
- When making many calls to
grouptransform
, consider converting grouping variables to typecategorical
orlogical
when possible for improved performance. For example, if you have a string array grouping variable (such asHealthStatus
with elements"Poor"
,"Fair"
,"Good"
, and"Excellent"
), you can convert it to a categorical variable using the commandcategorical(HealthStatus)
.
Alternative Functionality
Live Editor Task
You can use grouptransform
functionality interactively by adding the Compute by Group task to a live script.
Extended Capabilities
Thegrouptransform
function supports tall arrays with the following usage notes and limitations:
- If
A
andgroupvars
are both tall matrices, then they must have the same number of rows. - If the first input is a tall matrix, then
groupvars
can be a cell array containing tall grouping vectors. - The
groupvars
anddatavars
arguments do not support function handles. - If the
method
argument is a function handle, then it must be a valid input forsplitapply
operating on a tall array. - When grouping by discretized datetime arrays, the categorical group names are different compared to in-memory
grouptransform
calculations.
For more information, see Tall Arrays.
Usage notes and limitations:
- Sparse inputs are not supported.
- Binning scheme is not supported for datetime or duration data.
- Input tables that contain multidimensional arrays are not supported.
- Computation methods must be constant.
- Grouping variables must be constant when the first input argument is a table.
- Data variables must be constant.
- Binning scheme specified as character vectors or strings must be constant.
- Name-value arguments must be constant.
- Computation methods cannot return sparse, multidimensional, or cell array results.
Version History
Introduced in R2018b
Apply multiple binning methods to one grouping variable or vector by specifying a cell array of binning methods.
Generate C or C++ code for the grouptransform
function. For usage notes and limitations, see C/C++ Code Generation.
The grouptransform
function shows improved performance, especially when the data count in each group is small.
For example, this code transforms by group a matrix with 500 groups with a count of 10 each. The code is about 6.20x faster than in the previous release.
function timingGrouptransform data = (1:5000)'; groups = repelem(1:length(data)/10,10)'; p = randperm(length(data)); data = data(p); groups = groups(p);
tic for k = 1:290 G = grouptransform(data,groups,"norm"); end toc end
The approximate execution times are:
R2021b: 6.26 s
R2022a: 1.01 s
The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system by calling thetimingGrouptransform
function.