groupfilter - Filter by group - MATLAB (original) (raw)
Syntax
Description
Table Data
[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter([T](#mw%5Fd9249256-8fcd-4f6e-bf96-cdabab636c36),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01))
returns the rows of table or timetable T
that satisfy the group-wise filtering condition specified in method
. The filtering conditionmethod
is a function handle applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars
that have the same unique combination of values. For example, G = groupfilter(T,"Trial",@(x) numel(x) > 5)
groups the data in T
byTrial
, and keeps the rows that belong to groups with more than five trials.
You can use groupfilter
functionality interactively by adding the Compute by Group task to a live script.
[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter([T](#mw%5Fd9249256-8fcd-4f6e-bf96-cdabab636c36),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[groupbins](#mw%5Fec069d74-7335-4d26-b2f7-97bf43a2c87d),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01))
specifies to bin rows in groupvars
according to binning schemegroupbins
prior to grouping. For example, G = groupfilter(T,"SaleDate","year",@(x) numel(x) > 5)
bins the data inSaleDate
by year, and keeps the rows whose bin has more than five elements.
[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter(___,[datavars](#mw%5Fedf9e620-6bdf-4de6-8a16-0620452e9245))
specifies the table variables to apply the filtering method to for any of the previous syntaxes. For example, G = groupfilter(T,"Trial",@(x) x == max(x),"Height")
keeps the rows of T
that correspond to the maximum height for each trial.
[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter(___,"IncludedEdge",[LR](#mw%5F5f325024-bee5-4e98-9964-7de3f63a6e02))
specifies the included bin edge as "left"
or "right"
to indicate which end of the bin interval is inclusive. You can useIncludeEdge
with any previous syntax that specifiesgroupbins
.
Array Data
[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33) = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01))
returns the rows of vector or matrix A
that satisfy the group-wise filtering condition specified in method
. The filtering conditionmethod
is a function handle applied to all column vectors. Groups are defined by rows in the column vectors in groupvars
that have the same unique combination of values.
You can use groupfilter
functionality interactively by adding the Compute by Group task to a live script.
[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33) = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[groupbins](#mw%5Fec069d74-7335-4d26-b2f7-97bf43a2c87d),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01))
specifies to bin rows in groupvars
according to binning schemegroupbins
prior to grouping.
[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33) = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[groupbins](#mw%5Fec069d74-7335-4d26-b2f7-97bf43a2c87d),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01),"IncludedEdge",[LR](#mw%5F5f325024-bee5-4e98-9964-7de3f63a6e02))
specifies whether to include the left or right edge in each bin whengroupbins
is specified.
[[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33),[BG](#mw%5F03f92aac-2feb-416e-8aea-18d59867e02c)] = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),___)
also returns the unique grouping vector combinations corresponding to the rows inB
.
Examples
Create a table containing two variables.
groupID = [1 1 1 2 2 3]'; sample = [3 1 2 9 8 5]'; T = table(groupID,sample)
T=6×2 table groupID sample _______ ______
1 3
1 1
1 2
2 9
2 8
3 5
Group by ID number, and return rows corresponding to groups with more than two samples.
Gnumel = groupfilter(T,"groupID",@(x) numel(x) > 2)
Gnumel=3×2 table groupID sample _______ ______
1 3
1 1
1 2
Return rows whose group samples are between 0 and 6.
Gvals = groupfilter(T,"groupID",@(x) min(x) > 0 && max(x) < 6)
Gvals=4×2 table groupID sample _______ ______
1 3
1 1
1 2
3 5
Create a table containing two variables that represent a day number and temperature.
daynum = [1 1 1 1 2 2 2 2]'; temp = [67 65 71 55 61 79 58 78]'; T = table(daynum,temp)
T=8×2 table daynum temp ______ ____
1 67
1 65
1 71
1 55
2 61
2 79
2 58
2 78
Group by day number, and return the largest two temperatures for each day.
G = groupfilter(T,"daynum",@(x) ismember(x,maxk(x,2)))
G=4×2 table daynum temp ______ ____
1 67
1 71
2 79
2 78
Create a table of dates and corresponding profits.
timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 4 10; ... 2017 4 14; 2017 4 30; 2017 5 25; ... 2017 5 29; 2017 5 21]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619]'; T = table(timeStamps,profit)
T=9×2 table timeStamps profit ___________ ______
04-Mar-2017 2032
02-Mar-2017 3071
15-Mar-2017 1185
10-Apr-2017 2587
14-Apr-2017 1998
30-Apr-2017 2899
25-May-2017 3112
29-May-2017 909
21-May-2017 2619
Group the dates by month, and return rows that correspond to the maximum profit for that month.
Gmax = groupfilter(T,"timeStamps","month",@(x) x == max(x))
Gmax=3×3 table timeStamps profit month_timeStamps ___________ ______ ________________
02-Mar-2017 3071 Mar-2017
30-Apr-2017 2899 Apr-2017
25-May-2017 3112 May-2017
Return rows whose month had an average profit greater than $2300.
Gavg = groupfilter(T,"timeStamps","month",@(x) mean(x) > 2300)
Gavg=3×3 table timeStamps profit month_timeStamps ___________ ______ ________________
10-Apr-2017 2587 Apr-2017
14-Apr-2017 1998 Apr-2017
30-Apr-2017 2899 Apr-2017
Create a table T
that contains information about nine individuals.
groupID = [1 2 3 1 2 3 1 2 3]'; Height = [62 61 59 66 70 72 57 67 71]'; HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Poor"; "Fair"; "Excellent"; "Poor"; "Excellent"; "Fair"]); T = table(groupID,Height,HealthStatus)
T=9×3 table groupID Height HealthStatus _______ ______ ____________
1 62 Poor
2 61 Good
3 59 Fair
1 66 Poor
2 70 Fair
3 72 Excellent
1 57 Poor
2 67 Excellent
3 71 Fair
Group by ID number, and return rows for groups that contain only members with a minimum height of 60.
G1 = groupfilter(T,"groupID",@(x) min(x) >= 60,"Height")
G1=3×3 table groupID Height HealthStatus _______ ______ ____________
2 61 Good
2 70 Fair
2 67 Excellent
Group by ID number, and return rows for groups that contain only members whose health status is Poor
.
G2 = groupfilter(T,"groupID",@(x) all(x == "Poor"),"HealthStatus")
G2=3×3 table groupID Height HealthStatus _______ ______ ____________
1 62 Poor
1 66 Poor
1 57 Poor
Create a vector of dates and a vector of corresponding profit values.
timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
Group by day of the week, and compute the maximum profit for each group. Display the maximum profits and their corresponding groups.
[maxDailyProfit,dayOfWeek] = groupfilter(profit,timeStamps, ... "dayname",@(x) x == max(x))
maxDailyProfit = 5×1
3071
1185
2899
3112
2619
dayOfWeek = 5×1 categorical Thursday Wednesday Friday Saturday Tuesday
Input Arguments
Input table, specified as a table or timetable.
Input array, specified as a column vector or a group of column vectors stored as a matrix.
Grouping variables or vectors, specified as one of these options:
- For array input data,
groupvars
can be either a column vector with the same number of rows asA
or a group of column vectors arranged in a matrix or a cell array. - For table or timetable input data,
groupvars
indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.Indexing Scheme Values to Specify Examples Variable name A string scalar or character vectorA string array or cell array of character vectorsA pattern object "A" or 'A' — A variable named A["A" "B"] or {'A','B'} — Two variables named A andB"Var"+digitsPattern(1) — Variables named"Var" followed by a single digit Variable index An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing0 (false) values. 3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable Function handle A function handle that takes a table variable as input and returns a logical scalar @isnumeric — All the variables containing numeric values Variable type A vartype subscript that selects variables of a specified type vartype("numeric") — All the variables containing numeric values
Example: groupfilter(T,"Var3",method)
Filtering method, specified as a function handle.
method
defines the function used to filter out members from each group. The function must return a logical scalar or a logical column vector with the same number of rows as the input data indicating which group members to select.
- If the function returns a logical scalar, then either all members of the group are filtered out (when the value is
false
) or none are filtered out (when the value istrue
). - If the function returns a logical vector, then members of groups are filtered out when the corresponding element is
false
, and members are kept when the corresponding element istrue
.
To define the function handle, use a syntax of the form @(inputargs) mymethod
, where mymethod
depends oninputargs
.
- A function can filter for rows corresponding to groups that meet a condition. For example,
@(x) mean(x) > 10
passes to the output only rows corresponding to groups with a group mean greater than 10. - A function can filter for rows that meet a condition within their corresponding group. For example,
@(x) x == max(x)
passes to the output only rows corresponding to the maximum value of rows within their group.
For more information, see Create Function Handle and Anonymous Functions.
When groupfilter
applies the method to more than one nongrouping variable at a time, the method returns a logical scalar or vector for each variable. For each row, the corresponding values in all returned scalars or vectors must betrue
to pass the row to the output.
Binning scheme for grouping variables or vectors, specified as one or more of the following binning methods. Grouping variables or vectors and binning scheme arguments must be the same size, or one of them can be scalar.
"none"
— No binning.- Vector of bin edges — The bin edges define the bins. You can specify the edges as numeric values or as
datetime
values fordatetime
grouping variables or vectors. - Number of bins — The number determines how many equally spaced bins to create. You can specify the number of bins as a positive integer scalar.
- Length of time (bin width) — The length of time determines the width of each bin. You can specify the bin width as a
duration
orcalendarDuration
scalar fordatetime
orduration
grouping variables or vectors. - Name of time unit (bin width) — The name of the time unit determines the width of each bin. You can specify the bin width as one of the options in this table for
datetime
orduration
grouping variables or vectors.Value Description Data Type "second" Each bin is 1 second. datetime and duration "minute" Each bin is 1 minute. datetime and duration "hour" Each bin is 1 hour. datetime and duration "day" Each bin is 1 calendar day. This value accounts for daylight saving time shifts. datetime and duration "week" Each bin is 1 calendar week. datetime only "month" Each bin is 1 calendar month. datetime only "quarter" Each bin is 1 calendar quarter. datetime only "year" Each bin is 1 calendar year. This value accounts for leap days. datetime and duration "decade" Each bin is 1 decade (10 calendar years). datetime only "century" Each bin is 1 century (100 calendar years). datetime only "secondofminute" Bins are seconds from 0 to 59. datetime only "minuteofhour" Bins are minutes from 0 to 59. datetime only "hourofday" Bins are hours from 0 to 23. datetime only "dayofweek" Bins are days from 1 to 7. The first day of the week is Sunday. datetime only "dayname" Bins are full day names, such as "Sunday". datetime only "dayofmonth" Bins are days from 1 to 31. datetime only "dayofyear" Bins are days from 1 to 366. datetime only "weekofmonth" Bins are weeks from 1 to 6. datetime only "weekofyear" Bins are weeks from 1 to 54. datetime only "monthname" Bins are full month names, such as "January". datetime only "monthofyear" Bins are months from 1 to 12. datetime only "quarterofyear" Bins are quarters from 1 to 4. datetime only
Table variables to operate on, specified as one of the options in this table.datavars
indicates which variables of the input table or timetable to apply the filtering methods to. Other variables not specified bydatavars
pass through to the output without being operated on.groupfilter
applies the filtering methods to the specified variables and uses the results to remove rows from all variables. Whendatavars
is not specified, groupfilter
operates on each nongrouping variable.
Indexing Scheme | Values to Specify | Examples |
---|---|---|
Variable name | A string scalar or character vectorA string array or cell array of character vectorsA pattern object | "A" or 'A' — A variable named A["A" "B"] or {'A','B'} — Two variables named A andB"Var"+digitsPattern(1) — Variables named"Var" followed by a single digit |
Variable index | An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing0 (false) values. | 3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable |
Function handle | A function handle that takes a table variable as input and returns a logical scalar | @isnumeric — All the variables containing numeric values |
Variable type | A vartype subscript that selects variables of a specified type | vartype("numeric") — All the variables containing numeric values |
Example: groupfilter(T,groupvars,method,["Var1" "Var2" "Var4"])
Included bin edge for binning scheme, specified as either"left"
or "right"
, indicating which end of the bin interval is inclusive.
You can specify LR
only if you also specifygroupbins
, and the value applies to all binning methods for all grouping variables or vectors.
Output Arguments
Output table for table or timetable input data, returned as a table or timetable.G
contains the rows in T
that satisfy the group-wise filtering method.
Output array for array input data, returned as a vector or matrix.B
contains the rows in A
that satisfy the group-wise filtering method.
Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG
contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B
.
Alternative Functionality
Live Editor Task
You can use groupfilter
functionality interactively by adding theCompute by Group task to a live script.
Extended Capabilities
Thegroupfilter
function supports tall arrays with the following usage notes and limitations:
- If
A
andgroupvars
are both tall matrices, then they must have the same number of rows. - If the first input is a tall matrix, then
groupvars
can be a cell array containing tall grouping vectors. - The
groupvars
anddatavars
arguments do not support function handles. - The
method
argument must be a valid input forsplitapply
operating on a tall array. - When grouping by discretized datetime arrays, the categorical group names are different compared to in-memory
groupfilter
calculations.
For more information, see Tall Arrays.
Usage notes and limitations:
- Sparse inputs are not supported.
- Binning scheme is not supported for datetime or duration data.
- Input tables that contain multidimensional arrays are not supported.
- Computation methods must be constant.
- Grouping variables must be constant when the first input argument is a table.
- Data variables must be constant.
- Binning scheme specified as character vectors or strings must be constant.
- Name-value arguments must be constant.
- Computation methods cannot return sparse or multidimensional results.
Version History
Introduced in R2019b
Apply multiple binning methods to one grouping variable or vector by specifying a cell array of binning methods.
Generate C or C++ code for the groupfilter
function. For usage notes and limitations, see C/C++ Code Generation.
The groupfilter
function shows improved performance, especially when the data count in each group is small.
For example, this code filters by group a matrix with 500 groups with a count of 10 each. The code is about 2.32x faster than in the previous release.
function timingGroupfilter data = (1:5000)'; groups = repelem(1:length(data)/10,10)'; p = randperm(length(data)); data = data(p); groups = groups(p);
tic for k = 1:600 G = groupfilter(data,groups,@(x) x == max(x)); end toc end
The approximate execution times are:
R2021b: 2.32 s
R2022a: 1.00 s
The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system by calling thetimingGroupfilter
function.