groupfilter - Filter by group - MATLAB (original) (raw)

Syntax

Description

Table Data

[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter([T](#mw%5Fd9249256-8fcd-4f6e-bf96-cdabab636c36),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01)) returns the rows of table or timetable T that satisfy the group-wise filtering condition specified in method. The filtering conditionmethod is a function handle applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = groupfilter(T,"Trial",@(x) numel(x) > 5) groups the data in T byTrial, and keeps the rows that belong to groups with more than five trials.

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

example

[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter([T](#mw%5Fd9249256-8fcd-4f6e-bf96-cdabab636c36),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[groupbins](#mw%5Fec069d74-7335-4d26-b2f7-97bf43a2c87d),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01)) specifies to bin rows in groupvars according to binning schemegroupbins prior to grouping. For example, G = groupfilter(T,"SaleDate","year",@(x) numel(x) > 5) bins the data inSaleDate by year, and keeps the rows whose bin has more than five elements.

example

[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter(___,[datavars](#mw%5Fedf9e620-6bdf-4de6-8a16-0620452e9245)) specifies the table variables to apply the filtering method to for any of the previous syntaxes. For example, G = groupfilter(T,"Trial",@(x) x == max(x),"Height") keeps the rows of T that correspond to the maximum height for each trial.

example

[G](#mw%5F477c47e6-0078-4d67-8832-15ee7db221e1) = groupfilter(___,"IncludedEdge",[LR](#mw%5F5f325024-bee5-4e98-9964-7de3f63a6e02)) specifies the included bin edge as "left" or "right" to indicate which end of the bin interval is inclusive. You can useIncludeEdge with any previous syntax that specifiesgroupbins.

Array Data

[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33) = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01)) returns the rows of vector or matrix A that satisfy the group-wise filtering condition specified in method. The filtering conditionmethod is a function handle applied to all column vectors. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33) = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[groupbins](#mw%5Fec069d74-7335-4d26-b2f7-97bf43a2c87d),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01)) specifies to bin rows in groupvars according to binning schemegroupbins prior to grouping.

example

[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33) = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),[groupvars](#mw%5F82255d84-bc75-44d6-908d-e1b19d1d57c5),[groupbins](#mw%5Fec069d74-7335-4d26-b2f7-97bf43a2c87d),[method](#mw%5F092baeaa-975d-4212-a576-dac2859d8d01),"IncludedEdge",[LR](#mw%5F5f325024-bee5-4e98-9964-7de3f63a6e02)) specifies whether to include the left or right edge in each bin whengroupbins is specified.

[[B](#mw%5Fe77837b1-4c6f-4eef-8ea9-ae82e8d29d33),[BG](#mw%5F03f92aac-2feb-416e-8aea-18d59867e02c)] = groupfilter([A](#mw%5Fa7f6dffe-13b9-4d27-add3-2e251ed386c8),___) also returns the unique grouping vector combinations corresponding to the rows inB.

example

Examples

collapse all

Create a table containing two variables.

groupID = [1 1 1 2 2 3]'; sample = [3 1 2 9 8 5]'; T = table(groupID,sample)

T=6×2 table groupID sample _______ ______

   1         3   
   1         1   
   1         2   
   2         9   
   2         8   
   3         5   

Group by ID number, and return rows corresponding to groups with more than two samples.

Gnumel = groupfilter(T,"groupID",@(x) numel(x) > 2)

Gnumel=3×2 table groupID sample _______ ______

   1         3   
   1         1   
   1         2   

Return rows whose group samples are between 0 and 6.

Gvals = groupfilter(T,"groupID",@(x) min(x) > 0 && max(x) < 6)

Gvals=4×2 table groupID sample _______ ______

   1         3   
   1         1   
   1         2   
   3         5   

Create a table containing two variables that represent a day number and temperature.

daynum = [1 1 1 1 2 2 2 2]'; temp = [67 65 71 55 61 79 58 78]'; T = table(daynum,temp)

T=8×2 table daynum temp ______ ____

  1        67 
  1        65 
  1        71 
  1        55 
  2        61 
  2        79 
  2        58 
  2        78 

Group by day number, and return the largest two temperatures for each day.

G = groupfilter(T,"daynum",@(x) ismember(x,maxk(x,2)))

G=4×2 table daynum temp ______ ____

  1        67 
  1        71 
  2        79 
  2        78 

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 4 10; ... 2017 4 14; 2017 4 30; 2017 5 25; ... 2017 5 29; 2017 5 21]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619]'; T = table(timeStamps,profit)

T=9×2 table timeStamps profit ___________ ______

04-Mar-2017     2032 
02-Mar-2017     3071 
15-Mar-2017     1185 
10-Apr-2017     2587 
14-Apr-2017     1998 
30-Apr-2017     2899 
25-May-2017     3112 
29-May-2017      909 
21-May-2017     2619 

Group the dates by month, and return rows that correspond to the maximum profit for that month.

Gmax = groupfilter(T,"timeStamps","month",@(x) x == max(x))

Gmax=3×3 table timeStamps profit month_timeStamps ___________ ______ ________________

02-Mar-2017     3071         Mar-2017    
30-Apr-2017     2899         Apr-2017    
25-May-2017     3112         May-2017    

Return rows whose month had an average profit greater than $2300.

Gavg = groupfilter(T,"timeStamps","month",@(x) mean(x) > 2300)

Gavg=3×3 table timeStamps profit month_timeStamps ___________ ______ ________________

10-Apr-2017     2587         Apr-2017    
14-Apr-2017     1998         Apr-2017    
30-Apr-2017     2899         Apr-2017    

Create a table T that contains information about nine individuals.

groupID = [1 2 3 1 2 3 1 2 3]'; Height = [62 61 59 66 70 72 57 67 71]'; HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Poor"; "Fair"; "Excellent"; "Poor"; "Excellent"; "Fair"]); T = table(groupID,Height,HealthStatus)

T=9×3 table groupID Height HealthStatus _______ ______ ____________

   1         62       Poor       
   2         61       Good       
   3         59       Fair       
   1         66       Poor       
   2         70       Fair       
   3         72       Excellent  
   1         57       Poor       
   2         67       Excellent  
   3         71       Fair       

Group by ID number, and return rows for groups that contain only members with a minimum height of 60.

G1 = groupfilter(T,"groupID",@(x) min(x) >= 60,"Height")

G1=3×3 table groupID Height HealthStatus _______ ______ ____________

   2         61       Good       
   2         70       Fair       
   2         67       Excellent  

Group by ID number, and return rows for groups that contain only members whose health status is Poor.

G2 = groupfilter(T,"groupID",@(x) all(x == "Poor"),"HealthStatus")

G2=3×3 table groupID Height HealthStatus _______ ______ ____________

   1         62          Poor    
   1         66          Poor    
   1         57          Poor    

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Group by day of the week, and compute the maximum profit for each group. Display the maximum profits and their corresponding groups.

[maxDailyProfit,dayOfWeek] = groupfilter(profit,timeStamps, ... "dayname",@(x) x == max(x))

maxDailyProfit = 5×1

    3071
    1185
    2899
    3112
    2619

dayOfWeek = 5×1 categorical Thursday Wednesday Friday Saturday Tuesday

Input Arguments

collapse all

Input table, specified as a table or timetable.

Input array, specified as a column vector or a group of column vectors stored as a matrix.

Grouping variables or vectors, specified as one of these options:

Example: groupfilter(T,"Var3",method)

Filtering method, specified as a function handle.

method defines the function used to filter out members from each group. The function must return a logical scalar or a logical column vector with the same number of rows as the input data indicating which group members to select.

To define the function handle, use a syntax of the form @(inputargs) mymethod, where mymethod depends oninputargs.

For more information, see Create Function Handle and Anonymous Functions.

When groupfilter applies the method to more than one nongrouping variable at a time, the method returns a logical scalar or vector for each variable. For each row, the corresponding values in all returned scalars or vectors must betrue to pass the row to the output.

Binning scheme for grouping variables or vectors, specified as one or more of the following binning methods. Grouping variables or vectors and binning scheme arguments must be the same size, or one of them can be scalar.

Table variables to operate on, specified as one of the options in this table.datavars indicates which variables of the input table or timetable to apply the filtering methods to. Other variables not specified bydatavars pass through to the output without being operated on.groupfilter applies the filtering methods to the specified variables and uses the results to remove rows from all variables. Whendatavars is not specified, groupfilter operates on each nongrouping variable.

Indexing Scheme Values to Specify Examples
Variable name A string scalar or character vectorA string array or cell array of character vectorsA pattern object "A" or 'A' — A variable named A["A" "B"] or {'A','B'} — Two variables named A andB"Var"+digitsPattern(1) — Variables named"Var" followed by a single digit
Variable index An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing0 (false) values. 3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable
Function handle A function handle that takes a table variable as input and returns a logical scalar @isnumeric — All the variables containing numeric values
Variable type A vartype subscript that selects variables of a specified type vartype("numeric") — All the variables containing numeric values

Example: groupfilter(T,groupvars,method,["Var1" "Var2" "Var4"])

Included bin edge for binning scheme, specified as either"left" or "right", indicating which end of the bin interval is inclusive.

You can specify LR only if you also specifygroupbins, and the value applies to all binning methods for all grouping variables or vectors.

Output Arguments

collapse all

Output table for table or timetable input data, returned as a table or timetable.G contains the rows in T that satisfy the group-wise filtering method.

Output array for array input data, returned as a vector or matrix.B contains the rows in A that satisfy the group-wise filtering method.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.

Alternative Functionality

Live Editor Task

You can use groupfilter functionality interactively by adding theCompute by Group task to a live script.

Compute by Group task in the Live Editor

Extended Capabilities

expand all

Thegroupfilter function supports tall arrays with the following usage notes and limitations:

For more information, see Tall Arrays.

Usage notes and limitations:

Version History

Introduced in R2019b

expand all

Apply multiple binning methods to one grouping variable or vector by specifying a cell array of binning methods.

Generate C or C++ code for the groupfilter function. For usage notes and limitations, see C/C++ Code Generation.

The groupfilter function shows improved performance, especially when the data count in each group is small.

For example, this code filters by group a matrix with 500 groups with a count of 10 each. The code is about 2.32x faster than in the previous release.

function timingGroupfilter data = (1:5000)'; groups = repelem(1:length(data)/10,10)'; p = randperm(length(data)); data = data(p); groups = groups(p);

tic for k = 1:600 G = groupfilter(data,groups,@(x) x == max(x)); end toc end

The approximate execution times are:

R2021b: 2.32 s

R2022a: 1.00 s

The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system by calling thetimingGroupfilter function.