rowfun - Apply function to table or timetable rows - MATLAB (original) (raw)

Apply function to table or timetable rows

Syntax

Description

[B](#btyj4wy-1%5Fsep%5Fshared-B) = rowfun([func](#btyj4wy-1-func),[A](#btyj4wy-1-A)) applies the function func to each row of the table or timetable A and returns the results in the table or timetable B.

The number of inputs that the function func accepts must equal the number of variables in A. For example, iffunc must be called with two input arguments, thenA must have two variables. To find the number of variables in a table, use the width function.

[B](#btyj4wy-1%5Fsep%5Fshared-B) = rowfun([func](#btyj4wy-1-func),[A](#btyj4wy-1-A),[Name,Value](#namevaluepairarguments)) specifies options using one or more name-value arguments. For example, you can use the GroupingVariables name-value argument to carry out calculations on groups of rows. For more information about calculations on groups of data, see Calculations on Groups of Data.

example

Examples

collapse all

Create a table with two variables of numeric data.

X = randi(10,[5,1]); Y = randi(10,[5,1]); A = table(X,Y)

A=5×2 table X Y __ __

Apply the plus function to each row of the table. The function call plus(X,Y) is equivalent to the operation X + Y. The plus function accepts two inputs and returns one output. To specify a function as an input argument to rowfun, use the @ symbol.

B = rowfun(@plus,A,"OutputVariableNames","Sum")

B=5×1 table Sum ___

Append the output table to the input table.

C=5×3 table X Y Sum __ __ ___

 9     1    10 
10     3    13 
 2     6     8 
10    10    20 
 7    10    17

Apply a function that returns multiple outputs to the rows of a table. The rowfun function stores each output from the applied function in a variable of the output table.

Read data from a CSV (comma-separated values) file into a table. The sample file contains test scores for 10 students from two different schools.

scores = readtable("testScores.csv","TextType","string"); scores.School = categorical(scores.School)

scores=10×5 table LastName School Test1 Test2 Test3 __________ __________ _____ _____ _____

"Jeong"       XYZ School     90       87       93  
"Collins"     XYZ School     87       85       83  
"Torres"      XYZ School     86       85       88  
"Phillips"    ABC School     75       80       72  
"Ling"        ABC School     89       86       87  
"Ramirez"     ABC School     96       92       98  
"Lee"         XYZ School     78       75       77  
"Walker"      ABC School     91       94       92  
"Garcia"      ABC School     86       83       85  
"Chang"       XYZ School     79       76       82

To find the minimum and maximum test scores across each row, apply the bounds function. The bounds function returns two output arguments. The output of rowfun is a new table that has TestMin and TestMax variables. In this case, also specify the SeparateInputs name-value argument as false so that values across each row are combined into a vector before being passed to bounds.

vars = ["Test1","Test2","Test3"]; minmaxTest = rowfun(@bounds, ... scores, ... "InputVariables",vars, ... "OutputVariableNames",["TestMin","TestMax"], ... "SeparateInputs",false)

minmaxTest=10×2 table TestMin TestMax _______ _______

  87         93   
  83         87   
  85         88   
  72         80   
  86         89   
  92         98   
  75         78   
  91         94   
  83         86   
  76         82

You can append the minimum and maximum to the input table.

scores = [scores minmaxTest]

scores=10×7 table LastName School Test1 Test2 Test3 TestMin TestMax __________ __________ _____ _____ _____ _______ _______

"Jeong"       XYZ School     90       87       93        87         93   
"Collins"     XYZ School     87       85       83        83         87   
"Torres"      XYZ School     86       85       88        85         88   
"Phillips"    ABC School     75       80       72        72         80   
"Ling"        ABC School     89       86       87        86         89   
"Ramirez"     ABC School     96       92       98        92         98   
"Lee"         XYZ School     78       75       77        75         78   
"Walker"      ABC School     91       94       92        91         94   
"Garcia"      ABC School     86       83       85        83         86   
"Chang"       XYZ School     79       76       82        76         82

Apply a function to data in groups of rows of the input table. The output table has one row for each group.

Read data from a CSV file into a table. The sample file contains test scores for 10 students from two different schools.

scores = readtable("testScores.csv","TextType","string"); scores.School = categorical(scores.School)

scores=10×5 table LastName School Test1 Test2 Test3 __________ __________ _____ _____ _____

"Jeong"       XYZ School     90       87       93  
"Collins"     XYZ School     87       85       83  
"Torres"      XYZ School     86       85       88  
"Phillips"    ABC School     75       80       72  
"Ling"        ABC School     89       86       87  
"Ramirez"     ABC School     96       92       98  
"Lee"         XYZ School     78       75       77  
"Walker"      ABC School     91       94       92  
"Garcia"      ABC School     86       83       85  
"Chang"       XYZ School     79       76       82

Calculate the mean test score for each student and add it as a new table variable. You can extract the numeric test scores and calculate the means along the second dimension. The result is a column vector that you can attach to scores as a new variable.

scores.TestMean = mean(scores{:,["Test1","Test2","Test3"]},2)

scores=10×6 table LastName School Test1 Test2 Test3 TestMean __________ __________ _____ _____ _____ ________

"Jeong"       XYZ School     90       87       93           90 
"Collins"     XYZ School     87       85       83           85 
"Torres"      XYZ School     86       85       88       86.333 
"Phillips"    ABC School     75       80       72       75.667 
"Ling"        ABC School     89       86       87       87.333 
"Ramirez"     ABC School     96       92       98       95.333 
"Lee"         XYZ School     78       75       77       76.667 
"Walker"      ABC School     91       94       92       92.333 
"Garcia"      ABC School     86       83       85       84.667 
"Chang"       XYZ School     79       76       82           79

Find the student whose mean test score is the maximum for each school. Apply the helper function, findNameAtMax, defined at the end of this example. The helper function takes multiple input arguments (last names and test scores) and returns multiple output arguments (maximum score and last name). The variable GroupCount in the output table indicates the number of rows in scores for each school.

maxScoresBySchool = rowfun(@findNameAtMax, ... scores, ... "InputVariables",["LastName","TestMean"], ... "GroupingVariables","School", ... "OutputVariableNames",["max_TestMean","LastName"])

maxScoresBySchool=2×4 table School GroupCount max_TestMean LastName __________ __________ ____________ _________

ABC School        5            95.333       "Ramirez"
XYZ School        5                90       "Jeong"

Helper Function

This code defines the findNameAtMax helper function.

function [maxValue,lastName] = findNameAtMax(names,values) % Return maximum value and the last name % from the row at which the maximum value occurs [maxValue,maxIndex] = max(values); lastName = names(maxIndex); end

To pass optional arguments when you apply a function, wrap the function call in an anonymous function.

Create a table with two variables that are integer arrays.

X = int32(randi(10,[5,1])); Y = int32(randi(10,[5,1])); A = table(X,Y)

A=5×2 table X Y __ __

Perform integer division of the two table variables by applying the idivide function.

B=5×1 table Var1 ____

The idivide function provides several options for rounding the result. The default rounding option is "fix". To use a different rounding option with idivide, wrap a call that specifies that option in an anonymous function. For example, specify "ceil" as the rounding option.

func = @(x,y) idivide(x,y,"ceil");

Perform integer division with "ceil" by applying the anonymous function.

C=5×1 table Var1 ____

Input Arguments

collapse all

Function, specified as a function handle. You can specify a handle for an existing function, define the function in a file, or specify an anonymous function. The function takes N input arguments, whereN = width(A), and must have a syntax in this form:

result = f(arg1, . . . ,argN)

To call f on the rows of A, specifyfunc as shown in this call torowfun.

func = @f; B = rowfun(func,A);

For every row in A, rowfun callsfunc on that row, and then assigns the output offunc to the corresponding row inB. The output B has one variable.

Some further considerations:

The function that func represents can have other syntaxes with additional optional arguments. But whenrowfun calls the function, it calls the syntax that has the appropriate number of input arguments.
For example, the idivide function has a syntax that specifies a third optional argument. But if you specifyfunc as @idivide, thenrowfun calls idivide using the idivide(arg1,arg2) syntax.
To call a function with optional arguments, wrap it in an anonymous function. For example, to callidivide with the "ceil" option, specify func as @(x,y) idivide(x,y,"ceil").
To return more than one output from func, use the NumOutputs orOutputVariableNames name-value arguments. In that case, the output B has multiple variables, one for each output of func.
If func returns an array with a different number of rows each time it is called, then specify theOutputFormat name-value argument as"cell". Otherwise, func must return an array with the same number of rows each time it is called.
If func corresponds to more than one function file (that is, if func represents a set of overloaded functions), MATLAB® determines which function to call based on the class of the input arguments.

Example: B = rowfun(@idivide,A) performs integer division. A is a table with two variables, with both variables belonging to an integer class. B is a table with one variable.

Example: B = rowfun(@(x,y) x.^2+y.^2,A) calculates the sum of the squares of the two variables in A.

Example: B = rowfun(@(x,y) idivide(x,y,"ceil"),A) performs integer division by applying the idivide function with the "ceil" option.

Input table, specified as a table or timetable.

Name-Value Arguments

collapse all

Specify optional pairs of arguments asName1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: B = rowfun(func,A,InputVariables=["Var2","Var3"]) uses only the variables named Var2 and Var3 inA as the inputs to func.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: B = rowfun(func,A,"InputVariables",["Var2","Var3"]) uses only the variables named Var2 and Var3 inA as the inputs to func.

Variables of A to pass to func, specified using one of the indexing schemes from this table.

Indexing Scheme	Examples
Variable names: A string array, character vector, or cell array of character vectorsA pattern object	"A" or 'A' — A variable named A["A","B"] or {'A','B'} — Two variables named A and B"Var"+digitsPattern(1) — Variables named "Var" followed by a single digit
Variable index: An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values	3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable
Function handle: A handle to a function that takes one argument as input and returns a logical scalar. The function must have a syntax in this form:tf = f(arg) If you need to apply a function that has additional optional arguments, wrap it in an anonymous function.	@isnumeric — Handle to a function that returnstrue for an input argument that contain numeric values

Example: B = rowfun(func,A,InputVariables=[1 3 4]) uses only the first, third, and fourth variables in A as the inputs to func.

Example: B = rowfun(func,A,InputVariables=@isnumeric) uses only the numeric variables in A as the inputs tofunc.

Variables of A to use as grouping variables, specified using one of the indexing schemes from this table.

Indexing Scheme	Examples
Variable names: A string array, character vector, or cell array of character vectorsA pattern object	"A" or 'A' — A variable named A["A","B"] or {'A','B'} — Two variables named A and B"Var"+digitsPattern(1) — Variables named "Var" followed by a single digit
Variable index: An index number that refers to the location of a variable in the tableA vector of numbersA logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values	3 — The third variable from the table[2 3] — The second and third variables from the table[false false true] — The third variable

The unique values in the grouping variables define groups. Rows inA where the grouping variables have the same values belong to the same group. rowfun appliesfunc to each group of rows, rather than separately to each row of A. The outputB contains one row for each group. For more information on calculations using grouping variables, see Calculations on Groups of Data.

Grouping variables can have any of the data types listed in the table.

Values That Specify Groups	Data Type of Grouping Variable
Numbers	Numeric or logical vector
Text	String array or cell array of character vectors
Dates and times	datetime,duration, orcalendarDuration vector
Categories	categorical vector
Bins	Vector of binned values, created by binning a continuous distribution of numeric,datetime, orduration values

Many data types have ways to represent missing values, such asNaNs, NaTs, undefinedcategorical values, or missing strings. If any grouping variable has a data type that can represent missing values, then rows where missing values occur in that grouping variable do not belong to any group and are excluded from the output.

To include rows where the grouping variables have missing values, consider using the groupsummary function instead.

Row labels can be grouping variables. You can group on row labels alone, on one or more variables in A, or on row labels and variables together.

If A is a table, then the labels are row names.
If A is a timetable, then the labels are row times.

The output B has one row for each group of rows in the input A. If B is a table or timetable, then B has:

Variables corresponding to the input table variables thatfunc was applied to
Variables corresponding to the grouping variables
A new variable, GroupCount, whose values are the number of rows of the inputA that are in each group

If B is a timetable, then B also has:

Row times, where the first row time from each group of rows in A is the corresponding row time in B. To return B as a table without row times, specifyOutputFormat as"table".

Example: B = rowfun(func,A,GroupingVariables="Var3") uses the variable named Var3 in A as a grouping variable.

Example: B = rowfun(func,A,GroupingVariables=["Var3","Var4"]) uses the variables named Var3 and Var4 inA as grouping variables.

Example: B = rowfun(func,A,GroupingVariables=[3 4]) uses the third and fourth variables in A as grouping variables.

Option to call func with separate inputs, specified as a numeric or logical 1 (true) or 0 (false).

If SeparateInputs istrue, then func expects separate inputs. rowfun callsfunc with width(A) inputs, one argument for each data variable.
If SeparateInputs isfalse, then func expects one argument containing all inputs.rowfun creates the input argument tofunc by concatenating the values in each row of A.
For example, if A is a table that has three variables, and each variable is a numeric vector, then specifying SeparateInputs asfalse causes rowfun to concatenate the three numeric vectors into one numeric matrix. The matrix has three columns. Thenrowfun passes that matrix as one input argument to func.

Example: B = rowfun(@mean,A,SeparateInputs=false) treats N table variables as though their contents were the columns of one array, so that you can treat each row ofA as a vector that is passed tomean.

Option to pass values from cell variables to func, specified as a numeric or logical 0 (false) or 1 (true).

If ExtractCellContents istrue, then rowfun extracts the contents of a variable in A whose data type is cell and passes the values, rather than the cells, tofunc.
For grouped calculations, the values within each group in a cell variable must allow vertical concatenation.
If ExtractCellContents isfalse, then rowfun passes the cells of a variable in A whose data type is cell tofunc.

Example: B = rowfun(func,A,ExtractCellContents=true) extracts cell contents from variables that are cell arrays.

Variable names for outputs of func, specified as a string array, character vector, or cell array of character vectors, with names that are nonempty and distinct. The number of names must equal the number of outputs fromfunc.

The variable names must be valid MATLAB identifiers. If valid MATLAB identifiers are not available for use as variable names, MATLAB uses a cell array of N character vectors of the form {'Var1' ... 'Var_`N`_'}, where_N_ is the number of variables. You can determine valid MATLAB variable names using the functionisvarname.

Example: B = rowfun(func,A,OutputVariableNames=["V1","V2"]) returns an output table with two variables named V1 andV2.

Number of outputs from func, specified as a nonnegative integer. The integer must be less than or equal to the possible number of outputs from func.

Example: B = rowfun(func,A,NumOutputs=2) returns two outputs fromfunc.

Format of B, specified as one of the values in this table.

"auto" (default) (since R2023a)	rowfun returns an output whose data type matches the data type of the inputA.
"table"	rowfun returns a table with one variable for each output offunc. For grouped calculations,B also contains the grouping variables and a new GroupCount variable."table" allows you to use a function that returns values of different sizes or data types. However, for ungrouped calculations, all of the outputs fromfunc must have one row each time it is called. For grouped calculations, all of the outputs from func must have the same number of rows.IfA is a table, then this format is the default output format.
"timetable"	rowfun returns a timetable with one variable for each variable inA (or each variable specified with InputVariables). For grouped calculations, B also contains the grouping variables and a newGroupCount variable.rowfun creates the row times of B from the row times of A. If the row times assigned to B do not make sense in the context of the calculations performed using func, then specifyOutputFormat as"table".IfA is a timetable, then this is the default output format.
"uniform"	rowfun concatenates the output values into a vector.func must return a scalar with the same data type each time it is called.
"cell"	rowfun returns a cell array. "cell" allows you to use a function that returns values of different sizes or data types.

Example: B = rowfun(func,A,OutputFormat="uniform") returns the output as a vector.

Function to call if func fails, specified as a function handle. If func throws an error, then the error handler function specified by ErrorHandler catches the error and takes the specified action.

The error handler function must meet these requirements:

The definition of the error handler function must specify that it returns output arguments that match the number and data types of the output arguments of func.
When called, the error handler function can either throw an error or return output arguments. But even if the error handler always throws an error, its definition must specify that it returns the same types and number of output arguments asfunc.
The error handler function cannot be an anonymous function.
Instead, write it as a local function. You can even define a local function in a script. You do not have to write the local function in a separate file.

If you do not specify ErrorHandler, thenrowfun rethrows the error that it caught fromfunc.

The first input argument of the error handler is a structure with these fields:

cause — MException object that contains information about the error (since R2024a)
index — Row or group index at which the error occurred

The remaining input arguments to the error handler are the input arguments for the call to func that madefunc throw the error.

For example, suppose that func returns two doubles as output arguments. You can specify the error handler as a function that raises a warning and returns two output arguments.

function [A,B] = errorFunc(S,varargin) warning(S.cause.identifier,S.cause.message); A = NaN; B = NaN; end

In releases before R2024a, the first input argument of the error handler is a structure with these fields:

identifier — Error identifier
message — Error message text
index — Row or group index at which the error occurred

Example: B = rowfun(func,A,ErrorHandler=@errorFunc) specifies errorFunc as the error handler.

Output Arguments

More About

collapse all

In data analysis, you commonly perform calculations on groups of data. For such calculations, you split one or more data variables into groups of data, perform a calculation on each group, and combine the results into one or more output variables. You can specify the groups using one or more_grouping variables_. The unique values in the grouping variables define the groups that the corresponding values of the data variables belong to.

For example, the diagram shows a simple grouped calculation that splits a 6-by-1 numeric vector into two groups of data, calculates the mean of each group, and then combines the outputs into a 2-by-1 numeric vector. The 6-by-1 grouping variable has two unique values, AB andXYZ.

Calculation that splits a data variable based on a grouping variable, performs calculations on individual groups of data by applying the same function, and then concatenates the outputs of those function calls

You can specify grouping variables that have numbers, text, dates and times, categories, or bins.

Extended Capabilities

Version History

Introduced in R2013b

expand all

To return an output whose data type matches the data type of the input, specify the OutputFormat name-value argument as"auto". This value is the default value.