Apply a Function Over a Ragged Array

tapply {base} R Documentation

Description

Apply a function to each cell of a ragged array, that is to each (non-empty) group of values or data rows given by a unique combination of the levels of certain factors.

Usage

tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

Arguments

X an R object for which a split method exists. Typically vector-like, allowing subsetting with [, or a data frame.
INDEX a list of one or more factors, each of the same length as X. The elements are coerced to factors by as.factor. Can also be a formula, which is useful if X is a data frame; see the f argument in split for interpretation.
FUN a function (or name of a function) to be applied, or NULL. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted. If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.
... optional arguments to FUN: see the Note section.
default (only in the case of simplification to an array) the value with which the array is initialized as array(default, dim = ..). Before R 3.4.0, this was hard-coded to array()'s default NA. If it is NA (the default), the missing value of the answer type, e.g. NA_real_, is chosen (as.raw(0) for "raw"). In a numerical case, it may be set to FUN(integer(0)), e.g., to 0 or 0L in the case of FUN = sum.
simplify logical; if FALSE, tapply always returns an array of mode "list"; in other words, a list with a dim attribute. If TRUE (the default), then if FUN always returns a scalar, tapply returns an array with the mode of the scalar.

Details

If FUN is not NULL, it is passed to [match.fun](../../base/help/match.fun.html), and hence it can be a function or a symbol or character string naming a function.
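As a minimal sketch of the forms of FUN described above (the vectors x and g are made-up illustration data, not part of the original page), the following passes a function object, a character string, and a backquoted operator:

```r
x <- c(1, 2, 3, 4)
g <- c("a", "a", "b", "b")

r_fun <- tapply(x, g, sum)       # a function object
r_chr <- tapply(x, g, "sum")     # a character string naming the function

## An operator must be backquoted (or quoted); 10 is passed on via '...'.
## Each cell yields a length-2 result, so the answer is an array of mode "list".
r_op <- tapply(x, g, `+`, 10)
r_op[[1]]
```

Both r_fun and r_chr hold the same per-group sums, since match.fun resolves the string to the same function.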

Value

When FUN is present, tapply calls FUN for each cell that has any data in it. If FUN returns a single atomic value for each such cell (e.g., functions mean or var) and when simplify is TRUE, tapply returns a multi-way array containing the values, and NA for the empty cells. The array has the same number of dimensions as INDEX has components; the number of levels in a dimension is the number of levels (nlevels()) in the corresponding component of INDEX. Note that if the return value has a class (e.g., an object of class "[Date](../../base/help/Date.html)") the class is discarded.
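The discarded class and the NA for empty cells can be seen in a small sketch. The dates and the factor g (with an unused level "z") are illustration data; converting back with as.Date is one possible workaround, not something the page prescribes:

```r
d <- as.Date(c("2024-01-05", "2024-03-01", "2024-02-10"))
g <- factor(c("x", "y", "x"), levels = c("x", "y", "z"))

latest <- tapply(d, g, max)
inherits(latest, "Date")   # FALSE: the "Date" class has been dropped
is.na(latest["z"])         # the empty cell "z" is NA

## One way to get dates back: drop the array attributes and re-apply the class
restored <- as.Date(as.vector(latest), origin = "1970-01-01")
restored
```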

simplify = TRUE always returns an array, possibly 1-dimensional.

If FUN does not return a single atomic value, tapply returns an array of mode [list](../../base/help/list.html) whose components are the values of the individual calls to FUN, i.e., the result is a list with a [dim](../../base/help/dim.html) attribute.

When there is an array answer, its [dimnames](../../base/help/dimnames.html) are named by the names of INDEX and are based on the levels of the grouping factors (possibly after coercion).

For a list result, the elements corresponding to empty cells are NULL.

The [array2DF](../../base/help/array2DF.html) function can be used to convert the array returned by tapply into a data frame, which may be more convenient for further analysis.
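A brief sketch of that conversion, assuming an R version that provides array2DF (R 4.3.0 or later, to the best of our knowledge) and using the warpbreaks data set from the examples; the column name "breaks" is chosen here via responseName purely for illustration:

```r
## Sum of breaks for each wool/tension combination: a 2 x 3 array
totals <- tapply(warpbreaks$breaks, warpbreaks[, -1], sum)

## One row per cell, with the array values in a column named "breaks"
df <- array2DF(totals, responseName = "breaks")
df
```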

Note

Optional arguments to FUN supplied by the ... argument are not divided into cells. It is therefore inappropriate for FUN to expect additional arguments with the same length as X.
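To illustrate, consider a per-group weighted mean. The vectors x, w and g are made-up data, and applying FUN to row indices is one common workaround rather than the only option:

```r
x <- c(10, 20, 30, 40)
w <- c(1, 3, 1, 1)
g <- c("a", "a", "b", "b")

## 'w' is passed whole to every cell, so its length (4) does not match the
## cell length (2) and weighted.mean() signals an error.
bad <- try(tapply(x, g, weighted.mean, w = w), silent = TRUE)
inherits(bad, "try-error")

## Workaround: split the indices instead, and subset both x and w inside FUN
wm <- tapply(seq_along(x), g, function(i) weighted.mean(x[i], w[i]))
wm
```

Splitting seq_along(x) keeps x and w aligned within each cell, because the same index vector subsets both.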

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

The convenience functions [by](../../base/help/by.html) and [aggregate](../../stats/html/aggregate.html) (using tapply); [apply](../../base/help/apply.html), [lapply](../../base/help/lapply.html) with its versions [sapply](../../base/help/sapply.html) and [mapply](../../base/help/mapply.html).

[array2DF](../../base/help/array2DF.html) to convert the result into a data frame.

Examples

require(stats)
groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
tapply(groups, groups, length) #- is almost the same as
table(groups)

## contingency table from data.frame : array with named dimnames
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

n <- 17; fac <- factor(rep_len(1:3, n), levels = 1:5)
table(fac)
tapply(1:n, fac, sum)
tapply(1:n, fac, sum, default = 0) # maybe more desirable
tapply(1:n, fac, sum, simplify = FALSE)
tapply(1:n, fac, range)
tapply(1:n, fac, quantile)
tapply(1:n, fac, length) ## NA's
tapply(1:n, fac, length, default = 0) # == table(fac)

## example of ... argument: find quarterly means
tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

ind <- list(c(1, 2, 2), c("A", "A", "B"))
table(ind)
tapply(1:3, ind) #-> the split vector
tapply(1:3, ind, sum)

## Some assertions (not held by all patch proposals):
nq <- names(quantile(1:5))
stopifnot(
  identical(tapply(1:3, ind), c(1L, 2L, 4L)),
  identical(tapply(1:3, ind, sum),
            matrix(c(1L, 2L, NA, 3L), 2, dimnames = list(c("1", "2"), c("A", "B")))),
  identical(tapply(1:n, fac, quantile)[-1],
            array(list(`2` = structure(c(2, 5.75, 9.5, 13.25, 17), names = nq),
                 `3` = structure(c(3, 6, 9, 12, 15), names = nq),
                 `4` = NULL, `5` = NULL), dim=4, dimnames=list(as.character(2:5)))))

[Package _base_ version 4.6.0 Index]