Find Interval Numbers or Indices

findInterval {base}	R Documentation

Description

Given a vector of non-decreasing breakpoints in vec, find the interval containing each element of x; i.e., if i <- findInterval(x, v), then for each index j in x, v_{i_j} \le x_j < v_{i_j + 1}, where v_0 := -\infty, v_{N+1} := +\infty, and N <- length(v). At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.

Usage

findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
             left.open = FALSE, checkSorted = TRUE, checkNA = TRUE)

Arguments

x	numeric.
vec	numeric, sorted (weakly) increasingly, of length N, say.
rightmost.closed	logical; if true, the rightmost interval, vec[N-1] .. vec[N], is treated as closed, see below.
all.inside	logical; if true, the returned indices are coerced into 1,...,N-1, i.e., 0 is mapped to 1 and N to N-1.
left.open	logical; if true, all the intervals are open at left and closed at right; in the formulas below, \le should be swapped with < (and > with \ge), and rightmost.closed means ‘leftmost is closed’. This may be useful, e.g., in survival analysis computations.
checkSorted	logical indicating if vec should be checked, i.e., is.unsorted(vec) is asserted to be false. Setting this to FALSE skips the check, gaining speed, but may return nonsense results if vec is not sorted.
checkNA	logical indicating if each x[i] should be checked as with is.na(.). Setting this to FALSE when x[] contains NAs may result in platform-dependent nonsense.
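
As a rough illustration of how the boundary arguments above change the returned indices, the following sketch uses small made-up vectors (v and x are chosen here for illustration only). The expected results in the comments follow from the definitions given in this page.

```r
v <- c(5, 10, 15)           # N = 3 breakpoints
x <- c(4, 5, 10, 15, 16)    # values below, on, and above the breakpoints

r_default <- findInterval(x, v)                           # 0 1 2 3 3
r_closed  <- findInterval(x, v, rightmost.closed = TRUE)  # 0 1 2 2 3
r_inside  <- findInterval(x, v, all.inside = TRUE)        # 1 1 2 2 2
r_left    <- findInterval(x, v, left.open = TRUE)         # 0 0 1 2 3

cbind(x, r_default, r_closed, r_inside, r_left)
```

Note that only the values sitting exactly on a breakpoint, or outside range(v), are affected by these arguments.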

Details

The function findInterval finds the index of one vector x in another, vec, where the latter must be non-decreasing. Although this is trivial in principle, being equivalent to apply(outer(x, vec, `>=`), 1, sum), the internal algorithm uses interval search, ensuring O(n \log N) complexity, where n <- length(x) (and N <- length(vec)). For (almost) sorted x, it will be even faster, basically O(n).
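
The equivalence with the naive outer() computation can be checked directly on a small example. This is only a sketch with arbitrary illustrative data (x0, vec0 are names introduced here); the naive version builds an n x N logical matrix and is therefore much more costly for large inputs.

```r
x0   <- c(1.5, 3.2, 0.1, 7, 4)
vec0 <- c(1, 2, 4, 8)                 # must be non-decreasing

# Naive version: for each x0[j], count how many breakpoints are <= x0[j]
naive <- apply(outer(x0, vec0, `>=`), 1, sum)

# Interval-search version
fast <- findInterval(x0, vec0)

stopifnot(all(naive == fast))
fast                                  # 1 2 0 3 3
```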

This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is identical to n F_n(t; X_1,\dots,X_n) where F_n is the empirical distribution function of X_1,\dots,X_n.
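
A minimal sketch of this relation, using stats::ecdf() on simulated data (the sample Xs and the evaluation points ts are arbitrary choices made here). The comparison uses all.equal() because findInterval() returns integers while n * F_n(t) is computed in floating point.

```r
set.seed(1)
Xs <- stats::rnorm(20)            # illustrative sample
ts <- c(-3, -0.5, 0, 0.5, 3)      # points at which to evaluate

Fn     <- stats::ecdf(Xs)         # empirical distribution function of Xs
counts <- findInterval(ts, sort(Xs))

# counts[j] is the number of observations <= ts[j], i.e. n * F_n(ts[j])
stopifnot(isTRUE(all.equal(counts, length(Xs) * Fn(ts))))
```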

When rightmost.closed = TRUE, the result for x[j] = vec[N] ( = \max vec) is N - 1, as for all other values in the last interval.

left.open = TRUE is occasionally useful, e.g., for survival data. For (anti-)symmetry reasons, it is equivalent to using “mirrored” data, i.e., the following is always true:

identical(
      findInterval( x,  v,      left.open= TRUE, ...) ,
  N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )

where N <- length(vec) as above.

Value

A vector of length length(x) with values in 0:N (and NA), where N <- length(vec), or values coerced to 1:(N-1) if and only if all.inside = TRUE (equivalently coercing all x values inside the intervals). Note that NAs are propagated from x, and Inf values are allowed in both x and vec.
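
The handling of NA and infinite values can be seen in a short sketch; the vectors vi and xi below are illustrative assumptions, not part of the original examples.

```r
vi <- c(-Inf, 0, 10, Inf)         # infinite breakpoints are allowed
xi <- c(-Inf, -1, NA, 5, Inf)     # NA in x is propagated to the result

res <- findInterval(xi, vi)
res                               # 1 1 NA 2 4

stopifnot(is.na(res[3]),                      # NA stays NA
          all(res[-3] %in% 0:length(vi)))     # other values lie in 0:N
```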

Author(s)

Martin Maechler

Examples

x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))

N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)
stopifnot(identical(it, ## suppressing the checks is faster *BUT* dangerous, unless
                    ##     you *know* that X is sorted   and   tt contains no NA's
                    findInterval(tt, X, checkSorted=FALSE, checkNA=FALSE)))

##  'left.open = TRUE' means  "mirroring" :
N <- length(v)
stopifnot(identical(
                  findInterval( x,  v,  left.open=TRUE) ,
              N - findInterval(-x, -v[N:1])))

[Package base version 4.6.0 Index]