Find Interval Numbers or Indices
findInterval {base} | R Documentation
Description
Given a vector of non-decreasing breakpoints in vec, find the interval containing each element of x; i.e., if i <- findInterval(x, v), then for each index j in x,
$$v_{i_j} \le x_j < v_{i_j + 1},$$
where $v_0 := -\infty$, $v_{N+1} := +\infty$, and N <- length(v). At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.
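A small sketch of the defining inequality. Padding v with -Inf and Inf plays the role of $v_0$ and $v_{N+1}$, so index i corresponds to position i + 1 in the padded vector. The values of x and v are illustrative only.

```r
v <- c(5, 10, 15)            # breakpoints, N = 3
x <- c(2, 5, 7, 10, 15, 18)  # query points
i <- findInterval(x, v)
i                            # 0 1 1 2 3 3

# v_0 := -Inf and v_{N+1} := +Inf, so every index is shifted by one
vpad <- c(-Inf, v, Inf)
stopifnot(vpad[i + 1] <= x, x < vpad[i + 2])
```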
Usage
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
left.open = FALSE, checkSorted = TRUE, checkNA = TRUE)
Arguments
| Argument | Description |
|---|---|
| x | numeric. |
| vec | numeric, sorted (weakly) increasingly, of length N, say. |
| rightmost.closed | logical; if true, the rightmost interval, vec[N-1] .. vec[N], is treated as closed; see below. |
| all.inside | logical; if true, the returned indices are coerced into 1, ..., N-1, i.e., 0 is mapped to 1 and N to N-1. |
| left.open | logical; if true, all the intervals are open at left and closed at right; in the interval formulas, $\le$ should be swapped with $<$ (and $>$ with $\ge$), and rightmost.closed means ‘leftmost is closed’. This may be useful, e.g., in survival analysis computations. |
| checkSorted | logical indicating if vec should be checked, i.e., whether is.unsorted(vec) is asserted to be false. Setting this to FALSE skips the check, gaining speed, but may return nonsense results if vec is not sorted. |
| checkNA | logical indicating if each x[i] should be checked with is.na(.). Setting this to FALSE when x[] contains NAs may result in platform-dependent nonsense. |
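A minimal illustration of all.inside and of the default checkSorted = TRUE, using illustrative data. With all.inside = TRUE, points below vec[1] land in the first interval and points at or above vec[N] in the last one; an unsorted vec is rejected with an error rather than producing nonsense.

```r
v <- c(5, 10, 15)                     # N = 3, so the inner intervals are 1 and 2
x <- c(2, 5, 7, 10, 15, 18)

# 0 is mapped to 1 and N (= 3) to N - 1 (= 2)
inside <- findInterval(x, v, all.inside = TRUE)
inside                                # 1 1 1 2 2 2

# With the default checkSorted = TRUE, an unsorted 'vec' signals an error
res <- tryCatch(findInterval(1, c(3, 1, 2)), error = function(e) e)
inherits(res, "error")                # TRUE
```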
Details
The function findInterval finds the index of one vector x in another, vec, where the latter must be non-decreasing. Although this is trivial in principle, being equivalent (for the default arguments) to apply( outer(x, vec, `>=`), 1, sum), the internal algorithm uses interval search, ensuring $O(n \log N)$ complexity, where n <- length(x) (and N <- length(vec)). For (almost) sorted x, it is even faster, basically $O(n)$.
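A quick check of the stated equivalence on random data (the seed and sizes are arbitrary). The naive form materializes an n-by-N logical matrix, which is exactly the cost that the interval search avoids.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
vec <- sort(runif(50))                   # non-decreasing breakpoints
x   <- runif(1000, -0.1, 1.1)            # includes points outside range(vec)

fast  <- findInterval(x, vec)                 # interval search
naive <- apply(outer(x, vec, `>=`), 1, sum)   # counts vec[k] <= x[j] row by row
stopifnot(fast == naive)
```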
This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is _identical_ to $n F_n(t; X_1,\dots,X_n)$, where $F_n$ is the empirical distribution function of $X_1,\dots,X_n$.
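A sketch comparing findInterval with stats::ecdf on a simulated sample (seed and grid are arbitrary). Since n * Fn(tt) is computed in floating point, the comparison rounds it before testing equality with the integer counts.

```r
set.seed(2)                        # arbitrary seed
X  <- rnorm(20)                    # sample X_1, ..., X_n
n  <- length(X)
Fn <- ecdf(X)                      # empirical distribution function F_n
tt <- seq(-3, 3, by = 0.25)        # evaluation points t

counts <- findInterval(tt, sort(X))    # number of X_i <= t
# round() only guards against floating-point noise in n * Fn(tt)
stopifnot(counts == round(n * Fn(tt)))
```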
When rightmost.closed = TRUE, the result for x[j] = vec[N] (= max(vec)) is N - 1, as for all other values in the last interval.
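A minimal illustration with illustrative values: only the point equal to max(v) changes, moving from N into the last interval N - 1.

```r
v <- c(5, 10, 15)   # N = 3; the last interval is [10, 15)
x <- c(10, 12, 15)

open   <- findInterval(x, v)                          # 2 2 3: 15 = max(v) gets N
closed <- findInterval(x, v, rightmost.closed = TRUE) # 2 2 2: last interval is [10, 15]
```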
left.open = TRUE is occasionally useful, e.g., for survival data. For (anti-)symmetry reasons, it is equivalent to using “mirrored” data, i.e., the following is always true:
identical(
findInterval( x, v, left.open= TRUE, ...) ,
N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )
where N <- length(vec), as above.
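A small sketch of what left.open = TRUE changes, with illustrative values: each breakpoint now belongs to the interval on its left, so the intervals become $(v_i, v_{i+1}]$ instead of $[v_i, v_{i+1})$.

```r
v <- c(5, 10, 15)
x <- c(2, 5, 7, 10, 15, 18)

right_open <- findInterval(x, v)                   # 0 1 1 2 3 3: [v_i, v_{i+1})
left_open  <- findInterval(x, v, left.open = TRUE) # 0 0 1 1 2 3: (v_i, v_{i+1}]
```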
Value
A vector of length length(x) with values in 0:N (and NA), where N <- length(vec), or with values coerced to 1:(N-1) if and only if all.inside = TRUE (equivalently, coercing all x values inside the intervals). Note that [NA](../../base/help/NA.html)s are propagated from x, and [Inf](../../base/help/Inf.html) values are allowed in both x and vec.
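A short illustration of NA propagation and infinite values, using illustrative data: NA in x yields NA, -Inf falls below vec[1], and Inf compares as equal to a breakpoint of Inf, giving index N.

```r
v <- c(0, 10, Inf)                 # Inf is allowed as a breakpoint
x <- c(NA, -Inf, 5, 10, Inf)

res <- findInterval(x, v)
res                                # NA 0 1 2 3
```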
Author(s)
Martin Maechler
See Also
[approx](../../stats/html/approxfun.html)(*, method = "constant"), which is a generalization of findInterval(); [ecdf](../../stats/html/ecdf.html) for computing the empirical distribution function, which is (up to a factor of n) also basically the same as findInterval(.).
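One way to reproduce findInterval with approx, sketched under the assumption that vec is strictly increasing (approx would otherwise collapse tied knots). A step function taking the value i on $[v_i, v_{i+1})$ is built with f = 0, and yleft / yright supply 0 and N outside range(v); the result is a double vector.

```r
v <- c(5, 10, 15)                   # strictly increasing knots (assumption)
x <- c(2, 5, 7, 10, 15, 18)
N <- length(v)

# f = 0 takes the left value of each step; yleft/yright cover points outside range(v)
a <- approx(v, seq_len(N), xout = x, method = "constant", f = 0,
            yleft = 0, yright = N)$y
a                                   # 0 1 1 2 3 3 (as doubles)
stopifnot(a == findInterval(x, v))
```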
Examples
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))
N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)
stopifnot(identical(it, ## suppressing the checks is faster *BUT* dangerous, unless
## you *know* that X is sorted and tt contains no NA's
findInterval(tt, X, checkSorted=FALSE, checkNA=FALSE)))
## 'left.open = TRUE' means "mirroring" :
N <- length(v)
stopifnot(identical(
findInterval( x, v, left.open=TRUE) ,
N - findInterval(-x, -v[N:1])))
[Package _base_ version 4.6.0 Index]