findInterval: Find Interval Numbers or Indices
findInterval | R Documentation |
---|
Find Interval Numbers or Indices
Description
Given a vector of non-decreasing breakpoints in vec, find the interval containing each element of x; i.e., if i <- findInterval(x,v), for each index j in x, v[i[j]] ≤ x[j] < v[i[j] + 1], where v[0] := -Inf, v[N+1] := +Inf, and N <- length(v). At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.
Usage
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE, left.open = FALSE)
Arguments
x | numeric. |
---|---|
vec | numeric, sorted (weakly) increasingly, of length N, say. |
rightmost.closed | logical; if true, the rightmost interval, vec[N-1] .. vec[N], is treated as closed, see below. |
all.inside | logical; if true, the returned indices are coerced into 1,...,N-1, i.e., 0 is mapped to 1 and N to N-1. |
left.open | logical; if true, all the intervals are open at left and closed at right; in the formulas below, ≤ should be swapped with < (and > with ≥), and rightmost.closed means ‘leftmost is closed’. This may be useful, e.g., in survival analysis computations. |
Details
The function findInterval finds the index of one vector x in another, vec, where the latter must be non-decreasing. Although the result is trivially equivalent to apply(outer(x, vec, ">="), 1, sum), the internal algorithm uses interval search, ensuring O(n * log(N)) complexity, where n <- length(x) (and N <- length(vec)). For (almost) sorted x, it will be even faster, basically O(n).
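The equivalence with the outer() formulation can be checked on a small input. The following is only an illustrative sketch; the values of x and vec are made up for the demonstration, and the names naive and fast are not part of the original page.

```r
# Illustrative data (assumed for this sketch): unsorted x, sorted breakpoints
x   <- c(1.5, 3.2, 4.8, 0.3, 7)
vec <- c(1, 2, 4, 6)

# Naive O(n * N) formulation: count the breakpoints that are <= each x[j]
naive <- apply(outer(x, vec, ">="), 1, sum)

# Interval search as performed by findInterval()
fast <- findInterval(x, vec)

print(cbind(x, naive, fast))
stopifnot(all(naive == fast))
```

The naive version builds a full n-by-N logical matrix, so it is only practical for small inputs; findInterval() avoids that.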
This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is identical to n * Fn(t; X[1],..,X[n]) where Fn is the empirical distribution function of X[1],..,X[n].
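A quick numerical check of this relation, using ecdf() from the stats package as Fn. The sample X and the evaluation grid tt are arbitrary choices made for this sketch, and the comparison uses all.equal() because n * Fn(t) is computed in floating point.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
X  <- round(rnorm(20), 1)          # rounding introduces ties on purpose
tt <- seq(-3, 3, by = 0.25)        # points at which to evaluate
n  <- length(X)

counts <- findInterval(tt, sort(X))   # number of observations <= each tt
scaled <- n * ecdf(X)(tt)             # n * Fn(tt)

stopifnot(isTRUE(all.equal(counts, scaled)))
```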
When rightmost.closed = TRUE, the result for x[j] = vec[N] ( = max(vec)) is N - 1, as for all other values in the last interval.
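The boundary behaviour is easiest to see side by side. This is a small sketch with made-up values; the variable names r_default, r_closed and r_inside are chosen only for illustration.

```r
v <- c(5, 10, 15)           # N = 3 breakpoints, i.e. two inner intervals
x <- c(4, 5, 12, 15, 16)    # below, on, and above the breakpoints

r_default <- findInterval(x, v)
r_closed  <- findInterval(x, v, rightmost.closed = TRUE)
r_inside  <- findInterval(x, v, all.inside = TRUE)

print(cbind(x, r_default, r_closed, r_inside))
# x = 15 (= max(v)) gets index 3 by default but 2 with rightmost.closed = TRUE;
# x = 16 still gets 3 unless all.inside = TRUE maps it back to 2.
```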
left.open = TRUE is occasionally useful, e.g., for survival data. For (anti-)symmetry reasons, it is equivalent to using “mirrored” data, i.e., the following is always true:
identical(
findInterval( x, v, left.open= TRUE, ...) ,
N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )
where N <- length(vec) as above.
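As a concrete illustration of the mirroring identity, the sketch below compares the default left-closed intervals with left.open = TRUE on values that sit exactly on the breakpoints. The data and the names closed_left, open_left and mirrored are assumptions made for this example.

```r
v <- c(5, 10, 15)
x <- c(5, 7, 10, 15)        # three of these lie exactly on a breakpoint
N <- length(v)

closed_left <- findInterval(x, v)                    # intervals [v[i], v[i+1])
open_left   <- findInterval(x, v, left.open = TRUE)  # intervals (v[i], v[i+1]]
mirrored    <- N - findInterval(-x, -v[N:1])         # same result via mirrored data

print(cbind(x, closed_left, open_left, mirrored))
stopifnot(identical(open_left, mirrored))
```

Only the values lying on a breakpoint change their index between the two conventions; interior values such as 7 are unaffected.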
Value
vector of length length(x) with values in 0:N (and NA) where N <- length(vec), or values coerced to 1:(N-1) if and only if all.inside = TRUE (equivalently coercing all x values inside the intervals). Note that NAs are propagated from x, and Inf values are allowed in both x and vec.
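A brief sketch of how NA and infinite values behave, using made-up inputs; the name res is chosen only for this example.

```r
vec <- c(-Inf, 0, 10, Inf)            # infinite breakpoints are allowed
x   <- c(NA, -Inf, -3, 0, 25, Inf)    # NA and infinite values in x

res <- findInterval(x, vec)
print(cbind(x, res))

# The NA in x is propagated; every other value falls into an interval of vec
stopifnot(is.na(res[1]), !anyNA(res[-1]))
```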
Author(s)
Martin Maechler
See Also
approx(*, method = "constant"), which is a generalization of findInterval(); ecdf for computing the empirical distribution function, which is (up to a factor of n) also basically the same as findInterval(.).
Examples
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))

N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)

## 'left.open = TRUE' means "mirroring" :
N <- length(v)
stopifnot(identical(
  findInterval( x, v, left.open=TRUE) ,
  N - findInterval(-x, -v[N:1])))