Finding overlapping ranges
Various methods for finding/counting interval overlaps between two "range-based" objects: a query and a subject.
NOTE: This man page describes the methods that operate on IntegerRanges and IntegerRangesList derivatives. See ?`findOverlaps,GenomicRanges,GenomicRanges-method` in the GenomicRanges package for methods that operate on GenomicRanges or GRangesList objects.
findOverlaps(query, subject, maxgap=-1L, minoverlap=0L, type=c("any", "start", "end", "within", "equal"), select=c("all", "first", "last", "arbitrary"), ...)
countOverlaps(query, subject, maxgap=-1L, minoverlap=0L, type=c("any", "start", "end", "within", "equal"), ...)
overlapsAny(query, subject, maxgap=-1L, minoverlap=0L, type=c("any", "start", "end", "within", "equal"), ...)
query %over% subject
query %within% subject
query %outside% subject
subsetByOverlaps(x, ranges, maxgap=-1L, minoverlap=0L, type=c("any", "start", "end", "within", "equal"), invert=FALSE, ...)
overlapsRanges(query, subject, hits=NULL, ...)
poverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), ...)
mergeByOverlaps(query, subject, ...)
findOverlapPairs(query, subject, ...)
query, subject, x, ranges
Each of them can be an IntegerRanges (e.g. IRanges, Views) or IntegerRangesList (e.g. IRangesList, ViewsList) derivative. In addition, if subject or ranges is an IntegerRanges object, query or x can be an integer vector to be converted to length-one ranges.
If query (or x) is an IntegerRangesList object, then subject (or ranges) must also be an IntegerRangesList object.
If both arguments are list-like objects with names, each list element from the 2nd argument is paired with the list element from the 1st argument with the matching name, if any. Otherwise, list elements are paired by position. The overlap is then computed between the pairs as described below.
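The name-based pairing can be seen with two small named lists whose elements appear in different orders. This is a sketch assuming the Bioconductor IRanges package is installed; the list names `a` and `b` and the coordinates are made up for illustration. If the lists were paired by position, neither pair would overlap; pairing by name produces exactly one hit, in element `b`.

```r
library(IRanges)

# Two named IntegerRangesList objects, with elements listed in different orders
qlist <- IRangesList(a = IRanges(1, 5), b = IRanges(10, 12))
slist <- IRangesList(b = IRanges(11, 11), a = IRanges(20, 25))

# Elements are matched by name (a with a, b with b):
# [1,5] vs [20,25] do not overlap, [10,12] vs [11,11] do
hits <- findOverlaps(qlist, slist)

# Number of hits found in each space of qlist
elementNROWS(hits)
```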
If subject is omitted, query is queried against itself. In this case, and only this case, the drop.self and drop.redundant arguments are allowed. By default, the result will contain hits for each range against itself, and if there is a hit from A to B, there is also a hit for B to A. If drop.self is TRUE, all self matches are dropped. If drop.redundant is TRUE, only one of A->B and B->A is returned.
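A short sketch of self-overlap, assuming IRanges is installed and using three made-up ranges of which only the first two overlap. With the defaults every range also hits itself and the 1-2 overlap is reported in both directions; dropping self and redundant hits leaves a single hit.

```r
library(IRanges)

# [1,5] and [4,7] overlap; [9,10] overlaps neither of them
x <- IRanges(start = c(1, 4, 9), end = c(5, 7, 10))

# Default: self hits (1,1), (2,2), (3,3) plus both (1,2) and (2,1)
all_hits <- findOverlaps(x)

# Drop self matches and keep only one direction of each symmetric pair
unique_hits <- findOverlaps(x, drop.self = TRUE, drop.redundant = TRUE)

length(all_hits)
length(unique_hits)
```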
maxgap
A single integer >= -1.
If type is set to "any", maxgap is interpreted as the maximum gap that is allowed between 2 ranges for the ranges to be considered as overlapping. The gap between 2 ranges is the number of positions that separate them. The gap between 2 adjacent ranges is 0. By convention, when one range has its start or end strictly inside the other (i.e. non-disjoint ranges), the gap is considered to be -1.
If type is set to anything else, maxgap has a special meaning that depends on the particular type. See type below for more information.
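The gap convention for type="any" can be written out as plain arithmetic. The sketch below uses base R only; `range_gap` and `is_any_hit` are hypothetical helpers written for illustration (they are not IRanges functions), and they assume closed integer ranges of width at least 1 with minoverlap left at its default. The last two calls show why maxgap=0L admits adjacent ranges while the default maxgap=-1L does not.

```r
# Hypothetical helper, not part of IRanges: gap between two closed integer
# ranges [s1, e1] and [s2, e2], both assumed to have width >= 1
range_gap <- function(s1, e1, s2, e2) {
  # Positions strictly between the ranges: 0 when adjacent,
  # negative when they share at least one position
  gap <- max(s2 - e1 - 1L, s1 - e2 - 1L)
  # Non-disjoint ranges are assigned a gap of -1 by convention
  if (gap < 0L) -1L else gap
}

# Hypothetical helper: under type = "any" (minoverlap at its default),
# two ranges count as overlapping when their gap does not exceed maxgap
is_any_hit <- function(s1, e1, s2, e2, maxgap = -1L) {
  range_gap(s1, e1, s2, e2) <= maxgap
}

range_gap(1L, 5L, 6L, 8L)                # adjacent ranges: 0
range_gap(1L, 5L, 8L, 10L)               # positions 6 and 7 separate them: 2
range_gap(1L, 5L, 3L, 8L)                # overlapping ranges: -1
is_any_hit(1L, 5L, 6L, 8L)               # FALSE with the default maxgap = -1
is_any_hit(1L, 5L, 6L, 8L, maxgap = 0L)  # TRUE: adjacency is now tolerated
```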
minoverlap
A single non-negative integer.
Only ranges with a minimum of minoverlap overlapping positions are considered to be overlapping.
When type is "any", at least one of maxgap and minoverlap must be set to its default value.
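To see minoverlap at work, the sketch below (assuming IRanges is installed, with made-up coordinates) uses two query ranges that both overlap the subject, one by 2 positions and one by 4. Requiring at least 3 overlapping positions keeps only the second.

```r
library(IRanges)

# [1,5] shares positions 4-5 (2 positions) with [4,11];
# [8,12] shares positions 8-11 (4 positions)
query   <- IRanges(start = c(1, 8), end = c(5, 12))
subject <- IRanges(start = 4, end = 11)

default_counts <- countOverlaps(query, subject)                  # both overlap
strict_counts  <- countOverlaps(query, subject, minoverlap = 3L) # only [8,12]
default_counts
strict_counts
```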
type
By default, any overlap is accepted. By specifying the type parameter, one can select for specific types of overlap. The types correspond to operations in Allen's Interval Algebra (see references). If type is "start" or "end", the intervals are required to have matching starts or ends, respectively. Specifying "equal" as the type returns the intersection of the "start" and "end" matches. If type is "within", the query interval must be wholly contained within the subject interval. Note that all matches must additionally satisfy the minoverlap constraint described above.
The maxgap parameter has special meaning with the special overlap types. For "start", "end", and "equal", it specifies the maximum difference in the starts, ends, or both, respectively. For "within", it is the maximum amount by which the subject may be wider than the query. If maxgap is set to -1 (the default), it is replaced internally by 0.
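A sketch of how maxgap relaxes the "start" and "end" types, assuming IRanges is installed. The three query ranges (made up for illustration) all lie inside the single subject range [2,10], so only the start/end tolerance decides each hit.

```r
library(IRanges)

# Starts 2, 3, 6 and ends 5, 8, 10, all contained in [2,10]
query   <- IRanges(start = c(2, 3, 6), end = c(5, 8, 10))
subject <- IRanges(start = 2, end = 10)

# Default maxgap (-1, treated as 0): starts must match exactly
start_exact <- countOverlaps(query, subject, type = "start")
# Starts may differ by at most 1 position
start_loose <- countOverlaps(query, subject, type = "start", maxgap = 1L)
# Ends must match exactly
end_exact <- countOverlaps(query, subject, type = "end")

start_exact
start_loose
end_exact
```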
select
If query is an IntegerRanges derivative: when select is "all" (the default), the results are returned as a Hits object. Otherwise the returned value is an integer vector parallel to query (i.e. same length) containing the first, last, or arbitrary overlapping interval in subject, with NA indicating intervals that did not overlap any intervals in subject.
If query is an IntegerRangesList derivative: when select is "all" (the default), the results are returned as a HitsList object. Otherwise the returned value depends on the drop argument. When select != "all" && !drop, an IntegerList is returned, where each element of the result corresponds to a space in query. When select != "all" && drop, an integer vector is returned containing indices that are offset to align with the unlisted query.
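The integer-vector form of select can be illustrated with the same ranges used in the findOverlaps() examples, assuming IRanges is installed. The first query range overlaps subject ranges 1 and 2, the second overlaps nothing (hence NA), and the third overlaps subject range 3.

```r
library(IRanges)

query   <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

first_hit <- findOverlaps(query, subject, select = "first")  # lowest subject index
last_hit  <- findOverlaps(query, subject, select = "last")   # highest subject index
any_hit   <- findOverlaps(query, subject, select = "arbitrary")
first_hit
last_hit
any_hit
```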
invert
If TRUE, keep only the ranges in x that do not overlap ranges.
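A minimal sketch of invert with subsetByOverlaps, assuming IRanges is installed and reusing the ranges from the findOverlaps() examples: the default keeps the query ranges that overlap something, and invert=TRUE keeps the complement.

```r
library(IRanges)

query   <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

kept    <- subsetByOverlaps(query, subject)                # [1,5] and [9,10]
dropped <- subsetByOverlaps(query, subject, invert = TRUE) # only [4,7]
kept
dropped
```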
hits
The Hits or HitsList object returned by findOverlaps, or NULL. If NULL, then hits is computed by calling findOverlaps(query, subject, ...) internally (the extra arguments passed to overlapsRanges are passed to findOverlaps).
...
Further arguments to be passed to or from other methods:
drop: Supported only when query is an IntegerRangesList derivative. FALSE by default. See the select argument above for the details.
drop.self, drop.redundant: When subject is omitted, the drop.self and drop.redundant arguments (both FALSE by default) are allowed. See the query and subject arguments above for the details.
A common type of query that arises when working with intervals is finding which intervals in one set overlap those in another.
The simplest approach is to call the findOverlaps function on an IntegerRanges or other object with range information (aka "range-based object").
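As a sketch of that basic call, assuming IRanges is installed and reusing the ranges from the findOverlaps() examples, the Hits object can be unpacked into index pairs with queryHits and subjectHits (accessors from the S4Vectors package, which IRanges attaches).

```r
library(IRanges)

query   <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

hits <- findOverlaps(query, subject)

# One row per overlapping (query, subject) pair
pairs <- data.frame(query = queryHits(hits), subject = subjectHits(hits))
pairs
```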
For findOverlaps: see the select argument above.
For countOverlaps: an integer vector containing the overlap hit count for each range in query, using the specified findOverlaps parameters. For IntegerRangesList objects, it returns an IntegerList object.
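Because countOverlaps counts hits per query range, it should agree with tabulating the query indices of the corresponding Hits object. The sketch below checks this on the ranges from the findOverlaps() examples (assuming IRanges is installed) and also shows how maxgap=0L adds the adjacent pair [4,7] / [2,3].

```r
library(IRanges)

query   <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

counts <- countOverlaps(query, subject)
hits   <- findOverlaps(query, subject)

# The same counts derived from the Hits object
from_hits <- tabulate(queryHits(hits), nbins = length(query))

# Tolerating adjacency also counts [4,7] against [2,3]
counts_adjacent <- countOverlaps(query, subject, maxgap = 0L)

counts
from_hits
counts_adjacent
```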
overlapsAny finds the ranges in query that overlap any of the ranges in subject. For IntegerRanges derivatives, it returns a logical vector of length equal to the number of ranges in query. For IntegerRangesList derivatives, it returns a LogicalList object where each element of the result corresponds to a space in query.
%over% and %within% are convenience wrappers for the 2 most common use cases. Currently defined as `%over%` <- function(query, subject) overlapsAny(query, subject) and `%within%` <- function(query, subject) overlapsAny(query, subject, type="within"). %outside% is simply the inverse of %over%.
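The operator relationships described above can be checked directly. This sketch assumes IRanges is installed and reuses the ranges from the findOverlaps() examples; it only asserts what the definitions guarantee (%outside% is the negation of %over%, and %within% matches overlapsAny with type="within").

```r
library(IRanges)

query   <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

over    <- query %over% subject     # TRUE FALSE TRUE
outside <- query %outside% subject  # FALSE TRUE FALSE

# %within% is a shortcut for overlapsAny(..., type = "within")
same_within <- identical(query %within% subject,
                         overlapsAny(query, subject, type = "within"))
over
outside
same_within
```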
subsetByOverlaps returns the subset of x that has an overlap hit with a range in ranges using the specified findOverlaps parameters.
When hits is a Hits (or HitsList) object, overlapsRanges(query, subject, hits) returns an IntegerRanges (or IntegerRangesList) object of the same shape as hits holding the regions of intersection between the overlapping ranges in objects query and subject, which should be the same query and subject used in the call to findOverlaps that generated hits. Same shape means same length when hits is a Hits object, and same length and same elementNROWS when hits is a HitsList object.
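Since the result of overlapsRanges is parallel to hits, each element should equal the parallel intersection of the paired ranges. The sketch below (assuming IRanges is installed, with the ranges from the findOverlaps() examples) compares it against pintersect applied to the query and subject ranges selected by the hit indices.

```r
library(IRanges)

query   <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
hits    <- findOverlaps(query, subject)

# One intersection range per hit: [2,2], [2,3], [10,10]
r <- overlapsRanges(query, subject, hits)

# The same regions computed pairwise from the hit indices
p <- pintersect(query[queryHits(hits)], subject[subjectHits(hits)])
r
p
```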
poverlaps compares query and subject in parallel (like pmin) and returns a logical vector indicating whether each pair of ranges overlaps. Integer vectors are treated as width-one ranges.
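Unlike findOverlaps, poverlaps only compares the i-th query range with the i-th subject range. A sketch assuming IRanges is installed, with made-up ranges: the first pair shares position 3, the second pair is separated by two positions, and the third query range lies inside its subject range.

```r
library(IRanges)

query   <- IRanges(start = c(1, 5, 9),  end = c(3, 7, 12))
subject <- IRanges(start = c(3, 10, 1), end = c(4, 12, 20))

# Element-wise: query[i] is compared only with subject[i]
pov <- poverlaps(query, subject)
pov
```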
mergeByOverlaps computes the overlap between query and subject according to the arguments in ..., then extracts the corresponding hits from each object and returns a DataFrame containing one column for the query and one for the subject, as well as any mcols that were present on either object. The query and subject columns are named by quoting and deparsing the corresponding argument.
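A sketch of mergeByOverlaps carrying metadata columns through, assuming IRanges is installed. The `qname` and `sname` columns are hypothetical labels added here for illustration; the variables are named query and subject so the deparsed column names are predictable. Each row of the result should pair a query label with the label of a subject it overlaps.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
mcols(query) <- DataFrame(qname = c("q1", "q2", "q3"))   # hypothetical labels

subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
mcols(subject) <- DataFrame(sname = c("s1", "s2", "s3")) # hypothetical labels

# One row per hit, with the query and subject ranges and their mcols
m <- mergeByOverlaps(query, subject)
m
```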
findOverlapPairs is like mergeByOverlaps, except it returns a formal Pairs object that provides useful downstream conveniences, such as finding the intersection of the overlapping ranges with pintersect.
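A sketch of working with the Pairs result, assuming IRanges is installed and reusing the ranges from the findOverlaps() examples. The first and second accessors (from S4Vectors) return the query side and subject side of each overlapping pair, and pintersect gives their intersections.

```r
library(IRanges)

query   <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

p <- findOverlapPairs(query, subject)

first(p)       # query range of each overlapping pair
second(p)      # subject range of each overlapping pair
pintersect(p)  # region shared by each pair
```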
Allen's Interval Algebra: James F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843, November 1983. ACM Press. ISSN 0001-0782.
## ---------------------------------------------------------------------
## findOverlaps()
## ---------------------------------------------------------------------
library(IRanges)
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
findOverlaps(query, subject)
## at most one hit per query
findOverlaps(query, subject, select="first")
findOverlaps(query, subject, select="last")
findOverlaps(query, subject, select="arbitrary")
## including adjacent ranges in the result
findOverlaps(query, subject, maxgap=0L)
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2), c(5, 4))
## one IRanges object with itself
findOverlaps(query)
## single points as query
subject <- IRanges(c(1, 6, 13), c(4, 9, 14))
findOverlaps(c(3L, 7L, 10L), subject, select="first")
## special overlap types
query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
findOverlaps(query, subject, type="start")
findOverlaps(query, subject, type="start", maxgap=1L)
findOverlaps(query, subject, type="end", select="first")
ov <- findOverlaps(query, subject, type="within", maxgap=1L)
ov
## using pairs to find the intersection of overlapping ranges
hits <- findOverlaps(query, subject)
p <- Pairs(query, subject, hits=hits)
pintersect(p)
## shortcut
p <- findOverlapPairs(query, subject)
pintersect(p)
## ---------------------------------------------------------------------
## overlapsAny()
## ---------------------------------------------------------------------
overlapsAny(query, subject, type="start")
overlapsAny(query, subject, type="end")
query %over% subject    # same as overlapsAny(query, subject)
query %within% subject  # same as overlapsAny(query, subject, type="within")
## ---------------------------------------------------------------------
## overlapsRanges()
## ---------------------------------------------------------------------
## Extract the regions of intersection between the overlapping ranges:
overlapsRanges(query, subject, ov)
## ---------------------------------------------------------------------
## Using IntegerRangesList objects
## ---------------------------------------------------------------------
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
qpartition <- factor(c("a","a","b"))
qlist <- split(query, qpartition)
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","a","b"))
slist <- split(subject, spartition)
## at most one hit per query
findOverlaps(qlist, slist, select="first")
findOverlaps(qlist, slist, select="last")
findOverlaps(qlist, slist, select="arbitrary")
query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
qpartition <- factor(c("a","a","b","b"))
qlist <- split(query, qpartition)
subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
spartition <- factor(c("a","a","b","b"))
slist <- split(subject, spartition)
overlapsAny(qlist, slist, type="start")
overlapsAny(qlist, slist, type="end")
qlist
subsetByOverlaps(qlist, slist)
countOverlaps(qlist, slist)