Combining two GRangesLists (original) (raw)
Combining two GRangesLists
4
@llevar-11831
Last seen 8.6 years ago
Hello,
I'm trying to combine two GRangesList objects element-wise, to produce a single GRangesList.
gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(3, 6)) gr2 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(7,13), width = 3)) grl <- GRangesList("gr1" = gr1, "gr2" = gr2)
gr3 <- GRanges(seqnames = "chr2", ranges = IRanges(9, 12)) gr4 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(25,38), width = 3)) grl2 <- GRangesList("gr1" = gr1, "gr2" = gr2)
grl3 <- GRangesList(c(gr1, gr3), c(gr2, gr4))
Based on the example above I want to combine grl and grl2 to produce grl3. My real use-case has lists of 20k elements each and I would like to be able to do this as efficiently as possible. I wrote a for loop to do this via repeated calls to c() on each GRanges element and dumping them into a new list, but this takes a long time to run on real data, and seems like there should be a better way to do this.
Thanks in advance.
Sergei.
GRanges grangeslist • 9.4k views
@michael-lawrence-3846
Last seen 3.5 years ago
United States
pc(grl, grl2)
@mike-smith
Last seen 5 weeks ago
EMBL Heidelberg
Do the GRangesLists
you're trying to combine always have the same number of elements? If so then you can probably use mapply()
e.g.
mapply(c, grl, grl2)
$gr1 GRanges object with 2 ranges and 0 metadata columns: seqnames ranges strand [1] chr2 [3, 6] * [2] chr2 [9, 12] *
seqinfo: 2 sequences from an unspecified genome; no seqlengths
$gr2 GRanges object with 4 ranges and 0 metadata columns: seqnames ranges strand [1] chr1 [ 7, 9] * [2] chr1 [13, 15] * [3] chr1 [25, 27] * [4] chr1 [38, 40] *
seqinfo: 2 sequences from an unspecified genome; no seqlengths
No idea how well it'll scale to 20,000 entries.
@llevar-11831
Last seen 8.6 years ago
Thank you Mike and Michael for your responses. Michael's solution works in sub-second time. mapply didn't return after a few minutes on real input so I stopped it.
@martin-morgan-1513
Last seen 4 months ago
United States
unlist() and split() are relatively fast on GRangesList / GRanges(), so
mergelists = function(x, y) { ux = c(unlist(x, use.names=FALSE), unlist(y, use.names=FALSE)) f = rep(c(names(x), names(y)), c(lengths(x), lengths(y))) split(ux, f) }
This differs from the pc() solution in that it respects the names of the list
mergelists(grl[2:1], grl2) GRangesList object of length 2: $gr1 GRanges object with 2 ranges and 0 metadata columns: seqnames ranges strand [1] chr2 [3, 6] * [2] chr2 [3, 6] *
$gr2 GRanges object with 4 ranges and 0 metadata columns: seqnames ranges strand [1] chr1 [ 7, 9] * [2] chr1 [13, 15] * [3] chr1 [ 7, 9] * [4] chr1 [13, 15] *
seqinfo: 2 sequences from an unspecified genome; no seqlengths
pc(grl[2:1], grl2) GRangesList object of length 2: $gr2 GRanges object with 3 ranges and 0 metadata columns: seqnames ranges strand [1] chr1 [ 7, 9] * [2] chr1 [13, 15] * [3] chr2 [ 3, 6] *
$gr1 GRanges object with 3 ranges and 0 metadata columns: seqnames ranges strand [1] chr2 [ 3, 6] * [2] chr1 [ 7, 9] * [3] chr1 [13, 15] *
seqinfo: 2 sequences from an unspecified genome; no seqlengths
Login before adding your answer.