Combining two GRangesLists

@llevar-11831

Last seen 8.6 years ago

Hello,

I'm trying to combine two GRangesList objects element-wise, to produce a single GRangesList.

gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(3, 6))
gr2 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(7, 13), width = 3))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2)

gr3 <- GRanges(seqnames = "chr2", ranges = IRanges(9, 12))
gr4 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(25, 38), width = 3))
grl2 <- GRangesList("gr1" = gr3, "gr2" = gr4)

grl3 <- GRangesList(c(gr1, gr3), c(gr2, gr4))

Based on the example above, I want to combine grl and grl2 to produce grl3. My real use case has lists of 20k elements each, and I would like to do this as efficiently as possible. I wrote a for loop that calls c() on each pair of GRanges elements and stores the results in a new list, but it takes a long time to run on real data, and it seems like there should be a better way.

Thanks in advance.

Sergei.

GRanges grangeslist • 9.4k views

@michael-lawrence-3846

Last seen 3.5 years ago

United States

pc(grl, grl2)
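A minimal sketch of how this one-liner applies to the question's data, assuming the GenomicRanges package is installed and that the second list was meant to hold gr3 and gr4 (as the desired grl3 implies). pc() combines the two lists in parallel, element by element, by position:

```r
library(GenomicRanges)

# Rebuild the question's inputs; grl2 is ASSUMED to hold gr3 and gr4
gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(3, 6))
gr2 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(7, 13), width = 3))
gr3 <- GRanges(seqnames = "chr2", ranges = IRanges(9, 12))
gr4 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(25, 38), width = 3))
grl  <- GRangesList(gr1 = gr1, gr2 = gr2)
grl2 <- GRangesList(gr1 = gr3, gr2 = gr4)

# Parallel combine: element i of the result is c(grl[[i]], grl2[[i]])
combined <- pc(grl, grl2)

lengths(combined)         # number of ranges in each list element
start(combined[["gr1"]])  # start positions of the first element
```

Because the pairing is positional, both lists need to be in the same order for the result to make sense.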

@mike-smith

Last seen 5 weeks ago

EMBL Heidelberg

Do the GRangesLists you're trying to combine always have the same number of elements? If so, you can probably use mapply(), e.g.:

mapply(c, grl, grl2)

$gr1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
  [1]     chr2  [3,  6]      *
  [2]     chr2  [9, 12]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

$gr2
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr1  [25, 27]      *
  [4]     chr1  [38, 40]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

No idea how well it'll scale to 20,000 entries.
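Judging by the printed output, mapply() hands back an ordinary list of GRanges rather than a GRangesList. A sketch of converting it back, assuming the GRangesList() constructor accepts a plain list of GRanges (recent GenomicRanges releases appear to) and that grl2 holds gr3 and gr4:

```r
library(GenomicRanges)

# Rebuild the question's inputs; grl2 is ASSUMED to hold gr3 and gr4
gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(3, 6))
gr2 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(7, 13), width = 3))
gr3 <- GRanges(seqnames = "chr2", ranges = IRanges(9, 12))
gr4 <- GRanges(c("chr1", "chr1"), ranges = IRanges(c(25, 38), width = 3))
grl  <- GRangesList(gr1 = gr1, gr2 = gr2)
grl2 <- GRangesList(gr1 = gr3, gr2 = gr4)

# Combine pairwise; SIMPLIFY = FALSE keeps the result as a list
res <- mapply(c, grl, grl2, SIMPLIFY = FALSE)

# Wrap the list of GRanges back into a GRangesList
res_grl <- GRangesList(res)
lengths(res_grl)
```

This calls c() once per element pair at the R level, so it is a convenience for small inputs and not necessarily a fast path.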

@llevar-11831

Last seen 8.6 years ago

Thank you Mike and Michael for your responses. Michael's solution works in sub-second time. mapply didn't return after a few minutes on real input so I stopped it.

@martin-morgan-1513

Last seen 4 months ago

United States

unlist() and split() are relatively fast on GRangesList / GRanges(), so

mergelists = function(x, y) {
    ux = c(unlist(x, use.names=FALSE), unlist(y, use.names=FALSE))
    f = rep(c(names(x), names(y)), c(lengths(x), lengths(y)))
    split(ux, f)
}

This differs from the pc() solution in that it respects the names of the list, matching elements by name rather than by position:

> mergelists(grl[2:1], grl2)
GRangesList object of length 2:
$gr1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
  [1]     chr2    [3, 6]      *
  [2]     chr2    [3, 6]      *

$gr2
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr1  [ 7,  9]      *
  [4]     chr1  [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths

> pc(grl[2:1], grl2)
GRangesList object of length 2:
$gr2
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr2  [ 3,  6]      *

$gr1
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
  [1]     chr2  [ 3,  6]      *
  [2]     chr1  [ 7,  9]      *
  [3]     chr1  [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
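The grouping trick inside mergelists() is not specific to genomic ranges. As a small base-R sketch on ordinary integer lists (x and y below are made up for illustration), flattening both inputs and splitting by the repeated names merges elements by name even when the two lists are ordered differently:

```r
# Two named lists whose elements are in different orders (illustrative data)
x <- list(a = 1:2, b = 3L)
y <- list(b = 4:5, a = 6L)

# 1. Flatten both lists into one vector, dropping names
ux <- c(unlist(x, use.names = FALSE), unlist(y, use.names = FALSE))

# 2. Repeat each element's name once per value it contributed
f <- rep(c(names(x), names(y)), c(lengths(x), lengths(y)))

# 3. Regroup the flattened values by name
merged <- split(ux, f)
merged
# $a
# [1] 1 2 6
#
# $b
# [1] 3 4 5
```

split() orders the groups by sorted name, not by input order, which is why the mergelists() output above lists $gr1 before $gr2 even though the first argument was grl[2:1].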
