Global and unbiased detection of splice junctions from RNA-seq data

Adam Ameur et al. Genome Biol. 2010.

Abstract

We have developed a new strategy for de novo prediction of splice junctions in short-read RNA-seq data, suitable for detection of novel splicing events and chimeric transcripts. When tested on mouse RNA-seq data, >31,000 splice events were predicted, of which 88% bridged between two regions separated by ≤100 kb, and 74% connected two exons of the same RefSeq gene. Our method also reports genomic rearrangements such as insertions and deletions.

Figures

Figure 1

Overview of the split-read strategy. Each read is split into two pieces, or "anchors," of equal length (red and blue), with a gap between them. The anchors are aligned independently, and only the instances in which both align uniquely to the reference sequence are considered. Then, the alignments are extended as long as they still match the reference sequence. The SplitSeek program identifies all candidate junction reads from the split-read alignments where the boundary is located in the gap between the anchors. Then additional junction reads are detected from the set of reads that partly align to a previously detected candidate junction, and where the remaining, nonaligned, part of the read (grey lines) has a 5-bp identical sequence compared with the corresponding part of the same candidate read. SplitSeek then groups all potential junction reads, applies cut-offs, and reports the results.
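The anchor-and-extend step described in this caption can be illustrated with a minimal Python sketch. It is not the SplitSeek implementation: it assumes exact string matching against a small in-memory reference instead of a real short-read aligner, and the function names (`split_into_anchors`, `unique_hit`, `candidate_junction`) and the `anchor_len` parameter are hypothetical. When the two extensions overlap, the sketch places the boundary at the end of the left extension, which is an arbitrary convention chosen here.

```python
from typing import Optional, Tuple


def split_into_anchors(read: str, anchor_len: int) -> Tuple[str, str]:
    """Return the left and right anchors; the bases between them form the gap."""
    if 2 * anchor_len > len(read):
        raise ValueError("anchors would overlap")
    return read[:anchor_len], read[-anchor_len:]


def unique_hit(reference: str, anchor: str) -> Optional[int]:
    """Return the anchor's position if it occurs exactly once, else None."""
    first = reference.find(anchor)
    if first == -1 or reference.find(anchor, first + 1) != -1:
        return None
    return first


def candidate_junction(reference: str, read: str,
                       anchor_len: int) -> Optional[Tuple[int, int]]:
    """Return (end of left segment, start of right segment) in reference
    coordinates (0-based, end exclusive), or None if no junction is found."""
    left, right = split_into_anchors(read, anchor_len)
    lpos = unique_hit(reference, left)
    rpos = unique_hit(reference, right)
    if lpos is None or rpos is None:
        return None  # both anchors must align uniquely

    n = len(read)
    # Reference coordinate that read index 0 would have in the right alignment.
    offset = rpos - (n - anchor_len)
    if offset == lpos:
        return None  # read is contiguous in the reference: no junction

    # Extend the left alignment to the right while it matches the reference.
    left_end = anchor_len
    while (left_end < n and lpos + left_end < len(reference)
           and reference[lpos + left_end] == read[left_end]):
        left_end += 1

    # Extend the right alignment to the left while it matches the reference.
    right_start = n - anchor_len
    while (right_start > 0 and offset + right_start - 1 >= 0
           and reference[offset + right_start - 1] == read[right_start - 1]):
        right_start -= 1

    if left_end < right_start:
        return None  # extensions do not meet; some read bases are unexplained
    return lpos + left_end, offset + left_end


# Toy reference (made-up sequence): exon, intron, exon.
exon1 = "ACGTTGCAAGGCTA"
intron = "GTAAGTCCCCCCCCTTTTTTAG"
exon2 = "CATGGATCCGTTAC"
reference = exon1 + intron + exon2
spliced_read = exon1[-10:] + exon2[:10]
junction = candidate_junction(reference, spliced_read, anchor_len=6)
```

In this toy case the reported pair brackets the intron, so `reference[junction[0]:junction[1]]` recovers the skipped sequence. The second pass described in the caption (rescuing reads that only partly align to a known candidate) and the grouping and cut-off steps are not covered by this sketch.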

Figure 2

Comparison of predictions from RNA-MATE and SplitSeek. (a) Venn diagram showing the number of junctions predicted by the two methods. (b) Predicted number of junction reads for all 11,395 exon boundaries reported by both RNA-MATE (x-axis) and SplitSeek (y-axis).

Figure 3

Number of predicted splice junctions (y-axis) as a function of the total number of processed reads (x-axis). The number of predicted junctions (black line) increases almost linearly with the number of reads. The green and orange lines represent two subgroups of predicted junctions: those where the two boundaries are separated by ≤100 kb, and those connecting two exon boundaries of a RefSeq gene. Predicted insertions and deletions are combined and represented by the red line.
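The ≤100 kb subgroup used in this figure can be computed with a short Python sketch. The `Junction` record, the function names, and the example coordinates are hypothetical and made up for illustration; the sketch also assumes that a junction only counts as short-range when both boundaries lie on the same chromosome, which the caption does not state explicitly.

```python
from typing import List, NamedTuple


class Junction(NamedTuple):
    """A predicted junction joining two genomic boundaries (hypothetical record)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


MAX_SPAN = 100_000  # 100 kb threshold taken from the caption


def is_short_range(j: Junction, max_span: int = MAX_SPAN) -> bool:
    """True if both boundaries are on one chromosome and at most max_span apart."""
    return j.chrom_a == j.chrom_b and abs(j.pos_b - j.pos_a) <= max_span


def short_range_fraction(junctions: List[Junction]) -> float:
    """Fraction of junctions whose boundaries are separated by <= MAX_SPAN."""
    if not junctions:
        return 0.0
    return sum(is_short_range(j) for j in junctions) / len(junctions)


# Made-up coordinates, for illustration only.
toy_junctions = [
    Junction("chr2", 1_000, "chr2", 5_000),      # 4 kb apart
    Junction("chr2", 10_000, "chr2", 110_000),   # exactly 100 kb apart
    Junction("chr5", 20_000, "chr5", 400_000),   # long-range
    Junction("chr2", 1_000, "chr12", 2_000),     # different chromosomes
]
fraction = short_range_fraction(toy_junctions)
```

Applying the same function to growing subsets of reads would give the points of a curve like the green line in the figure.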

Figure 4

SplitSeek results viewed in the UCSC genome browser. (a) Predicted splice junctions in the gene Fpgs. (b) The two grey boxes give a schematic view of how deletions and insertions are detected. The genome browser image below shows the SplitSeek results in the last exon and 3' UTR of the Nol10 gene on chromosome 12. Three events are predicted: a splice junction (to the left), a deletion (in the middle), and an insertion (to the right). The predicted insertion and deletion are both supported by the mRNA AK148210, as indicated by the orange arrows at the bottom.

Figure 5

Two long-range SplitSeek predictions (>100 kb) that extend known gene models. (a) A predicted junction that connects an exon in the Ensembl Gene Prediction database with the second exon of the Phactr3 gene, suggesting the presence of an alternative transcription start site. (b) A putative novel exon in the Sorcs2 gene that is currently only supported by EST data.
