Global and unbiased detection of splice junctions from RNA-seq data
Adam Ameur et al. Genome Biol. 2010.
Abstract
We have developed a new strategy for de novo prediction of splice junctions in short-read RNA-seq data, suitable for detection of novel splicing events and chimeric transcripts. When tested on mouse RNA-seq data, >31,000 splice events were predicted, of which 88% bridged between two regions separated by ≤100 kb, and 74% connected two exons of the same RefSeq gene. Our method also reports genomic rearrangements such as insertions and deletions.
Figures
Figure 1
Overview of the split-read strategy. Each read is split into two pieces, or "anchors," of equal length (red and blue), with a gap between them. The anchors are aligned independently, and only the instances in which both align uniquely to the reference sequence are considered. Then, the alignments are extended as long as they still match the reference sequence. The SplitSeek program identifies all candidate junction reads from the split-read alignments where the boundary is located in the gap between the anchors. Then additional junction reads are detected from the set of reads that partly align to a previously detected candidate junction, and where the remaining, nonaligned, part of the read (grey lines) has a 5-bp identical sequence compared with the corresponding part of the same candidate read. SplitSeek then groups all potential junction reads, applies cut-offs, and reports the results.
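The split-read idea in this caption can be illustrated with a minimal Python sketch. It is not the SplitSeek implementation: the function names (`split_into_anchors`, `unique_hit`, `candidate_junction`), the exact-match search with `str.find`, and the choice to stop each extension at the edge of the opposite anchor are all assumptions made for illustration. A real pipeline would use a short-read aligner, tolerate mismatches, and handle both strands.

```python
from typing import Optional, Tuple


def split_into_anchors(read: str, anchor_len: int) -> Tuple[str, str]:
    """Return the left and right anchors; the bases between them form the gap."""
    if 2 * anchor_len > len(read):
        raise ValueError("anchors would overlap")
    return read[:anchor_len], read[-anchor_len:]


def unique_hit(reference: str, anchor: str) -> Optional[int]:
    """Return the anchor's start position if it occurs exactly once, else None."""
    first = reference.find(anchor)
    if first == -1 or reference.find(anchor, first + 1) != -1:
        return None
    return first


def candidate_junction(reference: str, read: str,
                       anchor_len: int) -> Optional[Tuple[int, int]]:
    """Return (donor_end, acceptor_start) in reference coordinates, or None.

    donor_end is the exclusive end of the left segment and acceptor_start is
    the start of the right segment. Exact matching only (illustrative).
    """
    left, right = split_into_anchors(read, anchor_len)
    left_start = unique_hit(reference, left)
    right_start = unique_hit(reference, right)
    if left_start is None or right_start is None:
        return None  # keep only reads where both anchors align uniquely

    gap_limit = len(read) - anchor_len  # do not extend into the other anchor

    # Extend the left anchor rightwards while the read matches the reference.
    n = anchor_len
    while (n < gap_limit and left_start + n < len(reference)
           and read[n] == reference[left_start + n]):
        n += 1

    # Extend the right anchor leftwards while the read matches the reference.
    right_end = right_start + anchor_len
    m = anchor_len
    while (m < gap_limit and right_end - m - 1 >= 0
           and read[-m - 1] == reference[right_end - m - 1]):
        m += 1

    if n + m < len(read):
        return None  # the two extensions do not cover the whole read

    donor_end = left_start + n
    acceptor_start = right_end - (len(read) - n)
    if donor_end == acceptor_start:
        return None  # the read is contiguous in the reference: no junction
    return donor_end, acceptor_start


# Toy reference: two "exons" separated by an 18-bp "intron".
exon1, intron, exon2 = "ACGTTGCAAGGCTA", "GTAAGTCCCCCCTTTTAG", "TCGGATACCGTGAA"
reference = exon1 + intron + exon2
spliced_read = exon1[-10:] + exon2[:10]
print(candidate_junction(reference, spliced_read, anchor_len=6))
```

In this toy case the boundary falls in the gap between the two 6-bp anchors, so both extensions stop at the exon edges and together cover the full read. A read taken contiguously from the reference returns `None` because the two segments abut.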
Figure 2
Comparison of predictions from RNA-MATE and SplitSeek. (a) Venn diagram showing the number of junctions predicted by the two methods. (b) Predicted number of junction reads for all 11,395 exon boundaries reported by both RNA-MATE (x-axis) and SplitSeek (y-axis).
Figure 3
Number of predicted splice junctions (y-axis) as a function of the total number of processed reads (x-axis). The number of predicted junctions (black line) increases almost linearly with the number of reads. The green and orange lines represent two subgroups of predicted junctions: those where the two boundaries are separated by ≤100 kb, and those connecting two exon boundaries of a RefSeq gene. Predicted insertions and deletions are combined and represented by the red line.
Figure 4
SplitSeek results viewed in the UCSC genome browser. (a) Predicted splice junctions in the gene Fpgs. (b) The two grey boxes give a schematic view of how deletions and insertions are detected. The genome browser image below shows the SplitSeek results in the last exon and 3' UTR of the Nol10 gene on chromosome 12. Three events are predicted: a splice junction (to the left), a deletion (in the middle), and an insertion (to the right). The predicted insertion and deletion are both supported by the mRNA AK148210, as indicated by the orange arrows at the bottom.
Figure 5
Two long-range SplitSeek predictions (>100 kb) that extend known gene models. (a) A predicted junction that connects an exon in the Ensembl Gene Prediction database with the second exon of the Phactr3 gene, suggesting the presence of an alternative transcription start site. (b) A putative novel exon in the Sorcs2 gene that is currently only supported by EST data.