A novel bioinformatics pipeline for identification and characterization of fusion transcripts in breast cancer and normal cell lines
doi: 10.1093/nar/gkr362. Epub 2011 May 27.
Yan W Asmann, Asif Hossain, Brian M Necela, Sumit Middha, Krishna R Kalari, Zhifu Sun, High-Seng Chai, David W Williamson, Derek Radisky, Gary P Schroth, Jean-Pierre A Kocher, Edith A Perez, E Aubrey Thompson
- PMID: 21622959
- PMCID: PMC3159479
- DOI: 10.1093/nar/gkr362
Yan W Asmann et al. Nucleic Acids Res. 2011 Aug.
Abstract
SnowShoes-FTD, developed for fusion transcript detection in paired-end mRNA-Seq data, applies multiple false-positive filtering steps to nominate fusion transcripts with near 100% confidence. Unique features include: (i) identification of multiple fusion isoforms from the same two gene partners; (ii) prediction of genomic rearrangements; (iii) identification of exon fusion boundaries; (iv) generation of a 5'-3' fusion-spanning sequence for PCR validation; and (v) prediction of fusion protein sequences, including frame shifts and amino acid insertions. Applied to 22 breast cancer and 9 non-transformed cell lines, SnowShoes-FTD identified 50 fusion candidates; five additional fusion candidates with two isoforms each were also confirmed. In all, 30 of the 55 fusion candidates had in-frame protein products. No fusion transcripts were detected in non-transformed cells. Consideration of the possible functions of a subset of the predicted fusion proteins suggests several potentially important roles in transformation, including a possible new mechanism for ERBB2 overexpression in a HER2-positive cell line. The source code of SnowShoes-FTD is provided in two formats: one configured to run on the Sun Grid Engine for parallelization, and the other formatted to run on a single Linux node. Executables in Perl are available for download from our web site: http://mayoresearch.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
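The abstract does not list the individual filtering steps, so the following is only a minimal Python sketch of the generic first step in paired-end fusion detection, assuming reads are already aligned and each mate has been assigned to a gene: fragments whose two mates fall in different genes are counted per gene pair, and pairs below a support threshold are dropped. The `ReadPair` record, the `nominate_gene_pairs` function and the `min_support` default are hypothetical and are not taken from the SnowShoes-FTD Perl code.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """One paired-end fragment whose mates are already assigned to genes (hypothetical model)."""
    read_id: str
    gene_mate1: str
    gene_mate2: str


def nominate_gene_pairs(pairs: Iterable[ReadPair], min_support: int = 3) -> dict[tuple[str, str], int]:
    """Count distinct fragments whose mates fall in two different genes and keep
    gene pairs supported by at least `min_support` fragments (hypothetical threshold)."""
    support: Counter[tuple[str, str]] = Counter()
    seen: set[str] = set()
    for pair in pairs:
        if pair.read_id in seen:  # count each fragment only once
            continue
        seen.add(pair.read_id)
        if pair.gene_mate1 == pair.gene_mate2:
            continue  # concordant pair: both mates in the same gene
        # Sorted key pools A/B and B/A fragments; orientation is deliberately ignored here.
        key = (min(pair.gene_mate1, pair.gene_mate2), max(pair.gene_mate1, pair.gene_mate2))
        support[key] += 1
    return {genes: n for genes, n in support.items() if n >= min_support}
```

Pooling both mate orders keeps the sketch short; a production pipeline would need to track 5'-3' orientation and apply further false-positive filters before nominating a fusion.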
Figures
Figure 1.
The work flow of the fusion detection algorithm implemented in SnowShoes-FTD.
Figure 2.
PCR validation of candidate fusion products. The PCR primers were designed using the template sequences generated by SnowShoes-FTD. Double-stranded cDNA libraries were constructed from the total RNA of each cell line. The primer sequences and expected PCR product sizes for each fusion candidate are detailed in Supplementary Data S4. (a) PCR products from the 50 fusion candidates with unique isoforms, grouped by the cell line in which each candidate was discovered. (b) PCR products from the five fusion candidates with two fusion isoforms each. Note that the lanes for CDK12-TMEM104 contain multiple PCR bands; the lowest band is the fusion product.
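Primer design of this kind starts from a sequence that spans the fusion junction. As a hedged illustration only, the Python sketch below assumes the exon sequences on each side of the boundary are already known and joins the last `flank` bases of the 5' partner to the first `flank` bases of the 3' partner. `fusion_spanning_template` and its 100-base default are hypothetical choices, not the published implementation.

```python
from __future__ import annotations


def fusion_spanning_template(
    five_prime_exons: list[str], three_prime_exons: list[str], flank: int = 100
) -> tuple[str, int]:
    """Return (template, junction_index) for a 5'-3' fusion-spanning sequence.

    five_prime_exons:  exon sequences of the 5' partner, ending with its boundary exon
    three_prime_exons: exon sequences of the 3' partner, starting with its boundary exon
    flank:             bases kept on each side of the junction (hypothetical default)
    """
    if flank <= 0:
        raise ValueError("flank must be a positive number of bases")
    upstream = "".join(five_prime_exons)[-flank:]
    downstream = "".join(three_prime_exons)[:flank]
    # Primers would be placed on opposite sides of junction_index,
    # so that only the fused molecule yields the expected product.
    return upstream + downstream, len(upstream)


template, junction = fusion_spanning_template(["AAAA", "CCCC"], ["GGGG", "TTTT"], flank=3)
print(template, junction)
```

Returning the junction offset alongside the template lets a downstream primer-design step place one primer in each partner gene, so that an amplicon is expected only from the fused transcript.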
Figure 3.
The identification of in-frame fusion transcripts and their predicted protein sequences. (a) Starting from the junction-spanning reads that align to both fusion partner genes, the two junction boundary exons from fusion partner genes A and B are identified. (b) The IDs and sequences of all exons belonging to genes A and B are obtained from the curated refFlat file. In this example, gene A has 7 exons with the third exon as the fusion boundary exon, and gene B has 10 exons with the sixth exon as the fusion boundary exon. (c) All known transcripts of the two fusion partner genes are obtained. Gene A has two known transcripts (A1 and A2), both of which contain the fusion boundary exon. Gene B has four known transcripts (B1 to B4), three of which (B1, B3 and B4) contain the fusion boundary exon. (d) The exhaustive list of fusion transcripts is generated from the known transcripts that contain the fusion boundary exons. There are six possible fusion transcripts: A1-B1, A1-B3, A1-B4, A2-B1, A2-B3 and A2-B4. Because the differences between transcripts B1 and B4 are 'fused out', the A1-B1 fusion transcript is identical to A1-B4; similarly, A2-B1 is identical to A2-B4. Fusion transcripts that cause a frame shift in gene B are defined as 'out of frame', and those that do not are defined as 'in frame'. Each in-frame fusion is translated into the amino acid sequence of the corresponding fusion protein.
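Steps (c) and (d) can be sketched in Python as below, under deliberately simplified assumptions: transcripts are given as ordered lists of exon IDs (as might be derived from a refFlat file), and every transcript's coding sequence is assumed to start at the first base of its first exon, so frame is judged purely from exon lengths. `enumerate_fusions`, `translate` and the dictionary layout are hypothetical; real code would take CDS start and end coordinates from the annotation.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code; codons enumerated in TCAG order ('*' marks stop codons).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AMINO)}


def translate(seq: str) -> str:
    """Translate from the first base up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def enumerate_fusions(
    transcripts_a: dict[str, list[str]],  # 5' partner: transcript name -> ordered exon IDs
    transcripts_b: dict[str, list[str]],  # 3' partner: transcript name -> ordered exon IDs
    exon_seq: dict[str, str],             # exon ID -> exon sequence
    boundary_a: str,
    boundary_b: str,
) -> list[dict]:
    """Pair every A transcript containing boundary_a with every B transcript containing
    boundary_b, collapse pairs with identical fused exon structure, and call frame.

    Hypothetical simplification: each transcript's CDS starts at the first base of its first exon.
    """
    by_structure: dict[tuple[str, ...], dict] = {}
    for name_a, exons_a in transcripts_a.items():
        if boundary_a not in exons_a:
            continue
        head = exons_a[: exons_a.index(boundary_a) + 1]  # A exons up to and including the boundary
        len_head = sum(len(exon_seq[e]) for e in head)
        for name_b, exons_b in transcripts_b.items():
            if boundary_b not in exons_b:
                continue
            cut = exons_b.index(boundary_b)
            structure = tuple(head + exons_b[cut:])  # B exons from the boundary onward
            name = f"{name_a}-{name_b}"
            if structure in by_structure:
                # Differences between B transcripts were 'fused out'; record as an alias.
                by_structure[structure]["aliases"].append(name)
                continue
            # Phase at which the B boundary exon starts within B's own reading frame.
            b_offset = sum(len(exon_seq[e]) for e in exons_b[:cut])
            in_frame = len_head % 3 == b_offset % 3
            fused = "".join(exon_seq[e] for e in structure)
            by_structure[structure] = {
                "name": name,
                "aliases": [],
                "in_frame": in_frame,
                "protein": translate(fused) if in_frame else None,
            }
    return list(by_structure.values())
```

Collapsing on the fused exon structure reproduces the 'fused out' case described in panel (d); in this sketch an alias simply inherits the frame call of the first pair that produced that structure.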
Figure 4.
Detailed description of the ARID1A_MAST2 (a) and WIPF2_ERBB2 (b) fusion transcripts. Following the process described in Figure 3, SnowShoes-FTD uses the RNA sequences of all known transcripts of the fusion partners to predict the sequences of all potential in-frame and out-of-frame fusion transcripts. The abundance of each exon of each fusion partner, normalized to total exon abundance, was extracted from the mRNA-Seq data.
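The per-exon normalization mentioned in the caption amounts to dividing each exon's read count by the total across the gene's exons. The Python sketch below assumes per-exon read counts are already available as a dictionary; `normalized_exon_abundance` is a hypothetical name, and no exon-length correction is applied because the caption does not mention one.

```python
from __future__ import annotations


def normalized_exon_abundance(exon_counts: dict[str, int]) -> dict[str, float]:
    """Divide each exon's read count by the total read count over all exons of the gene."""
    total = sum(exon_counts.values())
    if total == 0:
        return {exon: 0.0 for exon in exon_counts}  # avoid division by zero for unexpressed genes
    return {exon: n / total for exon, n in exon_counts.items()}


print(normalized_exon_abundance({"e1": 10, "e2": 30, "e3": 60}))
```

Listing these fractions in exon order for each partner can make it easier to compare expression on either side of the fusion boundary exon.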
Similar articles
- Detection of redundant fusion transcripts as biomarkers or disease-specific therapeutic targets in breast cancer. Asmann YW, Necela BM, Kalari KR, Hossain A, Baker TR, Carr JM, Davis C, Getz JE, Hostetter G, Li X, McLaughlin SA, Radisky DC, Schroth GP, Cunliffe HE, Perez EA, Thompson EA. Cancer Res. 2012 Apr 15;72(8):1921-8. doi: 10.1158/0008-5472.CAN-11-3142. Epub 2012 Apr 10. PMID: 22496456.
- The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data. Tang X, Baheti S, Shameer K, Thompson KJ, Wills Q, Niu N, Holcomb IN, Boutet SC, Ramakrishnan R, Kachergus JM, Kocher JP, Weinshilboum RM, Wang L, Thompson EA, Kalari KR. Nucleic Acids Res. 2014 Dec 16;42(22):e172. doi: 10.1093/nar/gku1005. Epub 2014 Oct 28. PMID: 25352556. Free PMC article.
- Identification of differentially expressed genes and typical fusion genes associated with three subtypes of breast cancer. Wang R, Li J, Yin C, Zhao D, Yin L. Breast Cancer. 2019 May;26(3):305-316. doi: 10.1007/s12282-018-0924-y. Epub 2018 Nov 16. PMID: 30446971.
- SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data. Jia W, Qiu K, He M, Song P, Zhou Q, Zhou F, Yu Y, Zhu D, Nickerson ML, Wan S, Liao X, Zhu X, Peng S, Li Y, Wang J, Guo G. Genome Biol. 2013 Feb 14;14(2):R12. doi: 10.1186/gb-2013-14-2-r12. PMID: 23409703. Free PMC article.
- FusionQ: a novel approach for gene fusion detection and quantification from paired-end RNA-Seq. Liu C, Ma J, Chang CJ, Zhou X. BMC Bioinformatics. 2013 Jun 15;14:193. doi: 10.1186/1471-2105-14-193. PMID: 23768108. Free PMC article.
Cited by
- Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma. Xie Z, Babiceanu M, Kumar S, Jia Y, Qin F, Barr FG, Li H. Proc Natl Acad Sci U S A. 2016 Nov 15;113(46):13126-13131. doi: 10.1073/pnas.1612734113. Epub 2016 Oct 31. PMID: 27799565. Free PMC article.
- Folate receptor-α (FOLR1) expression and function in triple negative tumors. Necela BM, Crozier JA, Andorfer CA, Lewis-Tuffin L, Kachergus JM, Geiger XJ, Kalari KR, Serie DJ, Sun Z, Moreno-Aspitia A, O'Shannessy DJ, Maltzman JD, McCullough AE, Pockaj BA, Cunliffe HE, Ballman KV, Thompson EA, Perez EA. PLoS One. 2015 Mar 27;10(3):e0122209. doi: 10.1371/journal.pone.0122209. eCollection 2015. PMID: 25816016. Free PMC article.
- Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line. Nattestad M, Goodwin S, Ng K, Baslan T, Sedlazeck FJ, Rescheneder P, Garvin T, Fang H, Gurtowski J, Hutton E, Tseng E, Chin CS, Beck T, Sundaravadanam Y, Kramer M, Antoniou E, McPherson JD, Hicks J, McCombie WR, Schatz MC. Genome Res. 2018 Aug;28(8):1126-1135. doi: 10.1101/gr.231100.117. Epub 2018 Jun 28. PMID: 29954844. Free PMC article.
- Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Shameer K, Tripathi LP, Kalari KR, Dudley JT, Sowdhamini R. Brief Bioinform. 2016 Sep;17(5):841-62. doi: 10.1093/bib/bbv084. Epub 2015 Oct 22. PMID: 26494363. Free PMC article. Review.
- An integrated genomic analysis of Tudor domain-containing proteins identifies PHD finger protein 20-like 1 (PHF20L1) as a candidate oncogene in breast cancer. Jiang Y, Liu L, Shan W, Yang ZQ. Mol Oncol. 2016 Feb;10(2):292-302. doi: 10.1016/j.molonc.2015.10.013. Epub 2015 Oct 28. PMID: 26588862. Free PMC article.