The use of exome capture RNA-seq for highly degraded RNA with application to clinical cancer sequencing

Genome Res. 2015 Sep;25(9):1372-81.

doi: 10.1101/gr.189621.115. Epub 2015 Aug 7.

Rashmi Chugh 2, Yi-Mi Wu 1, Ming Wu 3, Christine Brennan 1, Robert Lonigro 1, Fengyun Su 1, Rui Wang 1, Javed Siddiqui 1, Rohit Mehra 1, Xuhong Cao 3, David Lucas 4, Arul M Chinnaiyan 5, Dan Robinson 1

Marcin Cieslik et al. Genome Res. 2015 Sep.

Abstract

RNA-seq by poly(A) selection is currently the most common protocol for whole transcriptome sequencing as it provides a broad, detailed, and accurate view of the RNA landscape. Unfortunately, the utility of poly(A) libraries is greatly limited when the input RNA is degraded, which is the norm for research tissues and clinical samples, especially when specimens are formalin-fixed. To facilitate the use of RNA sequencing beyond cell lines and in the clinical setting, we developed an exome-capture transcriptome protocol with greatly improved performance on degraded RNA. Capture transcriptome libraries enable measuring absolute and differential gene expression, calling genetic variants, and detecting gene fusions. Through validation against gold-standard poly(A) and Ribo-Zero libraries from intact RNA, we show that capture RNA-seq provides accurate and unbiased estimates of RNA abundance, uniform transcript coverage, and broad dynamic range. Unlike poly(A) selection and Ribo-Zero depletion, capture libraries retain these qualities regardless of RNA quality and provide excellent data from clinical specimens including formalin-fixed paraffin-embedded (FFPE) blocks. Systematic improvements across key applications of RNA-seq are shown on a cohort of prostate cancer patients and a set of clinical FFPE samples. Further, we demonstrate the utility of capture RNA-seq libraries in a patient with a highly malignant solitary fibrous tumor (SFT) enrolled in our clinical sequencing program called MI-ONCOSEQ. Capture transcriptome profiling from FFPE revealed two oncogenic fusions: the pathognomonic NAB2-STAT6 inversion and a therapeutically actionable BRAF fusion, which may drive this specific cancer's aggressive phenotype.

© 2015 Cieslik et al.; Published by Cold Spring Harbor Laboratory Press.

Figures

Figure 1.

The exome-capture transcriptome protocol. (A) Flow-chart of library preparation protocols. Steps unique to each protocol are highlighted. Enrichment for mRNA occurs at the RNA or cDNA stage, respectively, for poly(A) and capture RNA-seq. (B) Controlled in vitro degradation through cell lysis and warm incubation. VCaP cells were treated with DHT or MDV3100. Intact RNA, RNA integrity number (RIN) 10, was extracted, and libraries were prepared in technical triplicates. In parallel, RNA was degraded by warm incubation for increasing amounts of time. Paired poly(A) and capture libraries were prepared from the same RNA at each degradation level.

Figure 2.

Similarity of poly(A) and capture transcriptomes from intact RNA. Properties of fragments from both types of libraries. Separate bars (colors) for each replicate in A,C,D. (A) Alignment rates and library strand-specificity (% fragments aligned to the transcribed strand). (B) Types of genomic alignment regions by fraction of assigned fragments and fraction of discovered variants. (C) Efficiency of rRNA depletion (% fragments aligning to ribosomal RNA). (D) Overrepresentation of poly(A) and poly(T) hexamers. (E) Global concordance of detected genes and called variants within all exonic regions. (F) Fraction of assigned reads by biological gene category.

Figure 3.

Agreement of absolute and differential gene expression. Expression levels were quantified by counting the number of aligned fragments within captured exonic regions and converted to the log2 of counts per million (log2[cpm]). Treatment log2 fold-changes were estimated through linear modeling. (A) Pairwise Q-Q plots comparing the distributions of gene expression levels. (B) Agreement of absolute levels of transcript abundance log2(cpm). (C) Agreement of differential gene expression between DHT-treated and ablated cells (MDV treatment) (log2 fold-changes). (D–F) Observed differences between capture and poly(A) expression estimates are not driven by GC content, gene length, or fraction of exon bases with target probes.
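
The log2(cpm) conversion named in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `log2_cpm`, the `prior_count` pseudocount (added only so that zero counts stay finite), and the example gene counts are all assumptions made for illustration. The caption states only that fragment counts were converted to log2 counts per million.

```python
import math


def log2_cpm(counts, prior_count=0.5):
    """Convert per-gene fragment counts for one library to log2 counts per million.

    counts: dict mapping gene name -> number of aligned fragments.
    prior_count: hypothetical pseudocount added to each gene so that
        genes with zero fragments do not produce log2(0).
    """
    library_size = sum(counts.values())
    if library_size <= 0:
        raise ValueError("library has no counted fragments")
    return {
        gene: math.log2((n + prior_count) / library_size * 1e6)
        for gene, n in counts.items()
    }


# Made-up counts for a single library (illustrative only).
example = {"GENE_A": 250_000, "GENE_B": 750_000, "GENE_C": 0}
values = log2_cpm(example)
for gene in sorted(values):
    print(gene, round(values[gene], 3))
```

With `prior_count=0` the function reduces to the plain definition, log2(count / library size × 10^6), which is usable only when every gene has at least one fragment.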

Figure 4.

Improved performance of exome-capture transcriptomes from low quality RNA samples. (A) Correlation of absolute levels of gene expression (log2[cpm]) between a reference library from intact RNA (poly[A] level 0) and libraries from degraded RNA (level 7). (B) Impact of RNA degradation on gene expression accuracy measured as the average coefficient of variation (CV)—larger values indicate more variable measurements. (C) Impact of expression accuracy on the unsupervised clustering of samples with biological differences confounded by technical variation. (D) Sensitivity of detection of single nucleotide variants in libraries of varying RNA quality. (E) Library complexity estimated as the percentage of unique (nonduplicate) fragments among all counted fragments. (F,G) Assessments of uniformity of transcript coverage. (F) Smooth density estimate of read start positions along the scaled gene bodies (genes <10 kb were excluded). (G) Distribution of splice junctions by depth of coverage. (H) Sensitivity of detecting the TMPRSS2-ERG fusion (junction coverage).
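
Two of the summary statistics in this caption, the average coefficient of variation (panel B) and library complexity as the percentage of unique fragments (panel E), can be sketched in a few lines of Python. This is an illustrative sketch only. The caption does not say whether the CV was computed on raw or log-scale values or which standard deviation estimator was used, so the use of the sample standard deviation here is an assumption, and all function names and numbers are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean for one gene across replicate libraries.

    Uses the sample standard deviation (an assumption; the source does
    not specify the estimator).
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


def average_cv(expression_by_gene):
    """Average the per-gene CVs; larger values mean more variable measurements."""
    return statistics.fmean(
        coefficient_of_variation(v) for v in expression_by_gene.values()
    )


def percent_unique(unique_fragments, total_fragments):
    """Library complexity as the percentage of nonduplicate fragments."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return 100.0 * unique_fragments / total_fragments


# Made-up replicate measurements for two genes (illustrative only).
replicates = {"GENE_A": [8.0, 10.0, 12.0], "GENE_B": [10.0, 10.0, 10.0]}
print(average_cv(replicates))
print(percent_unique(750_000, 1_000_000))
```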

Figure 5.

Assessment of capture transcriptomes from clinical frozen and FFPE samples. (A–C) Comparative analysis of paired capture and poly(A) libraries (grouped by patient) derived from FFPE blocks and frozen tissue: (A) efficiency of rRNA depletion; (B) alignment rates; (C) fragment diversity (FD)—a compound measure of transcriptome quality sensitive to coverage, complexity, and insert size; more complex and well-covered libraries have higher FD values. (D) Within patient correlation of gene expression (log2[cpm]) by library type (poly(A) vs. capture) and source material (frozen vs. FFPE). (E,F) Sensitivity of libraries for detecting genetic changes by patient from frozen libraries: (E) number of called variants; (F) number of called candidate fusions. (G,H) Robustness of fusion detection: (G) average read support per fusion; (H) number of supporting reads for each cohort patient with the TMPRSS2-ERG fusion detected. (I,J) Paired capture and Ribo-Zero libraries from FFPE: (I) number of detected splice junctions; (J) number of called candidate fusions. (K) Selected candidate oncogenic fusion for each patient (read support).

Figure 6.

Clinically relevant gene fusions from FFPE in a case of solitary fibrous tumor. (A) MRI of the spine reveals a spinal canal mass with extradural extension from T10–T12 with mass effect and compression along the spinal cord (arrowhead). Recurrent disease caused cord compression at the T12–L1 right neural foramen. (B) The tumor mass comprises sheets of highly mitotic undifferentiated cells with rich vascular stroma and extensive zones of necrosis (upper left). High-power micrograph (bottom) illustrates the cytological features of pleomorphic small round cells with ill-defined eosinophilic cytoplasm, prominent nucleoli, and numerous mitotic figures (arrow). (C) NAB2-STAT6 is the defining oncogenic fusion in SFT. The _trans_-activating domain of STAT6 is highlighted in red, the EGR1 binding domain of NAB2 in green. (D) The BBS9-BRAF fusion is likely oncogenic as it retains the kinase domain of BRAF (yellow) and has a truncation of the Ras binding domain. BRAF fusions are typically expressed at a lower level, and this rearrangement was detected with 16 reads.
