RNA-seq: impact of RNA degradation on transcript quantification

Irene Gallego Romero et al. BMC Biol. 2014.

Abstract

Background: The use of low quality RNA samples in whole-genome gene expression profiling remains controversial. It is unclear if transcript degradation in low quality RNA samples occurs uniformly, in which case the effects of degradation can be corrected via data normalization, or whether different transcripts are degraded at different rates, potentially biasing measurements of expression levels. This concern has rendered the use of low quality RNA samples in whole-genome expression profiling problematic. Yet, low quality samples (for example, samples collected in the course of fieldwork) are at times the sole means of addressing specific questions.

Results: We sought to quantify the impact of variation in RNA quality on estimates of gene expression levels based on RNA-seq data. To do so, we collected expression data from tissue samples that were allowed to decay for varying amounts of time prior to RNA extraction. The RNA samples we collected spanned the entire range of RNA Integrity Number (RIN) values (a metric commonly used to assess RNA quality). We observed widespread effects of RNA quality on measurements of gene expression levels, as well as a slight but significant loss of library complexity in more degraded samples.

Conclusions: While standard normalizations failed to account for the effects of degradation, we found that by explicitly controlling for the effects of RIN using a linear model framework we can correct for the majority of these effects. We conclude that in instances in which RIN and the effect of interest are not associated, this approach can help recover biologically meaningful signals in data from degraded RNA samples.
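The idea of explicitly controlling for RIN can be sketched as a per-gene ordinary least squares fit of expression on RIN, keeping the residuals as RIN-adjusted values. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `regress_out_rin` and the toy numbers are hypothetical, and a fuller model would usually include the effect of interest as an additional covariate.

```python
from statistics import mean


def regress_out_rin(expression, rin):
    """Return residuals of a simple linear fit: expression ~ intercept + slope * RIN.

    expression: per-sample expression values (e.g. log-scale) for one gene.
    rin: per-sample RIN values, in the same sample order as `expression`.
    """
    if len(expression) != len(rin):
        raise ValueError("expression and rin must have the same length")
    rin_mean = mean(rin)
    expr_mean = mean(expression)
    # Ordinary least squares slope: cov(RIN, expression) / var(RIN).
    sxx = sum((r - rin_mean) ** 2 for r in rin)
    if sxx == 0:
        raise ValueError("RIN values must vary across samples")
    sxy = sum((r - rin_mean) * (e - expr_mean) for r, e in zip(rin, expression))
    slope = sxy / sxx
    intercept = expr_mean - slope * rin_mean
    # Residuals are the part of expression not explained by RIN.
    return [e - (intercept + slope * r) for r, e in zip(rin, expression)]


# Hypothetical toy data: expression falls exactly linearly with RIN,
# so the residuals are all (numerically) zero.
rin_values = [9.0, 7.0, 5.0, 3.0]
log_expression = [4.0, 3.0, 2.0, 1.0]
residuals = regress_out_rin(log_expression, rin_values)
```

As the conclusions note, this kind of correction is only appropriate when RIN is not associated with the effect of interest; otherwise removing the RIN trend would also remove part of the biological signal.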

Figures

Figure 1

Broad effects of RNA degradation. A) PCA plot of the 15 samples included in the study based on data from 29,156 genes with at least one mapped read in a single individual. Different colors identify different time-points, while each shape indicates a particular individual in the data set. B) Spearman correlation plot of the 15 samples in the study. PCA, principal component analysis.

Figure 2

Changes in library complexity over time. Dashed lines indicate median RPKM at each time-point. A) Density plots of RPKM values among all three individuals at 0 hours and 12 hours. B) As A, but 0 hours and 24 hours. C) As A, but 0 hours and 48 hours. D) As A, but 0 hours and 84 hours. RPKM, reads per kilobase of transcript per million mapped reads.
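For readers unfamiliar with the unit, RPKM and the per-sample median marked by the dashed lines can be computed as in the following sketch. The gene names, read counts, gene lengths and library size are hypothetical values chosen for illustration, not data from the study.

```python
from statistics import median


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical counts: gene -> (mapped reads, transcript length in bp).
counts = {"geneA": (500, 2000), "geneB": (100, 1000), "geneC": (3000, 1500)}
library_size = 10_000_000  # hypothetical total mapped reads in the sample

values = {g: rpkm(n, length, library_size) for g, (n, length) in counts.items()}
median_rpkm = median(values.values())
```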

Figure 3

Log10 median abundance of genes across all three individuals relative to 0 hours. Plots are separated by slope. A) Transcripts with significantly slow rates of degradation relative to the mean rate (identified at 1% FDR, n = 3,745). B) Transcripts that are degraded at a rate close to the mean cellular rate (n = 4,656). C) Transcripts with significantly fast rates of degradation relative to the mean rate (identified at 1% FDR, n = 3,522). In all plots, the thick dashed line indicates the median degradation rate for all genes in that group, whereas the thin dashed line denotes no change in degradation rate relative to 0 hours. FDR, false discovery rate.
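The caption reports gene sets identified at 1% FDR but does not say which multiple-testing procedure was used. As an assumption for illustration only, the sketch below applies the widely used Benjamini-Hochberg step-up procedure to a hypothetical list of per-gene p-values (for example, from testing whether a gene's degradation slope differs from the mean slope).

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return booleans marking which p-values pass Benjamini-Hochberg at `fdr`."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every p-value ranked at or below the cutoff is called significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical per-gene p-values; not taken from the study.
p_values = [0.0001, 0.2, 0.003, 0.03, 0.9]
calls = benjamini_hochberg(p_values, fdr=0.01)
```

Genes called significant could then be split by the sign of their slope relative to the mean into the slow and fast groups shown in panels A and C.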

Figure 4

Characteristics of rapidly and slowly degraded transcripts. In all plots, rapidly degraded transcripts are plotted in gold, transcripts degraded at an average rate are plotted in grey and slowly degraded transcripts are in red. A) By transcript %GC content. B) By coding region length. C) By 3′UTR length. D) By complete transcript length. E) By ENSEMBL biotype.

Figure 5

Spearman correlation matrices of the top 10% genes with high inter-individual variance at 0 hours. A) Before RIN correction. B) After regressing the effects of RIN. RIN, RNA integrity number.
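A Spearman correlation matrix of the kind shown here is the Pearson correlation of rank-transformed expression values, computed for every pair of samples. The sketch below is a dependency-free illustration; the sample names and expression values are hypothetical, and the helper names are not from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(samples):
    """Pairwise Spearman matrix; `samples` maps sample name -> per-gene values."""
    names = list(samples)
    return {a: {b: spearman(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical expression values for three samples across five genes.
samples = {
    "ind1_0h": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ind2_0h": [2.0, 4.0, 6.0, 8.0, 10.0],
    "ind1_84h": [5.0, 4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(samples)
```

For panel B, the same computation would be run on RIN-adjusted values (for example, residuals from regressing each gene's expression on RIN) instead of the raw values.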
