Empirical insights into the stochasticity of small RNA sequencing

Li-Xuan Qin et al. Sci Rep. 2016.

Abstract

The choice of distribution for modeling the noise in sequencing data is a fundamental assumption of the analysis and is consequently critical for the accurate assessment of biological heterogeneity and differential expression. The stochasticity of RNA sequencing has been assumed to follow Poisson distributions. We collected microRNA sequencing data and observed that its stochasticity is better approximated by gamma distributions, likely because of the stochastic nature of exponential PCR amplification. We validated our findings with two independent datasets, one for microRNA sequencing and another for RNA sequencing. Motivated by the gamma-distributed stochasticity, we provided a simple method for the analysis of RNA sequencing data and showed its superiority to three existing methods for differential expression analysis using three example datasets comprising technical replicates and biological replicates.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Scatter plots of miRNA-specific variance versus the miRNA-specific mean number of reads on the logarithmic scale for the MXF sample (A) and the PMFH sample (B).

Panels (C,D) focus on the low-read portion of the same plots. Blue solid line is the diagonal. Red dashed line is the fitted straight line for the high-read miRNAs in each sample, with the formula of the fitted line provided in red.
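As a reading aid for these plots: a Poisson distribution has variance equal to its mean, so on the log-log scale it falls on the diagonal (slope 1), whereas a gamma distribution with a fixed shape parameter has variance proportional to the squared mean (slope 2). The following is a minimal simulation sketch of that contrast using only the Python standard library. The shape value, the grid of means, the number of replicates, and all function names are illustrative assumptions and are not taken from the study.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for the modest means used here (exp(-lam) must not underflow).
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def loglog_slope(means, variances):
    """Least-squares slope of log(variance) regressed on log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(seed=0, n_reps=500, shape=4.0):
    """Return (Poisson slope, gamma slope) for simulated replicate data."""
    rng = random.Random(seed)
    target_means = [5, 10, 20, 40, 80, 160]  # hypothetical read depths
    pois_stats, gamma_stats = [], []
    for mu in target_means:
        pois = [poisson_draw(rng, mu) for _ in range(n_reps)]
        # gammavariate(alpha, beta): alpha is the shape, beta the scale,
        # so the mean is alpha * beta = mu.
        gam = [rng.gammavariate(shape, mu / shape) for _ in range(n_reps)]
        pois_stats.append((statistics.fmean(pois), statistics.variance(pois)))
        gamma_stats.append((statistics.fmean(gam), statistics.variance(gam)))
    pois_slope = loglog_slope(*zip(*pois_stats))
    gamma_slope = loglog_slope(*zip(*gamma_stats))
    return pois_slope, gamma_slope


if __name__ == "__main__":
    p, g = simulate_slopes()
    print(f"Poisson log-log slope: {p:.2f}")
    print(f"Gamma   log-log slope: {g:.2f}")
```

Under these assumptions the Poisson slope should come out near 1 and the gamma slope near 2, which is the kind of departure from the diagonal that a fitted line for high-read features can reveal.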

Figure 2. Scatter plots of the miRNA-specific p-values for the Kolmogorov-Smirnov goodness-of-fit test assuming a Poisson distribution (blue points) or a gamma distribution (red points) versus the miRNA-specific logarithmic mean.

(A) MXF; (B) PMFH.
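The goodness-of-fit comparison in this figure can be illustrated with a small sketch that computes the Kolmogorov-Smirnov distance between a sample and a fitted Poisson or gamma distribution. This is not the authors' code: the fitting procedure is not described on this page, so the gamma parameters here are estimated by the method of moments (an assumption), only the KS distance is computed rather than a p-value, and the data are simulated. Applying the KS statistic to a discrete reference distribution such as the Poisson is itself only approximate.

```python
import math
import random
import statistics


def poisson_cdf(x, mu):
    """P(X <= x) for X ~ Poisson(mu), by direct summation of the pmf."""
    if x < 0:
        return 0.0
    pmf = math.exp(-mu)
    total = pmf
    for k in range(1, int(math.floor(x)) + 1):
        pmf *= mu / k
        total += pmf
    return min(total, 1.0)


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series for the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, 1000):
        term *= z / (shape + n)
        total += term
        if term < total * 1e-14:
            break
    log_prefactor = -z + shape * math.log(z) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF of sample and a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def compare_fits(sample):
    """Return (KS distance to fitted Poisson, KS distance to fitted gamma)."""
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)
    shape, scale = mean * mean / var, var / mean  # method-of-moments fit
    d_pois = ks_distance(sample, lambda x: poisson_cdf(x, mean))
    d_gamma = ks_distance(sample, lambda x: gamma_cdf(x, shape, scale))
    return d_pois, d_gamma


def simulated_counts(seed=1, n=200):
    """Hypothetical replicate counts: rounded gamma draws with mean 100."""
    rng = random.Random(seed)
    return [round(rng.gammavariate(4.0, 25.0)) for _ in range(n)]


if __name__ == "__main__":
    d_pois, d_gamma = compare_fits(simulated_counts())
    print(f"KS distance, Poisson fit: {d_pois:.3f}")
    print(f"KS distance, gamma fit:   {d_gamma:.3f}")
```

Because the simulated counts are overdispersed relative to a Poisson with the same mean, the Poisson fit should show the larger KS distance. In practice a library routine such as `scipy.stats.kstest` would normally be used to obtain p-values.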

Figure 3. (A) Volcano plot of fold change and statistical significance for differential miRNA expression. The miRNA-specific −log10(p-value) for comparing MXF and PMFH based on a Poisson distribution assumption (blue points) or a gamma distribution assumption (red points) is plotted against the miRNA-specific logarithmic mean ratio between MXF and PMFH. (B–E) Volcano plots comparing the p-values for differential miRNA expression based on the two-sample t-test after cubic root transformation (CRT) (red points) versus the p-values based on the generalized linear model method assuming a gamma distribution (blue points) (B), edgeR (blue points) (C), DESeq (blue points) (D), and voom (blue points) (E).
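The two-sample t-test after cubic root transformation named in this caption can be sketched as follows. The cube root is a classical normalizing transformation for gamma-type variables (the Wilson-Hilferty approximation), which is one plausible reason it pairs well with a t-test. The caption does not say whether a pooled-variance or Welch test was used, so the pooled-variance form below is an assumption, as are the function name and the toy counts. The sketch returns the t statistic and degrees of freedom only; a p-value would come from the t distribution with those degrees of freedom.

```python
import math
import statistics


def crt_t_statistic(group_a, group_b):
    """Pooled-variance two-sample t statistic on cube-root-transformed counts.

    Returns (t, degrees_of_freedom). Inputs must be non-negative counts,
    with at least two observations per group.
    """
    a = [x ** (1.0 / 3.0) for x in group_a]
    b = [x ** (1.0 / 3.0) for x in group_b]
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, df


if __name__ == "__main__":
    # Toy read counts for one feature in two hypothetical groups.
    t, df = crt_t_statistic([8, 27, 64], [1, 8, 27])
    print(f"t = {t:.4f} on {df} degrees of freedom")
```

In a differential expression setting this function would be applied once per miRNA or gene, followed by a multiple-testing adjustment of the resulting p-values.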

Figure 4. Scatterplot of the −log10(p-value) for differential miRNA expression based on the two-sample t-test after cubic root transformation (CRT) versus the −log10(p-value) based on edgeR (top panels), DESeq (middle panels), and voom (bottom panels).

The left column shows data for the TCGA ovarian cancer study comparing platinum-sensitive versus platinum-resistant tumors; the right column shows data for a breast cancer study comparing invasive ductal carcinoma versus normal breast tissue. Analysis was done for high-read genes (defined as mean reads >10) for each study.
