Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data

Franck Rapaport et al. Genome Biol. 2013.

Abstract

A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.

Figures

Figure 1

RMSD correlation between qRT-PCR and RNA-seq log2 expression changes computed by each method. Overall, there is good concordance between the log2 values derived from the DE methods and the experimental values derived from qRT-PCR measurements. Upper quartile normalization as implemented in the baySeq package is least correlated with the qRT-PCR values. DE, differential expression; RMSD, root-mean-square deviation.
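The caption does not spell out how the deviation was computed, so the following is only a minimal sketch of a root-mean-square deviation between two sets of log2 expression changes. The function name `rmsd` and the four gene values are hypothetical and are not taken from the paper.

```python
import math

def rmsd(rnaseq_log2fc, qpcr_log2fc):
    """Root-mean-square deviation between two equal-length lists of log2 changes."""
    if len(rnaseq_log2fc) != len(qpcr_log2fc) or not rnaseq_log2fc:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(rnaseq_log2fc, qpcr_log2fc)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical log2 changes for four genes (illustrative only).
method_estimates = [1.0, -2.0, 0.5, 3.0]
qpcr_reference = [1.5, -2.0, 0.0, 2.0]

deviation = rmsd(method_estimates, qpcr_reference)
print(f"RMSD = {deviation:.4f}")
```

A smaller value means the method's log2 estimates sit closer to the qRT-PCR reference across the compared genes.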

Figure 2

Differential expression analysis using the qRT-PCR validated gene set. (a) ROC analysis was performed using a qRT-PCR log2 expression change threshold of 0.5. The results show a slight advantage for DESeq and edgeR in detection accuracy. (b) At increasing log2 expression ratios (incremented by 0.1), representing a more stringent cutoff for differential expression, the performance of Cuffdiff and limma gradually declines, whereas that of PoissonSeq increases. AUC, area under the curve.
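As a rough illustration of this kind of ROC analysis, the sketch below labels genes as truly differentially expressed when the absolute qRT-PCR log2 change reaches a threshold, then scores a method by the area under the ROC curve. Using the absolute value, using `1 - p` as the ranking score, the helper names, and all numbers are assumptions made for the example, not details confirmed by the caption.

```python
def label_by_qpcr(qpcr_log2fc, threshold=0.5):
    """Mark a gene as truly DE when |qRT-PCR log2 change| >= threshold (assumed rule)."""
    return [abs(x) >= threshold for x in qpcr_log2fc]

def roc_auc(scores, labels):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data for six genes (illustrative only).
qpcr_log2fc = [2.1, -1.3, 0.2, 0.05, -0.7, 0.4]
p_values = [0.001, 0.02, 0.6, 0.9, 0.3, 0.04]

labels = label_by_qpcr(qpcr_log2fc, threshold=0.5)
scores = [1 - p for p in p_values]  # smaller P value -> higher score
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Repeating the call with `threshold` raised in steps of 0.1 would give an AUC-versus-cutoff curve of the kind described for panel (b).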

Figure 3

P value distributions by gene read count quantiles from null model evaluations. A null model comparison, in which differential expression (DE) is evaluated between samples from the same condition, is expected to generate a uniform distribution of P values. Indeed, the P value density plots, stratified by read count quartiles, have a uniform distribution. However, in the common significance range of ≤ 0.05 there is a noticeable increase in P value densities in the Cuffdiff results, indicating a larger than expected number of false DE genes. The smoothing bandwidth was fixed at 0.0065 for all density plots and 25% was the lowest gene read count quartile.
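One simple way to see why the excess below 0.05 matters: under a uniform null distribution, about 5% of P values should fall at or below 0.05. The sketch below checks that fraction on simulated values. It is not the density-plot procedure used for the figure, and the simulated data and the name `fraction_below` are hypothetical.

```python
import random

def fraction_below(p_values, alpha=0.05):
    """Fraction of P values at or below alpha; close to alpha if they are uniform."""
    if not p_values:
        raise ValueError("p_values must be non-empty")
    return sum(p <= alpha for p in p_values) / len(p_values)

rng = random.Random(0)  # fixed seed so the example is reproducible

# Simulated well-calibrated null P values (uniform on [0, 1)).
uniform_p = [rng.random() for _ in range(20000)]
# Simulated miscalibrated case: an extra mass of small P values.
inflated_p = uniform_p + [0.01] * 2000

uniform_fraction = fraction_below(uniform_p)
inflated_fraction = fraction_below(inflated_p)
print(f"uniform: {uniform_fraction:.3f}, inflated: {inflated_fraction:.3f}")
```

Applying the same check within each read count quartile would show whether any excess is concentrated among low-count or high-count genes.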

Figure 4

Comparison of signal-to-noise ratio and differential expression (DE) for genes expressed in only one condition. (a) The correlation between signal-to-noise ratio and -log10(P) was used to evaluate the accuracy of DE among genes expressed in one condition. A total of 10,272 genes were expressed exclusively in one of the contrasting conditions in the DE analysis between the three ENCODE datasets. Gray shaded points indicate genes with an adjusted P value ≥ 0.05, which are typically considered not differentially expressed. The results show that Cuffdiff, edgeR and DESeq do not properly account for variance in measurements, as indicated by poor agreement with the isotonic regression line. (b) ROC curves for detection of DE at a signal-to-noise ratio of ≥ 3. AUC, area under the curve.

Figure 5

False positive rates and sensitivity of differential expression (DE) with sequencing depth and number of replicate samples. Differentially expressed genes in GM12892 vs MCF-7 cell lines were divided into four count quartiles, and false positive rate and sensitivity were measured by decreasing sequence counts and changing the number of replicate samples. Points and bars are the average and standard deviation, respectively, from five random samples of reads from each library; see Materials and methods for details. (a) Number of false positives, defined as the number of genes detected as DE in GM12892 vs MCF-7 that were identified only by the specific method. (b) Sensitivity rates, defined as the fraction of true set genes that were detected. Note that PoissonSeq's maximum sensitivity is below 1 since it was not included in the definition of the true set. See Figures S9 to S15 in Additional file 1 for similar plots for DE between other cell lines and technical replicates. DE, differential expression; FP, false positive.
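The two quantities in this caption can be sketched with plain set operations. The caption does not say how the true set was built, so the example below simply takes it as given. The method names, gene identifiers and helper functions are hypothetical.

```python
def method_specific_calls(calls_by_method, method):
    """Genes called DE by `method` and by no other method (panel (a)'s false positives)."""
    others = set().union(
        *(genes for name, genes in calls_by_method.items() if name != method)
    )
    return calls_by_method[method] - others

def sensitivity(called, true_set):
    """Fraction of true set genes that appear among the called genes."""
    if not true_set:
        raise ValueError("true_set must be non-empty")
    return len(called & true_set) / len(true_set)

# Hypothetical DE calls from three methods (illustrative only).
calls = {
    "methodA": {"g1", "g2", "g3"},
    "methodB": {"g1", "g2", "g4"},
    "methodC": {"g1", "g2"},
}
true_set = {"g1", "g2", "g5"}  # assumed to be supplied by the analyst

fp_a = method_specific_calls(calls, "methodA")
sens_a = sensitivity(calls["methodA"], true_set)
print(sorted(fp_a), round(sens_a, 3))
```

Recomputing both values after subsampling reads or dropping replicates would trace curves of the kind shown in the figure.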
