Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data
Franck Rapaport et al. Genome Biol. 2013.
Erratum in
- Erratum to: Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.
Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D. Genome Biol. 2015 Nov 23;16:261. doi: 10.1186/s13059-015-0813-z. PMID: 26597945. Free PMC article.
Abstract
A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.
Figures
Figure 1
RMSD correlation between qRT-PCR and RNA-seq log2 expression changes computed by each method. Overall, there is good concordance between log2 values derived from the DE methods and the experimental values derived from qRT-PCR measures. Upper quartile normalization implemented in baySeq package is least correlated with qRT-PCR values. DE, differential expression; RMSD, root-mean-square deviation.
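The root-mean-square deviation named in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `rmsd` and the example values are hypothetical, and both inputs are assumed to be log2 expression changes for the same genes in the same order.

```python
import math


def rmsd(reference, estimate):
    """Root-mean-square deviation between two equal-length sequences.

    `reference` could hold qRT-PCR log2 expression changes and `estimate`
    the log2 changes reported by one DE method (assumed pairing).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values, for illustration only.
qpcr_log2fc = [1.0, -0.5, 2.0, 0.0]
method_log2fc = [0.8, -0.7, 2.4, 0.1]
print(rmsd(qpcr_log2fc, method_log2fc))
```

A lower value means the method's log2 changes sit closer to the qRT-PCR measurements.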
Figure 2
Differential expression analysis using qRT-PCR validated gene set. (a) ROC analysis was performed using a qRT-PCR log2 expression change threshold of 0.5. The results show a slight advantage for DESeq and edgeR in detection accuracy. (b) At increasing log2 expression ratios (incremented by 0.1), representing a more stringent cutoff for differential expression, the performances of the Cuffdiff and limma methods gradually reduce whereas PoissonSeq performance increases. AUC, area under the curve.
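One way to reproduce the kind of analysis described in panels (a) and (b) is sketched below. It rests on assumptions not stated in the caption: a gene is labeled truly differentially expressed when the absolute qRT-PCR log2 change meets the threshold, each method supplies one score per gene where higher means stronger evidence (for example, 1 minus the P value), and the AUC is computed with the rank-based (Mann-Whitney) formulation. The function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_at_threshold(qpcr_log2fc, method_scores, threshold=0.5):
    """Label genes by |qRT-PCR log2 change| >= threshold (assumed rule), then score."""
    labels = [abs(v) >= threshold for v in qpcr_log2fc]
    return roc_auc(labels, method_scores)


def auc_curve(qpcr_log2fc, method_scores, start=0.5, stop=2.0, step=0.1):
    """AUC at increasingly stringent thresholds, as in panel (b); range is illustrative."""
    curve = []
    n_steps = int(round((stop - start) / step))
    for k in range(n_steps + 1):
        t = round(start + k * step, 10)
        try:
            curve.append((t, auc_at_threshold(qpcr_log2fc, method_scores, t)))
        except ValueError:
            break  # no positives (or no negatives) left at this cutoff
    return curve
```

Calling `auc_curve` once per method on the same validated gene set gives comparable curves. The upper bound of 2.0 is an arbitrary choice for the sketch.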
Figure 3
P value distributions by gene read count quantiles from null model evaluations. Null model comparison where differential expression (DE) is evaluated between samples from the same condition is expected to generate a uniform distribution of P values. Indeed, the P value density plots, stratified by read count quartiles, have a uniform distribution. However, at the common significance range of ≤ 0.05 there is a noticeable increase in P value densities in Cuffdiff results indicating larger than expected false DE genes. The smoothing bandwidth was fixed at 0.0065 for all density plots and 25% was the lowest gene read count quartile.
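The null-model check described here can be approximated without density plots. Under a uniform distribution, about 5% of P values should fall at or below 0.05 in every read count quartile. The sketch below assumes one read count and one P value per gene; the function name and the rank-based quartile assignment are illustrative choices, not the paper's procedure.

```python
def fraction_significant_by_quartile(counts, pvalues, alpha=0.05):
    """Fraction of P values <= alpha within each read count quartile.

    Genes are ranked by read count and split into four equal-sized bins
    (lowest counts first). Under the null, each fraction should be near alpha.
    """
    if len(counts) != len(pvalues):
        raise ValueError("counts and pvalues must have equal length")
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    n = len(order)
    bins = [[] for _ in range(4)]
    for rank, i in enumerate(order):
        bins[min(3, rank * 4 // n)].append(pvalues[i])
    return [
        sum(p <= alpha for p in b) / len(b) if b else float("nan")
        for b in bins
    ]
```

A fraction well above `alpha` in any quartile would correspond to the excess of small P values that the caption reports for Cuffdiff.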
Figure 4
Comparison of signal-to-noise ratio and differential expression (DE) for genes expressed in only one condition. (a) The correlation between signal-to-noise and -log10(P) was used to evaluate the accuracy of DE among genes expressed in one condition. A total of 10,272 genes was exclusively expressed in only one of the contrasting conditions in the DE analysis between the three ENCODE datasets. Gray shaded points indicate genes with adjusted P value ≥ 0.05, which are typically considered not differentially expressed. The results show that Cuffdiff, edgeR and DESeq do not properly account for variance in measurements as indicated by poor agreement with the isotonic regression line. (b) ROC curves for detection of DE at signal-to-noise ratio of ≥3. AUC: area under curve.
Figure 5
False positive rates and sensitivity of differential expression (DE) with sequencing depth and number of replicate samples. Differentially expressed genes in GM12892 vs MCF-7 cell lines were divided into four count quartiles and false positive rate and sensitivity were measured by decreasing sequence counts and changing the number of replicate samples. Points and bars are average and standard deviation, respectively, from five random samples of reads from each library; see Materials and methods for details. (a) Number of false positives defined as the number of DE detected genes in GM12892 vs MCF-7 that were only identified by the specific method. (b) Sensitivity rates defined as the fraction of true set genes. Note that PoissonSeq's maximum sensitivity is below 1 since it was not included in the definition of the true set. See Figures S9 to S15 in Additional file 1 for similar plots for DE between other cell lines and technical replicates. DE, differential expression; FP, false positive.
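The two quantities defined in panels (a) and (b) can be expressed with set operations. This sketch takes the true set as an input because its construction is described in the paper's Materials and methods, not in this caption. The function names, method labels and gene identifiers are hypothetical.

```python
def method_specific_calls(calls, method):
    """Genes called DE by `method` and by no other method (panel (a) definition).

    `calls` maps a method name to the set of genes it reports as DE.
    """
    others = set()
    for name, genes in calls.items():
        if name != method:
            others |= genes
    return calls[method] - others


def sensitivity(called, true_set):
    """Fraction of true set genes that a method detects (panel (b) definition)."""
    if not true_set:
        raise ValueError("true_set must be non-empty")
    return len(called & true_set) / len(true_set)


# Hypothetical calls, for illustration only.
calls = {
    "methodA": {"g1", "g2", "g3"},
    "methodB": {"g2", "g3", "g4"},
    "methodC": {"g2", "g5"},
}
true_set = {"g2", "g3"}  # assumed to be supplied by the analyst
for name, genes in calls.items():
    fp = len(method_specific_calls(calls, name))
    print(name, fp, sensitivity(genes, true_set))
```

Repeating the calculation on subsampled read sets and on different replicate counts would give the kind of per-depth, per-replicate points summarized in the figure.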
Comment in
- Do count-based differential expression methods perform poorly when genes are expressed in only one condition?
Zhou X, Robinson MD. Genome Biol. 2015 Oct 8;16:222. doi: 10.1186/s13059-015-0781-3. PMID: 26450178. Free PMC article.
- Response to Zhou and Robinson.
Betel D, Socci ND, Khanin R, Mason CE, Rapaport F. Genome Biol. 2015 Oct 8;16:223. doi: 10.1186/s13059-015-0782-2. PMID: 26450418. Free PMC article.
Grants and funding
- P50 CA091629/CA/NCI NIH HHS/United States
- 2P01CA129243-06/CA/NCI NIH HHS/United States
- R01 NS076465/NS/NINDS NIH HHS/United States
- P30 CA008748/CA/NCI NIH HHS/United States
- P50 CA140146/CA/NCI NIH HHS/United States