Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data

Franck Rapaport et al. Genome Biol. 2013.

Erratum in

Erratum to: Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.
Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D. Genome Biol. 2015 Nov 23;16:261. doi: 10.1186/s13059-015-0813-z. PMID: 26597945 Free PMC article. No abstract available.

Abstract

A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.

Figures

Figure 1

RMSD correlation between qRT-PCR and RNA-seq log2 expression changes computed by each method. Overall, there is good concordance between the log2 values derived from the DE methods and the experimental values derived from qRT-PCR measurements. Upper quartile normalization as implemented in the baySeq package is least correlated with the qRT-PCR values. DE, differential expression; RMSD, root-mean-square deviation.
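
As a minimal sketch of the quantity named in this caption, the root-mean-square deviation between two paired sets of log2 expression changes can be computed as below. The function name `rmsd` and the example values are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the authors' normalization or gene selection.

```python
import math


def rmsd(xs, ys):
    """Root-mean-square deviation between two paired lists of values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_diffs = [(x - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_diffs) / len(xs))


# Hypothetical log2 fold changes for four genes (made-up numbers).
qpcr_lfc = [1.0, -0.5, 2.0, 0.0]
rnaseq_lfc = [1.5, -0.5, 1.5, 0.5]

print(f"{rmsd(qpcr_lfc, rnaseq_lfc):.3f}")  # prints 0.433
```

A lower RMSD means the RNA-seq estimates sit closer to the qRT-PCR measurements for the same genes.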

Figure 2

Differential expression analysis using the qRT-PCR validated gene set. (a) ROC analysis was performed using a qRT-PCR log2 expression change threshold of 0.5. The results show a slight advantage for DESeq and edgeR in detection accuracy. (b) At increasing log2 expression ratios (incremented by 0.1), representing a more stringent cutoff for differential expression, the performance of Cuffdiff and limma gradually declines, whereas that of PoissonSeq increases. AUC, area under the curve.
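
The ROC analysis described in this caption can be sketched roughly as follows: genes are labeled as truly differentially expressed when the qRT-PCR log2 change passes a threshold, and a method's P values are then scored against those labels. Using the absolute log2 change for labeling, ranking genes by negated P value, and all names and numbers here are assumptions made for illustration; the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
def label_by_threshold(qpcr_lfc, threshold=0.5):
    """Label a gene as truly DE when |log2 change| meets the threshold (assumed rule)."""
    return [abs(v) >= threshold for v in qpcr_lfc]


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data for six genes (made-up numbers).
qpcr_lfc = [1.2, -0.8, 0.1, 0.3, -2.0, 0.4]
pvals = [0.001, 0.04, 0.5, 0.03, 0.0001, 0.9]

labels = label_by_threshold(qpcr_lfc, threshold=0.5)
scores = [-p for p in pvals]  # smaller P value means stronger DE evidence
print(f"{roc_auc(labels, scores):.3f}")  # prints 0.889
```

Repeating the call with `threshold` raised in steps of 0.1 gives one AUC per cutoff, which is the kind of curve panel (b) summarizes.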

Figure 3

P value distributions by gene read count quartiles from null model evaluations. A null model comparison, in which differential expression (DE) is evaluated between samples from the same condition, is expected to generate a uniform distribution of P values. Indeed, the P value density plots, stratified by read count quartiles, are largely uniform. However, in the common significance range of ≤ 0.05 there is a noticeable increase in P value density in the Cuffdiff results, indicating a larger than expected number of false DE genes. The smoothing bandwidth was fixed at 0.0065 for all density plots and 25% was the lowest gene read count quartile.
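
A simple numerical check of the null behavior described in this caption is sketched below: under a uniform null, about 5% of P values should fall at or below 0.05, and the empirical distribution should stay close to the uniform one. The helper names and the synthetic P values are assumptions for illustration; the Kolmogorov-Smirnov distance is used here as a stand-in for visual inspection of density plots and is not the paper's procedure.

```python
def fraction_at_or_below(pvals, cutoff=0.05):
    """Fraction of P values at or below the cutoff."""
    if not pvals:
        raise ValueError("pvals must be non-empty")
    return sum(p <= cutoff for p in pvals) / len(pvals)


def ks_distance_to_uniform(pvals):
    """Kolmogorov-Smirnov distance between the empirical CDF and Uniform(0, 1)."""
    xs = sorted(pvals)
    n = len(xs)
    if n == 0:
        raise ValueError("pvals must be non-empty")
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


# Synthetic, perfectly uniform P values on a regular grid.
uniform_pvals = [(i + 0.5) / 1000 for i in range(1000)]
# The same set with an artificial excess of small P values added.
inflated_pvals = uniform_pvals + [0.01] * 50

print(f"{fraction_at_or_below(uniform_pvals):.3f}")   # prints 0.050
print(f"{fraction_at_or_below(inflated_pvals):.3f}")  # prints 0.095
```

A fraction well above the cutoff, as in the second call, is the numerical counterpart of the density bump near zero noted for Cuffdiff.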

Figure 4

Comparison of signal-to-noise ratio and differential expression (DE) for genes expressed in only one condition. (a) The correlation between signal-to-noise and -log10(P) was used to evaluate the accuracy of DE among genes expressed in one condition. A total of 10,272 genes were expressed exclusively in one of the contrasting conditions in the DE analysis between the three ENCODE datasets. Gray shaded points indicate genes with adjusted P value ≥ 0.05, which are typically considered not differentially expressed. The results show that Cuffdiff, edgeR and DESeq do not properly account for variance in measurements, as indicated by poor agreement with the isotonic regression line. (b) ROC curves for detection of DE at a signal-to-noise ratio of ≥ 3. AUC, area under the curve.

Figure 5

False positive rates and sensitivity of differential expression (DE) with sequencing depth and number of replicate samples. Differentially expressed genes in GM12892 vs MCF-7 cell lines were divided into four count quartiles and false positive rate and sensitivity were measured by decreasing sequence counts and changing the number of replicate samples. Points and bars are average and standard deviation, respectively, from five random samples of reads from each library; see Materials and methods for details. (a) Number of false positives defined as the number of DE detected genes in GM12892 vs MCF-7 that were only identified by the specific method. (b) Sensitivity rates defined as the fraction of true set genes. Note that PoissonSeq's maximum sensitivity is below 1 since it was not included in the definition of the true set. See Figures S9 to S15 in Additional file 1 for similar plots for DE between other cell lines and technical replicates. DE, differential expression; FP, false positive.
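
The two quantities defined in panels (a) and (b) can be sketched with plain set operations: false positives as genes called DE by one method only, and sensitivity as the fraction of a true set that a method recovers. How the true set is built is described in the paper's Materials and methods and is not reproduced here; the method names, gene identifiers and true set below are hypothetical.

```python
def unique_calls(calls, method):
    """Genes called DE by `method` and by no other method."""
    others = set().union(
        *(genes for name, genes in calls.items() if name != method)
    )
    return calls[method] - others


def sensitivity(called, true_set):
    """Fraction of the true set that appears among the called genes."""
    if not true_set:
        raise ValueError("true_set must be non-empty")
    return len(called & true_set) / len(true_set)


# Hypothetical DE calls per method (made-up gene identifiers).
calls = {
    "methodA": {"g1", "g2", "g3", "g7"},
    "methodB": {"g1", "g2", "g4"},
    "methodC": {"g1", "g2", "g3"},
}
true_set = {"g1", "g2", "g3"}  # assumed to be given

for name in sorted(calls):
    fp = len(unique_calls(calls, name))
    sens = sensitivity(calls[name], true_set)
    print(f"{name}: false positives={fp}, sensitivity={sens:.2f}")
```

Running the same calculation on down-sampled reads or fewer replicates, then averaging over repeated random samples, would give points of the kind plotted in the figure.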

Comment in

Do count-based differential expression methods perform poorly when genes are expressed in only one condition?
Zhou X, Robinson MD. Genome Biol. 2015 Oct 8;16:222. doi: 10.1186/s13059-015-0781-3. PMID: 26450178 Free PMC article.
Response to Zhou and Robinson.
Betel D, Socci ND, Khanin R, Mason CE, Rapaport F. Genome Biol. 2015 Oct 8;16:223. doi: 10.1186/s13059-015-0782-2. PMID: 26450418 Free PMC article. No abstract available.

Cited by

Best practices for differential accessibility analysis in single-cell epigenomics.
Teo AYY, Squair JW, Courtine G, Skinnider MA. Nat Commun. 2024 Oct 11;15(1):8805. doi: 10.1038/s41467-024-53089-5. PMID: 39394227 Free PMC article.
Meta-analysis of RNA interaction profiles of RNA-binding protein using the RBPInper tool.
Cogan JA, Benova N, Kuklinkova R, Boyne JR, Anene CA. Bioinform Adv. 2024 Aug 26;4(1):vbae127. doi: 10.1093/bioadv/vbae127. eCollection 2024. PMID: 39233897 Free PMC article.
Exploring the biological behavior differences between retroperitoneal and non-retroperitoneal liposarcomas.
Xi Z, Zhuang A, Li X, Ming TM, Cheng Y, Zhang C, Xie F, Wang Y, Yan G, Zheng J, Lin Z, Zhang G, Li H, Wu T, He Q, Li W. Heliyon. 2024 Jul 19;10(15):e34878. doi: 10.1016/j.heliyon.2024.e34878. eCollection 2024 Aug 15. PMID: 39157358 Free PMC article.
A computational study of gene expression patterns in head and neck squamous cell carcinoma using TCGA data.
Rauf S, Ullah S, Abid MA, Ullah A, Khan G, Khan AU, Ahmad G, Ijaz M, Ahmad S, Faisal S. Future Sci OA. 2024 Dec 31;10(1):2380590. doi: 10.1080/20565623.2024.2380590. Epub 2024 Aug 14. PMID: 39140365 Free PMC article.
Transcriptomic response to nitrogen availability reveals signatures of adaptive plasticity during tetraploid wheat domestication.
Pieri A, Beleggia R, Gioia T, Tong H, Di Vittori V, Frascarelli G, Bitocchi E, Nanni L, Bellucci E, Fiorani F, Pecchioni N, Marzario S, De Quattro C, Limongi AR, De Vita P, Rossato M, Schurr U, David JL, Nikoloski Z, Papa R. Plant Cell. 2024 Sep 3;36(9):3809-3823. doi: 10.1093/plcell/koae202. PMID: 39056474 Free PMC article.
