A benchmark for RNA-seq quantification pipelines
doi: 10.1186/s13059-016-0940-1.
Mingxiang Teng, Carrie A Davis, Sarah Djebali, Alexander Dobin, Brenton R Graveley, Sheng Li, Christopher E Mason, Sara Olson, Dmitri Pervouchine, Cricket A Sloan, Xintao Wei, Lijun Zhan, Rafael A Irizarry
- PMID: 27107712
- PMCID: PMC4842274
- DOI: 10.1186/s13059-016-0940-1
Mingxiang Teng et al. Genome Biol. 2016.
Erratum in
- Erratum to: A benchmark for RNA-seq quantification pipelines. Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. Genome Biol. 2016 May 23;17(1):107. doi: 10.1186/s13059-016-0986-0. PMID: 27215799. Free PMC article.
- Erratum to: A benchmark for RNA-seq quantification pipelines. Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. Genome Biol. 2016 Sep 30;17(1):203. doi: 10.1186/s13059-016-1060-7. PMID: 27716375. Free PMC article.
Abstract
Obtaining RNA-seq measurements involves a complex data-analysis process, with a large number of competing algorithms available as options. There is much debate about which of these methods provides the best approach. Unfortunately, it is currently difficult to evaluate their performance, due in part to a lack of sensitive assessment metrics. We present a series of statistical summaries and plots to evaluate performance in terms of specificity and sensitivity, available as an R/Bioconductor package (http://bioconductor.org/packages/rnaseqcomp). Using two independent datasets, we assessed seven competing pipelines. Performance was generally poor, with two methods clearly underperforming and RSEM slightly outperforming the rest.
Figures
Fig. 1
Estimated log fold changes stratified by transcript abundance on the simulation dataset. One example, based on Cufflinks quantification of two samples, is shown here. Black points are non-differential transcripts; blue points are differentially expressed transcripts that were simulated to have signal in both samples; red points are differentially expressed transcripts that were simulated to have signal in only one of the samples.
Fig. 2
Distribution of reported transcript quantifications on one sample of the simulation dataset (a) before and (b) after rescaling. Seven quantification methods are shown.
Fig. 3
Standard deviations of transcript quantifications based on (a) an experimental dataset (GM12878) and (b) a simulation dataset (one of the cell lines). Seven quantification methods are shown.
Fig. 4
Proportions of discordant expression calls based on (a) an experimental dataset (GM12878) and (b) a simulation dataset (one of the cell lines). Seven quantification methods are shown.
Fig. 5
Proportion differences of transcript quantifications in genes with only two annotated transcripts, based on (a) an experimental dataset (GM12878) and (b) a simulation dataset (one of the cell lines). Seven quantification methods are shown.
Fig. 6
ROC curves indicating performance of quantification methods based on differential expression analysis of (a) an experimental dataset and (b) a simulation dataset. Seven quantification methods are shown. FP, false positive; TP, true positive.
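As a rough illustration of how an ROC curve of the kind described for Fig. 6 can be traced, the Python sketch below ranks transcripts by absolute log fold change and counts true and false positives as the threshold is lowered. It is not the rnaseqcomp implementation: the function name `roc_points`, the choice of absolute log fold change as the ranking score, and the toy values are all assumptions made for illustration, and none of the numbers come from the paper.

```python
def roc_points(abs_lfc, is_differential):
    """Return cumulative (FP, TP) counts as the |log fold change| cutoff is lowered.

    abs_lfc: absolute log fold change per transcript (hypothetical input).
    is_differential: True where the transcript is truly differential.
    Ties are not treated specially; each transcript adds its own point.
    """
    # Rank transcripts from largest to smallest absolute log fold change.
    order = sorted(range(len(abs_lfc)), key=lambda i: abs_lfc[i], reverse=True)
    fp = tp = 0
    points = [(0, 0)]  # strictest cutoff: nothing is called differential
    for i in order:
        if is_differential[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points


# Toy data, invented for this example only.
abs_lfc = [2.5, 0.1, 1.8, 0.4, 0.05, 1.2]
truth = [True, False, True, False, False, True]
points = roc_points(abs_lfc, truth)
print(points)
```

Plotting TP against FP for each quantification method would then give one curve per method. A method whose curve rises faster recovers more truly differential transcripts for the same number of false calls.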