A benchmark for RNA-seq quantification pipelines

doi: 10.1186/s13059-016-0940-1.

Mingxiang Teng, Carrie A Davis, Sarah Djebali, Alexander Dobin, Brenton R Graveley, Sheng Li, Christopher E Mason, Sara Olson, Dmitri Pervouchine, Cricket A Sloan, Xintao Wei, Lijun Zhan, Rafael A Irizarry

PMID: 27107712
PMCID: PMC4842274

Mingxiang Teng et al. Genome Biol. 2016.

Erratum in

Erratum to: A benchmark for RNA-seq quantification pipelines.
Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. Genome Biol. 2016 May 23;17(1):107. doi: 10.1186/s13059-016-0986-0. PMID: 27215799.
Erratum to: A benchmark for RNA-seq quantification pipelines.
Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. Genome Biol. 2016 Sep 30;17(1):203. doi: 10.1186/s13059-016-1060-7. PMID: 27716375.

Abstract

Obtaining RNA-seq measurements involves a complex data analysis process with a large number of competing algorithms as options. There is much debate about which of these methods provides the best approach. Unfortunately, it is currently difficult to evaluate their performance, due in part to a lack of sensitive assessment metrics. We present a series of statistical summaries and plots to evaluate performance in terms of specificity and sensitivity, available as an R/Bioconductor package (http://bioconductor.org/packages/rnaseqcomp). Using two independent datasets, we assessed seven competing pipelines. Performance was generally poor, with two methods clearly underperforming and RSEM slightly outperforming the rest.

Figures

Fig. 1

Estimated log fold changes stratified by transcript abundance on the simulation dataset. One example based on Cufflinks quantification of two samples is shown here. Black points are non-differential transcripts; blue points are differentially expressed transcripts that were simulated to have signal in both samples; red points are differentially expressed transcripts that were simulated to have signal in only one of the samples.

Fig. 2

Distribution of reported transcript quantifications on one sample of the simulation dataset (a) before and (b) after rescaling. Seven quantification methods are shown here.

Fig. 3

Standard deviations of transcript quantifications based on (a) an experimental dataset (GM12878) and (b) a simulation dataset (one of the cell lines). Seven quantification methods are shown here.

Fig. 4

Proportions of discordant expression calls based on (a) an experimental dataset (GM12878) and (b) a simulation dataset (one of the cell lines). Seven quantification methods are shown here.

Fig. 5

Proportion differences of transcript quantifications in genes with only two annotated transcripts based on (a) an experimental dataset (GM12878) and (b) a simulation dataset (one of the cell lines). Seven quantification methods are shown.

Fig. 6

ROC curves indicating performance of quantification methods based on differential expression analysis of (a) an experimental dataset and (b) a simulation dataset. Seven quantification methods are shown. FP: false positive; TP: true positive.
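
The replicate-based summaries named in Figs. 3 and 4 (standard deviations of transcript quantifications and proportions of discordant expression calls) can be illustrated with a minimal sketch. This is not the rnaseqcomp implementation: the function names, the pseudo-count of 1, the expression cutoff of 1, and the toy values are all assumptions chosen for illustration only.

```python
import math
from statistics import stdev


def log2_quant(value, pseudo=1.0):
    """Log2-transform a quantification; the pseudo-count is an assumed choice."""
    return math.log2(value + pseudo)


def replicate_sd(rep1, rep2, pseudo=1.0):
    """Per-transcript standard deviation of log2 quantifications
    across two replicates of the same sample."""
    return [
        stdev([log2_quant(a, pseudo), log2_quant(b, pseudo)])
        for a, b in zip(rep1, rep2)
    ]


def discordant_proportion(rep1, rep2, cutoff=1.0):
    """Among transcripts called expressed in at least one replicate,
    the share called expressed in exactly one (cutoff is assumed)."""
    calls = [(a >= cutoff, b >= cutoff) for a, b in zip(rep1, rep2)]
    expressed = [c for c in calls if c[0] or c[1]]
    if not expressed:
        return 0.0
    discordant = sum(1 for first, second in expressed if first != second)
    return discordant / len(expressed)


# Hypothetical quantifications for five transcripts in two replicates.
rep1 = [0.0, 3.0, 7.0, 0.5, 15.0]
rep2 = [0.0, 3.0, 1.0, 2.0, 15.0]

sds = replicate_sd(rep1, rep2)
prop = discordant_proportion(rep1, rep2)
print(sds)
print(prop)
```

Lower values of both summaries indicate better agreement between replicates; in practice such summaries would be stratified by transcript abundance rather than reported as a single number.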
