QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments

Stephen W Hartley et al. BMC Bioinformatics. 2015.

Abstract

Background: High-throughput next-generation RNA sequencing has matured into a viable and powerful method for detecting variations in transcript expression and regulation. Proactive quality control is of critical importance as unanticipated biases, artifacts, or errors can potentially drive false associations and lead to flawed results.

Results: We have developed the Quality of RNA-Seq Toolset, or QoRTs, a comprehensive, multifunction toolset that assists in quality control and data processing of high-throughput RNA sequencing data.

Conclusions: QoRTs generates an unmatched variety of quality control metrics, and can provide cross-comparisons of replicates contrasted by batch, biological sample, or experimental condition, revealing any outliers and/or systematic issues that could drive false associations or otherwise compromise downstream analyses. In addition, QoRTs simultaneously replaces the functionality of numerous other data-processing tools, and can quickly and efficiently generate quality control metrics, coverage counts (for genes, exons, and known/novel splice-junctions), and browser tracks. These functions can all be carried out as part of a single unified data-processing/quality control run, greatly reducing both the complexity and the total runtime of the analysis pipeline. The software, source code, and documentation are available online at http://hartleys.github.io/QoRTs.

Figures

Fig. 1

An example analysis pipeline with QoRTs. This flowchart illustrates the recommended analysis pipeline for conventional RNA-Seq analysis using QoRTs. Input and intermediary files are shown in blue; output files and results are shown in purple.

Fig. 2

A small selection of the QC plots offered by QoRTs. This series includes 12 samples, each consisting of 6 technical replicates (for a total of 72 bam files), with 4 different biological conditions (3 samples per condition). In all nine plots, replicates are colored and differentiated by biological group. In the line plots (c, d, e, and f), the samples are simply colored by biological group. In other plots (a and g), replicates are differentiated by character, color, and horizontal offset. This differentiation allows easy identification of both outliers and systematic biases or errors associated with the biological condition. Such systematic errors are of particular importance because they could drive false associations. A full description of each plot and its interpretation can be found in the supplementary materials.

Fig. 3

Example issue detected via QoRTs. A subset of the output plots from a dataset in which a rare hardware-level fault produced an actionable QC issue that can be easily identified via QoRTs. In (a) and (b), the replicates are colored by biological sample; in (c) and (d), replicates are colored by sequencer lane. See the QoRTs vignette for more information (Additional file 1).
