A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium - PubMed (original) (raw)

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

SEQC/MAQC-III Consortium. Nat Biotechnol. 2014 Sep.

Abstract

We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.

PubMed Disclaimer

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

Some of the SEQC (MAQC-III) consortium members are employed by companies that provide services or manufacture products or equipment related to gene expression profiling, as can be seen from the provided affiliations of the manuscript authors.

Figures

Figure 1

Figure 1

The SEQC (MAQC-III) project and experimental design. (a) Overview of projects. We report on a group of studies assessing different sequencing platforms in real-world use cases, including transcriptome annotation and other research applications, as well as clinical settings. This paper focuses on the results of a multi-center experiment with built-in ground truths. (b) Main study design. Similar to the MAQC-I benchmarks, we analysed RNA samples A to D: Samples C and D were created by mixing the well-characterized samples A and B in 3:1 and 1:3 ratios, respectively. This allows tests for titration consistency (c) and the correct recovery of the known mixing ratios (d). Synthetic RNAs from the External RNA Control Consortium (ERCC) were both pre-added to samples A and B before mixing and also sequenced separately to assess dynamic range (samples E and F). Samples were distributed to independent sites for RNA-seq library construction and profiling by Illumina’s HiSeq 2000 (3 official + 3 inofficial sites) and Life Technologies’ SOLiD 5500 (3 official sites + 1 inofficial site). Unless mentioned otherwise, data presented shows results from the three official sites (italics). In addition to the four replicate libraries each for samples A to D per site, for each platform, one vendor-prepared library A5…D5 was being sequenced at the official sites, giving a total of 120 libraries. At each site, every library has a unique bar-code sequence and all libraries were pooled before sequencing, so each lane was sequencing the same material, allowing a study of lane specific effects. To support a later assessment of gene models, we sequenced samples A and B by Roche 454 (3x, no replicates, see Supplementary Notes Section 2.5). (c) Schema illustrating tests for titration order consistency. Four examples are shown. The dashed lines represent the ideal mixture of samples A and B (blue and red) expected for samples D and C (magenta and dark purple). (d) Schema illustrating a consistency test for recovering the expected sample mixing ratio. The yellow lines mark a 10% deviation from the expected response (black) for a perfect mixing ratio. Both tests (c) and (d) will reflect both systemic distortions (bias) and random variation (noise).

Figure 2

Figure 2

Gene detection and junction discovery. (a) The fraction of all reads aligned to gene models from different annotations, RefSeq, Encode and NCBI AceView (green). Reads aligning only to specific annotations are shown in dark green. (b) Known genes (left) and exon junctions (right) supported by at least 16 HiSeq 2000 or SOLiD reads are in green; genes or junctions annotated but not observed at this threshold are shown in grey. (ce) show sensitivity as a function of read depth. (c) Known genes detected. We show the number and percentage of all AceView annotated genes detected for three RNA-seq analysis pipelines, Subread (yellow), r-make (cyan) and Magic (magenta). The _x_-axis marks cumulative aligned fragments from all replicates and sites. Vertical lines indicate boundaries between samples A through D. (d) Known junctions detected. The numbers and percentages of all exon-exon junctions (supported by 8 or more reads) are shown for different gene model databases (line style). Horizontal lines show the respective total numbers of annotated junctions. (e) Unannotated junctions supported by multiple platforms and pipelines. Subsets of unannotated junctions have expression levels with correct titration orders and mixing ratios (_cf._Figs 1b–d and 4a,b). (f) Distribution of junction expression levels. Unannotated junctions, then unannotated junctions supported by multiple platforms and pipelines, and known junctions show increasing expression levels (colors). Subsets expressed with mutual information about the samples and correct titration order and mixing ratio display a further shift towards higher expression levels (dashed lines). (g, h) Intra- (blue) and inter-site reproducibility (orange) of detected known genes (g) and junctions (h). Pairwise agreement is shown by boxplots, where the second set of boxplots (upper group) indicates percentages.

Figure 3

Figure 3

Sensitivity, specificity and reproducibility of differential expression calls. Robust cross-site analyses depend on pipeline choice and appropriate filter rules. Results are shown for five MAQC-III RNA-seq pipelines and the MAQC-I Affymetrix microarray platform (color). Panels (a and b) show results without filtering, panels (c to e) show results using appropriate filters. (a and c) Number of standardized differential expression calls. All possible pairwise inter-site A versus A comparisons (●) are shown next to all intra-site A versus B comparisons (x) as indicators for specificity and sensitivity. (b and d) Ratios of A versus A calls and A versus B calls give an estimate of the false discovery rate (eFDR). For all platforms and pipelines, differential expression calls identify thousands of differences in inter-site comparisons of identical samples. These can be controlled for microarray by additional filters for effect size. In addition, RNA-seq also requires filters for expression strength due to the high sampling fluctuations at lower read counts. These were set to give similar numbers of A versus B expression calls (b), improving the eFDR to <1.5% except for several outliers. (e) Inter-site reproducibility of differential expression calls. Comparing the identities and the directions of change for differentially expressed genes (DEGs) across sites, agreement is plotted for lists including the top-ranked genes as sorted by effect size (x-axis). The observed response curves depend on pipeline and filter choice, showing more variation for shorter lists. The performance of several RNA-seq pipelines was comparable to that of differential expression profiling using the microarray measurements from the MAQC-I study, with the microarrays showing higher inter-site reproducibility when considering only the differentially expressed genes with the strongest fold change (left side of e).

Figure 4

Figure 4

Built-in truths for assessing RNA-seq. (a) Titration order A, C, D, B. Log2 fold-change is related to cross-platform titration consistency. At sufficiently strong log2 fold-change, reliable titration is also found across platforms: The dark blue line represents the 22,074 ‘unmissable’ genes showing the correct titration order with no contradiction in at least 14 HiSeq 2000 and 6 SOLiD samples. Most genes with high differential expression are in this class. (b) Known A/B mixing ratios in samples C and D. The yellow solid line traces the expected values after mRNA/total-RNA shift correction. The 1%, 10% and 25% most highly expressed genes are shown in red, cyan and magenta, respectively. On average, the most strongly expressed genes recover the expected mixing ratio best. Genes with inconsistent titration (cf. a) are colored grey. Black and grey symbols intermixing indicates that consistent titration (black) does not guarantee reliable recovery of the mixing ratio (and vice versa). (c) ERCC spike-in ratios can be recovered increasingly well at higher expression levels. From the response curves, one can calculate signal thresholds for the detection of a change. (d) Variation of the total amounts of detected ERCC spikes. The lack of reliable titration indicates that the considerable differences between libraries of a given site and protocol are random, implying limits for absolute expression level estimates, in general, and using spike-ins for the calibration of absolute quantification, in particular. The observed variations likely arise in library construction, as the vendor-prepared libraries (colored cyan or grey) gave constant results across different sites. For (a) and (b), all 55,674 AceView genes tested.

Figure 5

Figure 5

Cross-platform agreement of expression levels. (a) Comparison of log2 fold-change estimates for 843 selected genes. Good and similar concordances were observed between relative expression measures from the MAQC-III HiSeq 2000 and SOLiD sequencing platforms, MAQC-I TaqMan and the MAQC-III Affymetrix HuGene2 arrays (Pearson and Spearman correlation coefficients are shown; _cf._Supplementary Figure 22). (b) Comparison of absolute expression levels from HiSeq 2000 and SOLiD in a rank scatter density plot. Expression level ranks for sample A are shown on the _x_-axis for HiSeq 2000, and on the _y_-axis for SOLiD. Genes are represented by dots, and areas with several genes are shown in blue, with darker blue corresponding to a higher gene density in the area. Large cross-platform deviations are seen even for highly expressed genes and these variations are systematic. The genes in the vertical ‘spur’, for instance, are not detected by SOLiD RNA-seq but show strong expression levels in HiSeq 2000 RNA-seq, with an analog comparison to 20,801 qPCR measurements giving a similar picture (Supplementary Figure 25). The ERCC spike-ins are shown as red symbols (+). ERCC spike-in signals are systematically lower in the HiSeq 2000 data, which may be explained by their shorter poly-A tails and differences in the library construction protocols. (c) The same plot as (b) but removing the 11,066 genes that can be affected by the non-stranded nature of the applied Illumina protocol. Although the actual number of genes in the vertical spur that are not detected by SOLiD but show strong expression levels in the HiSeq 2000 is now smaller, it is still substantial. (d) Comparison of TaqMan and PrimePCR for 843 selected genes. Expression estimates vary considerably for individual genes, with some genes showing high expression in one platform but are not detected at all by the other.

Figure 6

Figure 6

Multiple performance metrics for the quantification of genes and alternative transcripts. The _y_-axes show a Consistency Score. Secondary _y_-axes mark the percentage of the maximal possible score. Panels show the three official HiSeq 2000 and SOLiD sites and compare a few analysis variants: Green, TopHat2; magenta, TopHat2 guided by known gene models; cyan, Subread; yellow, BitSeq; blue, Magic. Panels a and b consider all AceView annotated genes. Panels c and d focus on a subset of expressed complex genes with multiple alternative transcripts where comparison to a high-resolution test microarray (rightmost bar) can be conducted. (e) Comparison of RNA-seq to four different microarrays and data-processing methods (red bars) by plotting the mutual information (_y_-axes) at different read depths (_x_-axes). For the microarrays, the number of probes used is shown. The numbers given for RNA-seq state the number of fragments mapped to genes as well as the [total fragments]. SOLiD and HiSeq 2000 performed similarly well for comparable effective read depths (Supplementary Figure 33a). HiSeq 2000 data is plotted here. Each bar shows the minima and maxima across the three official sites. The read depth for which average RNA-seq performance met or exceeded that of the array is marked by a cyan bar. The corresponding read depths varied widely from 5 M (HGU133plus2 with MAS5) to about 50 M fragments (PrimeView with gcRMA/affyPLM), showing the strong effect of the reference gene set implied by the probes on the respective arrays and the employed microarray data-processing methods. Results are shown for the Subread pipeline. Alternative RNA-seq data analysis pipelines, however, can require up to double the number of fragments (TopHat2+Cufflinks, Supplementary Figure 35). See Supplementary Figures 33 and 34 for comparisons of other platforms and read depths.

Similar articles

Cited by

References

    1. Wang ET, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456:470–476. - PMC - PubMed
    1. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat Methods. 2008;5:621–628. - PubMed
    1. Łabaj PP, et al. Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics. 2011;27:i383–i391. - PMC - PubMed
    1. Liu S, Lin L, Jiang P, Wang D, Xing Y. A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species. Nucleic Acids Res. 2011;39:578–588. - PMC - PubMed
    1. McIntyre LM, et al. RNA-seq: technical variability and sampling. BMC Genomics. 2011;12:293. - PMC - PubMed

Publication types

MeSH terms

Grants and funding

LinkOut - more resources