Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms

Comparative Study

Peter A C 't Hoen et al. Nucleic Acids Res. 2008 Dec.

Abstract

The hippocampal expression profiles of wild-type mice and mice transgenic for deltaC-doublecortin-like kinase were compared with Solexa/Illumina deep sequencing technology and five different microarray platforms. With Illumina's digital gene expression assay, we obtained approximately 2.4 million sequence tags per sample, their abundance spanning four orders of magnitude. Results were highly reproducible, even across laboratories. With a dedicated Bayesian model, we found differential expression of 3179 transcripts with an estimated false-discovery rate of 8.5%. This is a much higher figure than found for microarrays. The overlap in differentially expressed transcripts found with deep sequencing and microarrays was most significant for Affymetrix. The changes in expression observed by deep sequencing were larger than observed by microarrays or quantitative PCR. Relevant processes such as calmodulin-dependent protein kinase activity and vesicle transport along microtubules were found affected by deep sequencing but not by microarrays. While undetectable by microarrays, antisense transcription was found for 51% of all genes and alternative polyadenylation for 47%. We conclude that deep sequencing provides a major advance in robustness, comparability and richness of expression profiling data and is expected to boost collaborative, comparative and integrative genomics studies.

Figures

Figure 1.

Categorization and abundance of tags. Distribution (in percentage of total) of unique tags (black bars) and individual reads (counts; open bars) over different categories (average from eight samples): high-confidence transcripts (canonical), low-confidence transcripts (noncanonical), mitochondrial RNA (mito), ribosomal RNA (ribo), genomic region with no evidence for transcription (just genome), repetitive genomic region (repeats) and tags with no hits in the genome.

Figure 2.

Volcano plot of canonical tags. For every tag, the ratio in expression levels of transgenic over wild-type mice (log2 scale, _x_-axis) is plotted against the Bayesian error rate (log10 scale, _y_-axis). The horizontal line indicates the significance threshold applied; the 3179 differentially expressed tags lie above that line. The plot shows that the tags with the highest average differences between transgenic and wild-type mice (far left and right parts of the plot) are not all significant (due to large intragroup variation). The most significant tags (top of the plot) generally display small differences in expression between transgenic and wild-type mice but, owing to their relatively high expression levels, are very accurately measured and therefore display low intragroup variation.
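As a rough illustration of how the coordinates of such a volcano plot could be computed, the sketch below maps one tag to an (x, y) pair. It is not the authors' code: the function names, the pseudocount, and the choice of plotting the negative log10 of the error rate (so that the most significant tags end up at the top) are all assumptions made for this example, and the threshold is passed in by the caller because the caption does not state its value.

```python
import math


def volcano_point(tg_mean, wt_mean, error_rate, pseudocount=0.5):
    """Return (x, y) for one tag.

    x: log2 ratio of transgenic over wild-type expression.
    y: -log10 of the error rate (assumed orientation: significant tags on top).
    The pseudocount is a hypothetical guard against division by zero.
    """
    x = math.log2((tg_mean + pseudocount) / (wt_mean + pseudocount))
    y = -math.log10(error_rate)
    return x, y


def is_significant(error_rate, threshold):
    """A tag lies above the horizontal line when its error rate is below the threshold."""
    return error_rate < threshold


# Hypothetical tags: a large but noisy change, and a small but precise change.
noisy_tag = volcano_point(tg_mean=40.0, wt_mean=5.0, error_rate=0.3)
precise_tag = volcano_point(tg_mean=1200.0, wt_mean=1000.0, error_rate=1e-6)
```

With these made-up numbers, the noisy tag sits far to the right but low on the plot, while the precise tag sits near the centre but high up, which is the pattern the caption describes.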

Figure 3.

Correlation between absolute expression level (DGE) and microarray signal intensity. Correlation of the tag abundance (square root transformed; _x_-axis) and intensities [normalized as described in (9)] on the five microarray platforms (_y_-axis) for matching ENSEMBL transcripts, for wild-type sample 1. Pearson correlations are indicated in the graphs. ABI: Applied Biosystems; AFF: Affymetrix; ILL: Illumina; AGL: Agilent; LGTC: home-spotted long oligonucleotide arrays.
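The comparison described in this caption can be sketched as a Pearson correlation between square-root-transformed tag counts and microarray intensities for matched transcripts. The code below is a minimal sketch under stated assumptions: the function names are hypothetical, the intensities are taken as already normalized, and the matching of tags to transcripts is assumed to have been done beforehand.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def sqrt_tag_correlation(tag_counts, intensities):
    """Correlate square-root-transformed tag counts with array intensities.

    Both lists must be ordered so that position i refers to the same transcript.
    """
    transformed = [math.sqrt(count) for count in tag_counts]
    return pearson(transformed, intensities)
```

The square-root transform compresses the wide dynamic range of tag counts, so a handful of very abundant transcripts does not dominate the correlation.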

Figure 4.

Assessment of precision and accuracy of DGE. (A) Samples from the wild-type and transgenic pools were sequenced in three different lanes. We calculated the three possible independent log ratios between transgenic and wild-type samples (technical replicates). As a measure of precision, we determined the pair-wise differences between these technical replicates. The distribution of these differences is plotted as a density function (black line). This is also done for three technical replicates of wild-type over transgenic ratios determined on Agilent (red) and home-spotted (blue) microarrays. We balanced the number of observations per platform through random selection of 21 886 features. (B) As a measure of accuracy, we correlated logged ratios of the expression in transgenic versus wild-type mice as obtained by DGE (_x_-axis) against those obtained by qPCR (_y_-axis). All data and primer sequences can be found in Supplementary Table 3.
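The precision measure in panel (A) can be illustrated for a single feature as follows. This is only a sketch: pairing lane i of the transgenic pool with lane i of the wild-type pool, the use of base-2 logarithms, and the pseudocount are assumptions made for the example, and the function names are hypothetical.

```python
import math
from itertools import combinations


def replicate_log_ratios(tg_counts, wt_counts, pseudocount=0.5):
    """One log2(transgenic / wild-type) ratio per lane pair (assumed pairing)."""
    return [
        math.log2((tg + pseudocount) / (wt + pseudocount))
        for tg, wt in zip(tg_counts, wt_counts)
    ]


def pairwise_differences(log_ratios):
    """Differences between every pair of technical-replicate log ratios."""
    return [a - b for a, b in combinations(log_ratios, 2)]


# Hypothetical counts for one tag across three lanes per pool.
ratios = replicate_log_ratios([120, 131, 115], [60, 58, 63])
diffs = pairwise_differences(ratios)
```

Three replicate ratios give three pairwise differences per feature; pooling these over all features gives the distribution that the caption plots as a density function, with a narrower distribution indicating higher precision.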

References

    1. Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods. 2005;2:345–350.
    2. Harbers M, Carninci P. Tag-based approaches for transcriptome research and genome annotation. Nat. Methods. 2005;2:495–502.
    3. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 2000;18:630–634.
    4. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380.
    5. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM. Accurate multiplex polony sequencing of an evolved bacterial genome. Science. 2005;309:1728–1732.
