Within the fold: assessing differential expression measures and reproducibility in microarray assays

Comparative Study

Genome Biol. 2002 Oct 24;3(11):research0062.

doi: 10.1186/gb-2002-3-11-research0062. Epub 2002 Oct 24.

Ivana V Yang, Emily Chen, Jeremy P Hasseman, Wei Liang, Bryan C Frank, Shuibang Wang, Vasily Sharov, Alexander I Saeed, Joseph White, Jerry Li, Norman H Lee, Timothy J Yeatman, John Quackenbush

Ivana V Yang et al. Genome Biol. 2002.

Abstract

Background: 'Fold-change' cutoffs have been widely used in microarray assays to identify genes that are differentially expressed between query and reference samples. More accurate measures of differential expression and effective data-normalization strategies are required to identify high-confidence sets of genes with biologically meaningful changes in transcription. Further, the analysis of a large number of expression profiles is facilitated by a common reference sample, the construction of which must be carefully addressed.

Results: We carried out a series of 'self-self' hybridizations in which aliquots of the same RNA sample were labeled separately with Cy3 and Cy5 fluorescent dyes and co-hybridized to the same microarray. From this, we can analyze the intensity-dependent behavior of microarray data, define a statistically significant measure of differential expression that exploits the structure of the fluorescent signals, and measure the inherent reproducibility of the technique. We also devised a simple procedure for identifying and eliminating low-quality data for replicates within and between slides. We examine the properties required of a universal reference RNA sample and show how pooling a small number of samples with a diverse representation of expressed genes can outperform more complex mixtures as a reference sample.

Conclusion: Analysis of cell-line samples can identify systematic structure in measured gene-expression levels. A general procedure for analyzing cDNA microarray data is proposed and validated. We show that pooled reference samples should be based not only on the expression of individual genes in each cell line but also on the expression levels of genes within cell lines.

Figures

Figure 1

Self-self hybridization of the KM12L4A cell line. (a) R-I (ratio-intensity) plot for a self-self hybridization of the KM12L4A cell line before lowess correction. (b) The same dataset, showing the effect of lowess correction (red) relative to the uncorrected data (blue). Lowess removes the intensity-dependent curvature that is evident in the uncorrected data and, in the process, reduces the SD of the dataset. (c) Similar plots for all 30 self-self hybridizations performed in this study, including the SD for each dataset before (blue) and after (red) the application of the lowess correction.
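
The correction described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the R-I plot uses log10(R*G) on the intensity axis and log2(R/G) on the ratio axis, it uses a single-pass local linear fit with tricube weights (no robustness iterations), and the function names and the `frac` smoothing parameter are hypothetical.

```python
import math

def ri_coordinates(cy5, cy3):
    """Return (intensity, log_ratio) pairs, assumed here to be
    log10(R*G) and log2(R/G) for each spot."""
    return [(math.log10(r * g), math.log2(r / g)) for r, g in zip(cy5, cy3)]

def local_linear_fit(x0, xs, ys, frac=0.3):
    """Tricube-weighted linear fit evaluated at x0, using the nearest
    `frac` of the points (hypothetical smoothing parameter)."""
    k = max(3, math.ceil(frac * len(xs)))
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    # Widen the bandwidth slightly so the farthest neighbor keeps a positive weight.
    h = max(abs(xs[i] - x0) for i in nearest) * 1.001 or 1.0
    sw = swx = swy = swxx = swxy = 0.0
    for i in nearest:
        dx = xs[i] - x0
        w = (1.0 - (abs(dx) / h) ** 3) ** 3
        sw += w
        swx += w * dx
        swy += w * ys[i]
        swxx += w * dx * dx
        swxy += w * dx * ys[i]
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate neighborhood: fall back to weighted mean
    slope = (sw * swxy - swx * swy) / denom
    # x is centered on x0, so the intercept is the fitted value at x0.
    return (swy - slope * swx) / sw

def lowess_correct(cy5, cy3, frac=0.3):
    """Subtract the locally fitted trend from each log2 ratio."""
    points = ri_coordinates(cy5, cy3)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [y - local_linear_fit(x, xs, ys, frac) for x, y in zip(xs, ys)]
```

For a self-self hybridization the corrected log ratios should scatter around zero, and their SD can be compared with that of the uncorrected values, as in panels (b) and (c).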

Figure 2

A combined histogram of the log2(expression ratio) measured for all array elements across all 30 hybridizations used in this study both before (blue) and after (red) application of lowess correction.

Figure 3

Intensity-dependent calculations of SDs described in the text show distinct patterns that depend on how closely related the compared samples are. (a) The 'tadpole' pattern seen in the self-self hybridization of RNA samples from the KM12L4A cell line is characteristic of comparisons between RNA samples derived from similar sources. (b) RNA samples from very different sources show a characteristic 'eye' pattern, with greater diversity of expression for genes expressed at intermediate levels, as seen in this co-hybridization of a Cy5-labeled PA-1 (ovary) RNA sample with a Cy3-labeled CaCO2 (colon) RNA sample.
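
One way to picture an intensity-dependent SD is a sliding window over spots sorted by intensity. The sketch below is an illustration under assumptions, not the procedure from the paper: the window size, the local mean-centering, the 1.96 cutoff, and both function names are hypothetical choices.

```python
import statistics

def local_z_scores(intensity, log_ratio, window=50):
    """Z-score each log ratio against the mean and SD of the spots
    closest to it in intensity (window size is an assumed parameter)."""
    order = sorted(range(len(intensity)), key=lambda i: intensity[i])
    half = window // 2
    z = [0.0] * len(order)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        neighbors = [log_ratio[order[j]] for j in range(lo, hi)]
        mean = statistics.mean(neighbors)
        sd = statistics.pstdev(neighbors)
        z[idx] = (log_ratio[idx] - mean) / sd if sd > 0 else 0.0
    return z

def differentially_expressed(z, cutoff=1.96):
    """Indices of spots whose local Z-score exceeds the assumed cutoff."""
    return [i for i, value in enumerate(z) if abs(value) > cutoff]
```

Because the SD is computed locally, a given fold change can be significant at high intensity, where the scatter is small, while remaining within the noise at low intensity.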

Figure 4

Replicate filtering within an array can reduce variability in the data. Scatterplots showing correlation coefficients (r) for the logarithms of the Cy5/Cy3 ratios for duplicate spots within arrays for (a) a self-self hybridization of RNA samples from the CaOV3 cell line and (b) a co-hybridization of a Cy5-labeled PA-1 RNA sample with a Cy3-labeled CaCO2 RNA sample. In both cases, data before replicate filtering (blue) include a number of outliers that are eliminated from the filtered data (red), resulting in a much better correlation between duplicate measurements.
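
The effect of replicate filtering on the correlation coefficient can be sketched as follows. The filtering rule shown here (discard duplicate pairs whose difference in log ratio lies more than `n_sd` standard deviations from the mean difference) is an assumption made for illustration; the caption does not state the exact criterion, and the function names are hypothetical.

```python
import math
import statistics

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def filter_replicates(rep1, rep2, n_sd=2.0):
    """Return indices of duplicate pairs whose log-ratio difference is
    within n_sd SDs of the mean difference (assumed rule)."""
    diffs = [x - y for x, y in zip(rep1, rep2)]
    mean = statistics.mean(diffs)
    sd = statistics.pstdev(diffs)
    return [i for i, d in enumerate(diffs) if abs(d - mean) <= n_sd * sd]
```

Computing `pearson_r` on all pairs and again on the pairs returned by `filter_replicates` gives the before and after values that the scatterplots compare.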

Figure 5

Replicate filtering between slides can also significantly improve data quality. Scatterplots showing correlation coefficients (r) for the logarithms of the Cy5/Cy3 ratios for duplicate spots within arrays for three arrays used to analyze independently labeled sets of (a) CaOV3 RNA samples (self-self hybridizations) and (b) PA-1 (Cy5) and CaCO2 (Cy3) RNA samples.

Figure 6

An ideal reference RNA sample will provide detectable hybridization above background for as broad a representation of the arrayed genes as possible. The histogram shows the percentage of array elements with detectable signals in both the Cy3 and Cy5 channels for a series of self-self hybridizations representing all of the primary cell lines used in this study, the Stratagene universal reference RNA, and RNA pools created on the basis of our analysis. TP-1 consists of equal amounts of RNA from the CaCO2 (colon), KM12L4A (colon), and OVCAR3 (ovary) cell lines. TP-2 consists of equal amounts of RNA from the CaCO2 (colon), KM12L4A (colon), and U118 MG (brain) cell lines. Mean values with 1 SD as the error bars are plotted for the samples that were assayed more than once. CaOV3, HCT-116, KM12L4A, NT2/D1, and SW480 cell lines were assayed in triplicate, and Stratagene and TP-1 pools were hybridized in duplicate.
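
The quantity plotted in this histogram can be computed along the following lines. This is a minimal sketch: the single detection `threshold` applied to both channels is a hypothetical stand-in for whatever background criterion was actually used, and the function names are illustrative.

```python
import statistics

def percent_detected(cy5, cy3, threshold):
    """Percentage of array elements whose signal exceeds the assumed
    detection threshold in both the Cy5 and Cy3 channels."""
    both = sum(1 for r, g in zip(cy5, cy3) if r > threshold and g > threshold)
    return 100.0 * both / len(cy5)

def summarize(replicate_percents):
    """Mean and sample SD across replicate hybridizations; the SD is
    reported as 0.0 when a sample was assayed only once."""
    mean = statistics.mean(replicate_percents)
    sd = statistics.stdev(replicate_percents) if len(replicate_percents) > 1 else 0.0
    return mean, sd
```

Applying `summarize` to the per-hybridization percentages of a cell line or pool yields the bar height and the 1 SD error bar for that sample.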
