Evaluation of microarray preprocessing algorithms based on concordance with RT-PCR in clinical samples

Balazs Gyorffy et al. PLoS One. 2009.

Abstract

Background: Several preprocessing algorithms for Affymetrix gene expression microarrays have been developed, and their performance on spike-in data sets has been evaluated previously. However, a comprehensive comparison of preprocessing algorithms on samples taken under research conditions has not been performed.

Methodology/principal findings: We used TaqMan RT-PCR arrays as a reference to evaluate the accuracy of expression values from Affymetrix microarrays in two experimental data sets: one comprising 84 genes in 36 colon biopsies, and the other comprising 75 genes in 29 cancer cell lines. We evaluated consistency using the Pearson correlation between measurements obtained on the two platforms. We also introduce the log-ratio discrepancy as a more relevant measure of discordance between gene expression platforms. Of the nine preprocessing algorithms tested, PLIER+16 produced the expression values most consistent with RT-PCR measurements, although the differences in performance between most of the algorithms were not statistically significant.
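The per-gene Pearson correlation between platforms can be illustrated with a minimal stdlib-only Python sketch. This is not the authors' code. It assumes microarray values are already log2-transformed and that RT-PCR values are cycle thresholds (Ct), which are negated so that higher values mean higher expression on both platforms. The helper names `pearson` and `per_gene_correlations`, the gene names, and the toy numbers are all hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def per_gene_correlations(
    array_log2: Dict[str, List[float]],
    pcr_ct: Dict[str, List[float]],
) -> Dict[str, float]:
    """Correlate the two platforms gene by gene across the same ordered samples.

    Assumption: RT-PCR values are Ct values, so they are negated to put
    both platforms on a "higher = more expressed" scale.
    """
    result = {}
    for gene, arr in array_log2.items():
        pcr = [-ct for ct in pcr_ct[gene]]
        result[gene] = pearson(arr, pcr)
    return result


# Hypothetical toy data: 2 genes measured in the same 4 samples.
array_log2 = {"GENE_A": [8.0, 9.0, 10.0, 11.0], "GENE_B": [7.0, 7.5, 7.2, 7.1]}
pcr_ct = {"GENE_A": [30.0, 29.0, 28.0, 27.0], "GENE_B": [25.0, 26.0, 24.0, 25.5]}
print(per_gene_correlations(array_log2, pcr_ct))
```

Repeating this for every preprocessing algorithm yields one correlation per gene per algorithm, which is the kind of distribution summarized by the box plots in Figure 2.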

Conclusions/significance: Our results support the choice of PLIER+16 for the preprocessing of clinical Affymetrix microarray data. However, other algorithms performed similarly and are probably also good choices.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The colon and cell line data sets are representative of clinical microarray data.

For several Affymetrix data sets, box-and-whisker plots show the distribution of three bias metrics: a) RNA degradation slope, b) median perfect-match probe intensity, and c) fraction of probe sets called present. A narrower distribution indicates greater consistency in technical conditions. LatinSquare133 and LatinSquare95 are spike-in data sets produced by the microarray manufacturer; Gyorffy_cells and Gyorffy_colon are the data sets analyzed in this paper; the other five are publicly available clinical data sets.
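The three per-array bias metrics can be approximated as in the stdlib-only Python sketch below. It is an illustration under stated assumptions, not the method used for Figure 1. The data layout (raw PM intensities grouped by probe set and ordered 5' to 3'), the function names, and the toy values are hypothetical. The degradation slope here is a plain least-squares slope of mean log2 intensity against probe position, a simplification of what the Bioconductor affy package reports via AffyRNAdeg, and detection calls are assumed to be precomputed (for example, MAS5 "P"/"M"/"A" calls).

```python
import math
import statistics
from typing import List, Sequence


def degradation_slope(probe_sets: Sequence[Sequence[float]]) -> float:
    """Least-squares slope of mean log2 PM intensity vs. probe position.

    Each inner sequence holds one probe set's raw PM intensities, ordered
    5' -> 3' and assumed to have equal length. A larger positive slope
    indicates stronger 3' bias, commonly read as more RNA degradation.
    """
    n_pos = len(probe_sets[0])
    # Average log2 intensity across probe sets at each probe position.
    means = [
        statistics.fmean(math.log2(ps[i]) for ps in probe_sets)
        for i in range(n_pos)
    ]
    xs = range(n_pos)
    mx = statistics.fmean(xs)
    my = statistics.fmean(means)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, means))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def median_pm_intensity(probe_sets: Sequence[Sequence[float]]) -> float:
    """Median of all raw PM intensities on the array."""
    return statistics.median(v for ps in probe_sets for v in ps)


def fraction_present(calls: List[str]) -> float:
    """Share of probe sets whose (precomputed) detection call is 'P'."""
    return sum(c == "P" for c in calls) / len(calls)


# Hypothetical array: 4 probe sets x 4 probes, plus one detection call per probe set.
array_pm = [
    [64.0, 128.0, 256.0, 512.0],
    [32.0, 64.0, 128.0, 256.0],
    [16.0, 32.0, 64.0, 128.0],
    [8.0, 16.0, 32.0, 64.0],
]
calls = ["P", "P", "A", "M"]
print(degradation_slope(array_pm), median_pm_intensity(array_pm), fraction_present(calls))
# prints 1.0 64.0 0.5
```

Computing these three numbers for every array in a data set gives the per-data-set distributions compared in the box plots.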

Figure 2. Pearson correlation coefficients between microarray and RT-PCR.

The distribution of Pearson correlation coefficients for each microarray preprocessing algorithm is indicated by a box plot, for a) the colon cancer data set (84 genes, 36 samples), and b) the cell line data set (75 genes, 29 samples). The box indicates the 25th to 75th percentile, and the heavier line indicates the median. Algorithms are displayed in decreasing order of the median, such that the more accurate algorithms are at the top. The colorgrams on the right-hand side indicate P values (Wilcoxon test) comparing each pair of algorithms.
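The caption reports pairwise Wilcoxon-test P values but does not say which variant was used. Because every gene yields one correlation per algorithm, the stdlib-only Python sketch below assumes a paired (signed-rank) test on per-gene correlations, using the normal approximation without tie or continuity correction. The function names, algorithm labels, and toy values are hypothetical; for real analyses, `scipy.stats.wilcoxon` implements the signed-rank test with exact and approximate P values.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank_p(a: List[float], b: List[float]) -> float:
    """Two-sided paired Wilcoxon signed-rank P value (normal approximation).

    a and b hold one value per gene (e.g. per-gene correlations) for two
    algorithms. Zero differences are dropped.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


def pairwise_p_values(per_gene: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """P value for every unordered pair of algorithms (one colorgram cell each)."""
    return {
        (p, q): wilcoxon_signed_rank_p(per_gene[p], per_gene[q])
        for p, q in combinations(sorted(per_gene), 2)
    }


# Hypothetical per-gene Pearson correlations for three algorithms, 10 genes.
alg1 = [0.90, 0.80, 0.85, 0.70, 0.95, 0.60, 0.75, 0.88, 0.92, 0.81]
per_gene = {
    "ALG_1": alg1,
    "ALG_2": [r - 0.01 * k for k, r in enumerate(alg1, start=1)],
    "ALG_3": [r + (0.02 if k % 2 else -0.02) for k, r in enumerate(alg1)],
}
for pair, p in pairwise_p_values(per_gene).items():
    print(pair, round(p, 4))
```

With only a handful of genes the normal approximation is rough; with 75 to 84 genes per data set, as here, it is usually adequate.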

Figure 3. Log-ratio discrepancy between microarray and RT-PCR.

The distribution of the log-ratio discrepancy for each microarray preprocessing algorithm is indicated by a box plot, for a) the colon cancer data set, and b) the cell line data set. Algorithms are displayed in order of the median, such that the more accurate algorithms are at the top. The colorgrams on the right-hand side indicate P values (Wilcoxon test) comparing each pair of algorithms.
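The abstract does not give the formula for the log-ratio discrepancy, so the stdlib-only Python sketch below uses an assumed definition for illustration only: for one gene, compare the microarray log2 ratio a_i - a_j with the RT-PCR log2 ratio r_i - r_j for every pair of samples (i, j), and average the absolute differences. RT-PCR values are assumed to be on a log2 scale already (for example, negated Ct values). The function names, algorithm labels, and toy values are hypothetical.

```python
import itertools
import statistics
from typing import Dict, List


def log_ratio_discrepancy(array_log2: List[float], pcr_log2: List[float]) -> float:
    """Mean absolute disagreement of pairwise log2 ratios for one gene.

    ASSUMED definition, not taken from the paper. Needs at least two samples.
    """
    diffs = [
        abs((array_log2[i] - array_log2[j]) - (pcr_log2[i] - pcr_log2[j]))
        for i, j in itertools.combinations(range(len(array_log2)), 2)
    ]
    return statistics.fmean(diffs)


def rank_algorithms(discrepancies: Dict[str, List[float]]) -> List[str]:
    """Order algorithms by median per-gene discrepancy, lowest (best) first."""
    return sorted(discrepancies, key=lambda name: statistics.median(discrepancies[name]))


# Hypothetical gene measured in 3 samples; RT-PCR already on a log2 scale.
pcr = [1.0, 2.0, 4.0]
print(log_ratio_discrepancy([6.0, 7.0, 9.0], pcr))  # 0.0: constant offset only
print(log_ratio_discrepancy([6.0, 8.0, 9.0], pcr))
print(rank_algorithms({"ALG_X": [0.5, 0.4, 0.6], "ALG_Y": [0.2, 0.3, 0.25]}))
# ['ALG_Y', 'ALG_X']
```

Under this assumed definition, a constant offset between platforms costs nothing, whereas a compressed fold change on the microarray is penalized; Pearson correlation, by contrast, is unaffected by both offsets and positive rescaling.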

References

    1. Affymetrix. Latin Square Data. 2008. http://www.affymetrix.com/support/technical/sample_data/datasets.affx
    2. Choe SE, Boutros M, Michelson AM, Church GM, Halfon MS. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. 2005;6:R16.
    3. Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006;24:1151–1161.
    4. Cope LM, Irizarry RA, Jaffee HA, Wu ZJ, Speed TP. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics. 2004;20:323–331.
    5. Tong W, Lucas AB, Shippy R, Fan X, Fang H, et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat Biotechnol. 2006;24:1132–1139.
