Modeling of DNA microarray data by using physical properties of hybridization

G A Held et al. Proc Natl Acad Sci U S A. 2003.

Abstract

A method of analyzing DNA microarray data based on the physical modeling of hybridization is presented. We demonstrate, in experimental data, a correlation between observed hybridization intensity and calculated free energy of hybridization. Then, combining hybridization rate equations, calculated free energies of hybridization, and microarray data for known target concentrations, we construct an algorithm to compute transcript concentration levels from microarray data. We also develop a method for eliminating outlying data points identified by our algorithm. We test the efficacy of these methods by comparing our results with an existing statistical algorithm, as well as by performing a cross-validation test on our model.

Figures

Fig. 1.

Observed hybridization intensity as a function of calculated hybridization free energy, –ΔG, for spike-in target concentrations between 0 and 1,024 pM. Each data point shows the average measured intensity for all probes with a calculated free energy that lies within a given energy bin at a particular target concentration. The centers of the 10 energy bins range between 26 and 38 kcal/mol, and their widths are all 1.33 kcal/mol. The average intensities are plotted at the values of the bin centers. An increase in observed hybridization intensity with increasing ΔG is clearly observable. The lines simply connect the data points.
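
The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the inputs are assumed to be plain lists of –ΔG values (kcal/mol) and intensities for one target concentration, and the bin width is assumed to equal the spacing between bin centers (12/9 ≈ 1.33 kcal/mol, which matches the stated width).

```python
def bin_centers(lo=26.0, hi=38.0, n_bins=10):
    """Evenly spaced bin centers from lo to hi (kcal/mol), inclusive."""
    step = (hi - lo) / (n_bins - 1)
    return [lo + i * step for i in range(n_bins)]


def mean_intensity_by_bin(neg_dg, intensity, lo=26.0, hi=38.0, n_bins=10):
    """Average intensity of all probes whose -dG falls in each energy bin.

    Assumption: bin width equals the center spacing, so adjacent bins
    touch. Probes outside the outermost bins are ignored.
    Returns (bin_center, mean_intensity) pairs for non-empty bins.
    """
    centers = bin_centers(lo, hi, n_bins)
    width = (hi - lo) / (n_bins - 1)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for energy, value in zip(neg_dg, intensity):
        idx = round((energy - lo) / width)  # index of the nearest bin center
        if 0 <= idx < n_bins and abs(energy - centers[idx]) <= width / 2:
            sums[idx] += value
            counts[idx] += 1
    return [(c, s / n) for c, s, n in zip(centers, sums, counts) if n > 0]
```

Calling `mean_intensity_by_bin` once per target concentration would give one curve per concentration, each plotted at the bin centers.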

Fig. 2.

Observed hybridization intensity as a function of spike-in target concentration for PM probe one of gene 37777_at. Replicate measurements are shown as distinct data points. The solid line is a best fit of the data to Eq. 2, with n_I, n_e, and bg_e as adjustable parameters.

Fig. 3.

Observed hybridization intensity as a function of spike-in target concentration for the energy-binned data shown in Fig. 1. Units of energy are kcal/mol. Solid lines show the best fits of all the plotted data fit simultaneously to Eq. 2 (see Data Analysis).

Fig. 4.

Best-fit values of n_e (a) and bg_e (b) obtained from the fits shown in Fig. 3 vs. calculated ΔG. Solid lines are best fits to the plotted results (see Data Analysis).

Fig. 5.

Observed hybridization intensity as a function of calculated ΔG for the probe set of gene 36311_at. The data were collected on a single GeneChip with a spike-in target concentration of 256 pM. The dotted line is a best fit of all the data to Eq. 2, with the concentration c as the only adjustable parameter; the best-fit c is 92 pM. The solid line is a best fit to only the filled data points, the hollow ones having been identified as statistical outliers (see Data Analysis); the best-fit c in this case is 151 pM, closer to the known spike-in value of 256 pM.

Fig. 6.

Median value of predicted concentrations of all transcript measurements taken at a given spike-in target concentration vs. that concentration. Results for the analysis discussed in the text and for Affymetrix (Microarray Suite 5) are plotted as black and red squares, respectively. The solid lines are best fits of the plotted results to power laws (see Analysis of the Accuracy of the Algorithm).
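
A power-law fit of median predicted concentration against spike-in concentration can be sketched as below. This is only an illustration under stated assumptions: the caption does not say how the fit was performed, so the sketch uses ordinary least squares on log-transformed values, and the function name `fit_power_law` is hypothetical. Points at zero concentration are skipped because their logarithm is undefined.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y).

    Assumption: a log-log linear regression stands in for whatever
    fitting procedure was actually used. Non-positive points are dropped.
    Returns the prefactor a and the exponent b.
    """
    pairs = [(math.log(xi), math.log(yi))
             for xi, yi in zip(x, y) if xi > 0 and yi > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive data points")
    mean_x = sum(lx for lx, _ in pairs) / n
    mean_y = sum(ly for _, ly in pairs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in pairs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in pairs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b
```

An exponent b close to 1 would indicate that predicted concentrations scale linearly with the true spike-in concentration; the medians themselves could be computed with `statistics.median` before calling the function.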

Fig. 7.

Range of predicted concentrations vs. spike-in target concentration. The error bars at each spike-in concentration show the range of predicted concentration values that must be encompassed to include half of the predicted values (see Analysis of the Accuracy of the Algorithm). The black and red error bars show ranges obtained by using the analysis described in the text and with Affymetrix (Microarray Suite 5), respectively.
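
One way to compute a range that includes half of the predicted values is sketched below. The caption does not state whether the interval is the central half of the distribution or the narrowest interval containing half the values; this sketch assumes the central half (first to third quartile), and the function name `central_half_range` is hypothetical.

```python
import statistics


def central_half_range(predicted):
    """Return (q1, q3), the interval holding the central half of the values.

    Assumption: "half of the predicted values" is read as the
    interquartile range. The "inclusive" method treats the data as the
    full population and interpolates between observed values.
    """
    q1, _median, q3 = statistics.quantiles(predicted, n=4, method="inclusive")
    return q1, q3
```

Applying this to the predictions at each spike-in concentration would give one pair of error-bar endpoints per concentration.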

Fig. 8.

Histograms of concentration values predicted by our algorithm (Upper) and with Affymetrix (Microarray Suite 5) (Lower) for all data at spike-in concentrations of 16 pM (Left) and 256 pM (Right).

Fig. 9.

Range of predicted concentrations obtained through the cross-validation procedure (see Analysis of the Accuracy of the Algorithm) vs. spike-in target concentration. The error bars are defined by the same criteria as in Fig. 7.

Fig. 10.

Observed hybridization intensity as a function of probe hybridization free energy for gene 1091_at at spike-in target concentration of 256 pM. Data points from different replicate measurements are shown in different colors.
