Bayesian hierarchical model for estimating gene expression intensity using multiple scanned microarrays

Rashi Gupta et al. EURASIP J Bioinform Syst Biol. 2008.

Abstract

We propose a method for improving the quality of signal from DNA microarrays by using several scans at varying scanner sensitivities. A Bayesian latent intensity model is introduced for the analysis of such data. The method improves the accuracy at which expressions can be measured in all ranges and extends the dynamic range of measured gene expression at the high end. Our method is generic and can be applied to data from any organism, for imaging with any scanner that allows varying the laser power, and for extraction with any image analysis software. Results from a self-self hybridization data set illustrate an improved precision in the estimation of the expression of genes compared to what can be achieved by applying standard methods and using only a single scan.
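The idea of combining several scans, some of which saturate, can be illustrated with a small censored-likelihood calculation. The sketch below is not the hierarchical model of the paper. It assumes a single latent log intensity per spot, a known additive offset per scan on the log scale, Gaussian noise with a known standard deviation, and a flat prior over a grid. The function names, the offsets, the noise level, and the example readings are all hypothetical and chosen only for illustration. A scan at or above the saturation threshold contributes only the probability that the true signal exceeds that threshold.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_sf(x, mean, sd):
    """Survival function P(X > x) for X ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def grid_posterior(obs, offsets, sd, censor_at, grid):
    """Posterior over the latent log intensity on a grid (flat prior).

    obs       : log-scale readings, one per scan
    offsets   : assumed known log-scale shift of each scan (hypothetical)
    censor_at : log-scale saturation threshold
    """
    weights = []
    for mu in grid:
        like = 1.0
        for y, b in zip(obs, offsets):
            if y >= censor_at:
                # Saturated reading: we only know the signal exceeded the threshold.
                like *= normal_sf(censor_at, mu + b, sd)
            else:
                like *= normal_pdf(y, mu + b, sd)
        weights.append(like)
    total = sum(weights)
    return [w / total for w in weights]

def posterior_median(grid, probs):
    """Smallest grid value whose cumulative probability reaches 0.5."""
    cum = 0.0
    for mu, p in zip(grid, probs):
        cum += p
        if cum >= 0.5:
            return mu
    return grid[-1]

# Hypothetical settings: three scans, each half a log unit less sensitive.
grid = [8.0 + 0.01 * i for i in range(601)]   # 8.00 to 14.00
offsets = [0.0, -0.5, -1.0]
sd = 0.1
censor_at = math.log(65535)                    # 16-bit scanner ceiling, log scale

# A spot below saturation on every scan.
unsat = grid_posterior([10.0, 9.5, 9.0], offsets, sd, censor_at, grid)
# A spot whose most sensitive scan is saturated.
sat = grid_posterior([censor_at, 11.0, 10.5], offsets, sd, censor_at, grid)

med_unsat = posterior_median(grid, unsat)
med_sat = posterior_median(grid, sat)
print(round(med_unsat, 2), round(med_sat, 2))
```

In the second case the posterior median lies above the saturation threshold of the most sensitive scan, which is the sense in which less sensitive scans extend the dynamic range at the high end.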

Figures

Figure 1

Plot of multiple scans of array formula image for data from the Cy5 channel. The mean spot intensities from scan-3, scan-2, and scan-1 are plotted against scan-3. Saturation at the upper end of the intensities can be seen clearly. Very similar behavior was seen for the data from the Cy3 channel.

Figure 2

Posterior distribution of the latent variable formula image for two genes obtained when considering censoring at formula image (i.e., at 10.71 and shown in grey) and when considering censoring at formula image (i.e., at 11 and shown in black bars). The measurements from "scan-1, scan-2, scan-3" are (a) (left-above) 9.15, 8.99, 8.61 and (b) (right-above) 10.84, 10.66, 10.23. (Different representations have been used to enhance visibility.)

Figure 3

Posterior distribution of the latent variable formula image for two genes obtained using all three scans (shown in grey) and using only two scans (scan-1 and scan-2, shown in black bars). The observations from "scan-1, scan-2, scan-3" for the genes are (a) (left-above) 10.51, 9.88, 9.08 and (b) (right-above) 11.01, 10.88, 10.50. (Different representations have been used to enhance visibility.)

Figure 4

Posterior distribution of true latent intensity for replicated spots on (a) (left-above) the same array A (b) (right-above) different arrays (array formula image and array formula image). The replicated spots formula image and formula image had "scan-1, scan-2, scan-3" measurements as 8.35, 8.26, 8.02 and 8.32, 7.94, 7.76, respectively, on array formula image, and spot formula image had measurements 9.31, 9.23, 8.97 on array formula image and measurements 9.27, 9.17, 8.87 on array formula image. (Different representations have been used to enhance visibility.)

Figure 5

This plot demonstrates the relationship between the estimates of the latent intensities (on natural scale, for data from the Cy3 channel) and the measurements from scan-1 over the range [200, 65 535] for 530 spots. The intensities are sorted in ascending order according to the scan-1 reading.

Figure 6

This plot illustrates the dependence of the estimates of the latent intensities on the scan-2 and scan-3 readings in a situation in which the scan-1 readings were saturated. 120 randomly selected genes with scan-1 measurements close to 65 535 are shown by a (nearly) horizontal line. Corresponding measurements from scan-2 and scan-3 are also plotted. The estimates of the latent intensities (posterior medians) corresponding to these 120 spots are shown as dots connected by a dotted line. All measurements are on natural scale.

Figure 7

Estimated residuals (= measured values - corresponding posterior median) from the empirical data plotted against the rank of the estimated gene expression for (a) (left-above) scan-1, (b) (middle-above) scan-2, and (c) (right-above) scan-3.

Figure 8

These plots illustrate the sample variability of the posterior distributions, considering (latent) gene expression intensities and in each case 17 simulated samples. The true expression value (on natural scale) is shown with a vertical bar.

Figure 9

Estimated percentage of bias plotted against the spot numbers, based on a simulation experiment.

Figure 10

Comparison of the histograms of the log fold change corresponding to the scan-1 data (shown in grey) and the estimated posterior median (Cy3, Cy5) (shown in black).
