Spurious spatial periodicity of co-expression in microarray data due to printing design

Gábor Balázsi et al. Nucleic Acids Res. 2003.

Abstract

Global transcriptome data is increasingly combined with sophisticated mathematical analyses to extract information about the functional state of a cell. Yet the extent to which the results reflect experimental bias at the expense of true biological information remains largely unknown. Here we show that the spatial arrangement of probes on microarrays and the particulars of the printing procedure significantly affect the log-ratio data of mRNA expression levels measured during the Saccharomyces cerevisiae cell cycle. We present a numerical method that filters out these technology-derived contributions from the existing transcriptome data, leading to improved functional predictions. The example presented here underlines the need to routinely search for and compensate for inherent experimental bias when analyzing systematically collected, internally consistent biological data sets.

Figures

Figure 1

Spatial periodicity of temporal mRNA expression profiles correlates with cDNA probe locations on the microarray chip. (a) Average cross-correlation coefficient of the temporal expression profiles as a function of the inter-gene distance along the chromosomes for the combined data (CD). The average spatial cross-correlation coefficients C(D) for each of the 16 yeast chromosomes (A–P) following α-factor arrest-induced synchronization are shown. The inset displays a portion of C(D) obtained for chromosome B to demonstrate the short-period (2-gene) spatial periodicity of gene expression. (b) Average distance of the spotted cDNA probes on the microarray chip as a function of the chromosomal distance D. The inset shows this dependence in detail for the same portion of chromosome B as in (a). (c) Spatial arrangement of deposited cDNA probe spots on the microarray chip. As an example, a set of 264 consecutive genes (in chromosomal order) is considered. Spots of the same color are printed on the slide by the same print tip. The gradually darker shades indicate simultaneous printing of 24 spots from two consecutive rows on the 96-well plate. The numbers in this table correspond to both the spatial order on the chromosome and the position on the 96-well plates from left to right and from top to bottom. (d) The 2-gene and 24-gene periodicities appear as a consequence of the arrangement of cDNA probes on the microarray chip.
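The quantity C(D) in panel (a) can be illustrated with a small sketch. It assumes that the cross-correlation coefficient is the Pearson correlation between temporal profiles and that rows are sorted in chromosomal order. The function name, the array shapes and the synthetic period-2 bias are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def average_cross_correlation(profiles, max_distance):
    """Mean Pearson correlation between profiles of genes D positions apart.

    profiles: array of shape (n_genes, n_timepoints), rows in chromosomal order.
    Returns an array c where c[D - 1] averages over all pairs (i, i + D).
    """
    x = np.asarray(profiles, dtype=float)
    # Center and normalize each row so a row-wise dot product equals Pearson's r.
    x = x - x.mean(axis=1, keepdims=True)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    c = np.empty(max_distance)
    for d in range(1, max_distance + 1):
        c[d - 1] = np.mean(np.sum(x[:-d] * x[d:], axis=1))
    return c

# Synthetic data (assumption): independent noise plus one hypothetical bias
# profile that is added to even-position genes and subtracted from odd ones.
rng = np.random.default_rng(0)
n_genes, n_timepoints = 400, 18
shared_bias = rng.normal(size=n_timepoints)
sign = np.where(np.arange(n_genes) % 2 == 0, 1.0, -1.0)
profiles = rng.normal(size=(n_genes, n_timepoints)) + sign[:, None] * shared_bias

c = average_cross_correlation(profiles, max_distance=6)
print(np.round(c, 2))  # values alternate in sign with period 2
```

In this toy setting the alternation of C(D) comes entirely from the injected bias, which is the kind of signature the inset of panel (a) points to.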

Figure 2

The microarray printing procedure as a source of experimental bias. (a) The location of the printing head with four tips as it transfers samples from 96-well plates onto the microarray slide. The resulting printing pattern defines four groups of spots, labeled 1–4. Each well of a 96-well plate can be labeled according to the print tip that took the sample from it. (b) The average log-ratio of measured expression levels calculated from the individual array data (IAD) is shown for each position within the 96-well plates, for time point 10 (i.e. 70 min after release from α-factor arrest). Note the regularity of the pattern within this 8 × 12 matrix. It can be approximately constructed from the repetition of 2 × 2 matrices (corresponding to print tips 1–4, shown in the bottom left corner). Averaging over all wells labeled 1–4 results in the 2 × 2 matrix shown under the 8 × 12 matrix. (c) All samples printed on the microarray yield four spatially distinct groups of 44 × 44 spots, corresponding to print tips 1–4. Averaging the log-ratios of measured expression levels from the IAD within each of the four groups of spots results in the 2 × 2 matrix shown below the microarray. The numbers that appear within the 2 × 2 matrices on the left and right are identical, indicating that printing was performed with a four-tip print head, and each tip contributed a significant bias to the measured expression data.
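The 2 × 2 repetition described in panel (b) can be sketched as a labeling of the 8 × 12 plate followed by a per-tip average. The specific numbering of tips within the 2 × 2 block is an assumption made here for illustration; the caption only states that the pattern repeats, not which corner is which tip. The offsets are hypothetical values, not measured data.

```python
import numpy as np

def tip_labels(n_rows=8, n_cols=12):
    """Label each well 1-4 by its position within a repeating 2 x 2 block.

    Assumed numbering: top-left 1, top-right 2, bottom-left 3, bottom-right 4.
    """
    rows, cols = np.indices((n_rows, n_cols))
    return 1 + 2 * (rows % 2) + (cols % 2)

def mean_by_tip(plate_values, labels):
    """Average the plate values over all wells that share a tip label."""
    return {tip: float(plate_values[labels == tip].mean()) for tip in range(1, 5)}

labels = tip_labels()

# Hypothetical per-tip offsets standing in for averaged log-ratios.
offsets = np.array([0.5, -0.25, 0.125, -0.5])
plate = offsets[labels - 1]

print(labels[:2, :4])
print(mean_by_tip(plate, labels))
```

Each label covers 24 of the 96 wells, so the four averages summarize the plate in the same way as the 2 × 2 matrix under the 8 × 12 matrix.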

Figure 3

Print tip-related bias across all experiments and the corresponding simulation. (a) Average log-ratios of expression levels of the features printed by each of the four tips within each 96-well plate used in each experiment (IAD). Blue, red, black and magenta correspond to print tips 1–4, respectively (the numbers define the spatial position within the print head and not the actual print tip). The abrupt changes of the average log-ratios between experiments are likely to correspond to cleaning and interchanging the print heads. Notice how the bias gradually changes within each experiment, until the tips are cleaned or changed. (b) The corresponding simulation: four groups of 10 × 10 uncorrelated Gaussian random numbers were generated in 18 in silico experiments. Additionally, four independent random numbers were added to each of the four groups within each experiment. The log-ratios of simulated expression levels of all spots within each ‘experiment’ are shown. To correct for the tip-related bias, the mean log-ratio for tips 1–4 within each experiment is subtracted from the corresponding group of spots. (c) The result of the correction: average cross-correlation calculated as in Figure 1, but using all genes instead of those residing on the same chromosome. The blue, gray and black lines correspond to the original data, the first-degree correction and the second-degree correction, respectively. In the second-degree correction, a linear trend of the log-ratios is subtracted within each experiment instead of simply subtracting the mean log-ratio. Notice that the 2-gene and 24-gene periodicities nearly disappear, but a 176-gene periodicity is revealed. (d) Correction of the computationally generated data. The red, blue and black lines correspond to the original, bias-affected and corrected data, respectively. Notice that the correction algorithm almost completely recovers the original in silico data.
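The simulation in panel (b) and the two corrections can be sketched roughly as follows. The group sizes and the number of experiments follow the caption. The unit variance of the bias terms, the function names, and the choice to fit the linear trend along the spot index within each tip group are assumptions, since the caption does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experiments, n_tips, n_spots = 18, 4, 100  # 100 spots = one 10 x 10 group

# Uncorrelated Gaussian "log-ratios" for every spot.
original = rng.normal(size=(n_experiments, n_tips, n_spots))
# One independent offset per tip and experiment (unit scale is an assumption).
tip_bias = rng.normal(size=(n_experiments, n_tips, 1))
biased = original + tip_bias

def subtract_tip_means(log_ratios):
    """First-degree correction: remove the mean of each tip group."""
    return log_ratios - log_ratios.mean(axis=-1, keepdims=True)

def subtract_linear_trend(log_ratios):
    """Second-degree correction (assumed form): remove a least-squares line
    fitted along the spot index within each tip group."""
    n = log_ratios.shape[-1]
    x = np.arange(n) - (n - 1) / 2.0  # centered index, so the intercept is the mean
    centered = log_ratios - log_ratios.mean(axis=-1, keepdims=True)
    slope = (centered * x).sum(axis=-1, keepdims=True) / (x ** 2).sum()
    return centered - slope * x

corrected = subtract_tip_means(biased)
detrended = subtract_linear_trend(biased)

print("mean |biased - original|   :", np.abs(biased - original).mean())
print("mean |corrected - original|:", np.abs(corrected - original).mean())
```

Mean subtraction cannot recover the original values exactly, because it also removes each group's own sample mean. With 100 spots per group that residual is small compared with the injected offsets.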

Figure 4

The consequence of the correction on subsequent analyses. Average linkage clustering of the (a) original and (b) corrected data for chromosome A in the α-factor experiment (CD). The colors correspond to functional classes downloaded from the MIPS database (http://mips.gsf.de/proj/yeast/catalogues/funcat/). Notice the visible change in the resulting dendrogram and the closer clustering of genes within the same functional classes. (c) The average minimum distance among genes within the same functional class for the original (black bars) and corrected (red bars) data. The minimum distances averaged over all functional classes are 13.3369 and 12.4067 for the original and corrected data, respectively.
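One possible reading of the statistic in panel (c) is sketched below. The caption does not define the distance or the averaging, so this sketch assumes Euclidean distance between expression profiles and, for each gene, takes the distance to its nearest neighbor in the same class before averaging within the class. The function name and the toy data are hypothetical.

```python
import numpy as np

def average_min_distance(profiles, class_labels):
    """Per class, the mean distance from each gene to its nearest classmate.

    Assumes Euclidean distance; classes with fewer than two genes are skipped.
    """
    x = np.asarray(profiles, dtype=float)
    labels = np.asarray(class_labels)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a gene is not its own neighbor
    per_class = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        if members.size < 2:
            continue
        within = dist[np.ix_(members, members)]
        per_class[label.item()] = float(within.min(axis=1).mean())
    return per_class

# Toy profiles: two genes in class "a", two in class "b".
toy_profiles = [[0.0, 0.0], [3.0, 4.0], [10.0, 0.0], [10.0, 1.0]]
toy_classes = ["a", "a", "b", "b"]
result = average_min_distance(toy_profiles, toy_classes)
print(result)
```

Under this reading, a lower value after correction means that genes of the same functional class sit closer to each other in expression space, which matches the direction of the change reported in the caption.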
