Identification and handling of artifactual gene expression profiles emerging in microarray hybridization experiments

Comparative Study

Leonid Brodsky et al. Nucleic Acids Res. 2004.

Abstract

Mathematical methods for analyzing microarray hybridizations treat gene expression profiles as elementary units. However, some of these profiles do not reflect a biologically relevant transcriptional response but instead stem from technical artifacts. Here, we describe two technically independent but rationally interconnected methods for identifying such artifactual profiles. Our diagnostics detect deviations from uniformity, which is assumed to be the main underlying principle of microarray design. Method 1 detects non-uniformity in the microarray distribution of printed genes that have been clustered by the similarity of their expression profiles. Method 2 evaluates whether gene-specific microarray spots lie within slide areas characterized by an abnormal concentration of low or high differential expression values, which we define as 'patterns of differentials'. By applying two novel algorithms, one for nested clustering (method 1) and one for pattern detection (method 2), we can make a dual estimate of profile quality for almost every printed gene. Genes with artifactual profiles detected by method 1 may then be removed from further analysis. Suspicious differential expression values detected by method 2 may be either removed or weighted according to the probabilities of the patterns that cover them, thus diminishing their influence on any further data analysis.

Figures

Figure 1

Nested clustering of gene expression profiles. (A) Two-dimensional representation of the nested clustering procedure, with gene expression profiles shown as separate points in a two-dimensional space. Higher-level cluster 1 (Cl-1) contains two compact lower-level clusters, Cl-1a and Cl-1b. Higher-level cluster 2 (Cl-2) contains two poorly separated lower-level clusters, Cl-2a and Cl-2b. (B) Actual expression profiles of the genes included in Cl-1 and Cl-2. The _x_-axis shows the hybridization experiments and the _y_-axis shows the natural logarithm (ln) of the differential expression values. See text for details.
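The nested procedure in panel (A) can be illustrated with a two-level clustering sketch. The caption does not name the underlying algorithm, so this sketch assumes plain k-means with a deterministic farthest-point start at both levels. The helper names (`kmeans`, `nested_clusters`, `separation`) and the compactness score are hypothetical; they only show how a poorly separated pair such as Cl-2a/Cl-2b can be told apart from a compact pair such as Cl-1a/Cl-1b.

```python
import math
from typing import List, Sequence

Profile = Sequence[float]  # ln-transformed differential expression values


def _dist(a: Profile, b: Profile) -> float:
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _centroid(points: List[Profile]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Profile], k: int, iters: int = 50) -> List[List[int]]:
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Returns the non-empty clusters as lists of indices into `points`.
    """
    k = max(1, min(k, len(points)))
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the profile farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(_dist(p, c) for c in centers))
        centers.append(list(far))
    groups: List[List[int]] = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for i, p in enumerate(points):
            nearest = min(range(len(centers)), key=lambda j: _dist(p, centers[j]))
            groups[nearest].append(i)
        new_centers = [_centroid([points[i] for i in g]) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return [g for g in groups if g]


def nested_clusters(points: List[Profile], k_outer: int = 2,
                    k_inner: int = 2) -> List[List[List[int]]]:
    """Cluster all profiles, then re-cluster the members of each higher-level cluster."""
    nested = []
    for outer in kmeans(points, k_outer):
        members = [points[i] for i in outer]
        inner = kmeans(members, k_inner)
        nested.append([[outer[i] for i in g] for g in inner])  # back to global indices
    return nested


def separation(points: List[Profile], inner: List[List[int]]) -> float:
    """Distance between the first two lower-level centroids divided by the mean
    distance of their members to their own centroid (large = compact, well separated)."""
    if len(inner) < 2:
        return 0.0
    cents = [_centroid([points[i] for i in g]) for g in inner[:2]]
    spread = [_dist(points[i], cents[j]) for j, g in enumerate(inner[:2]) for i in g]
    mean_spread = sum(spread) / len(spread)
    return _dist(cents[0], cents[1]) / mean_spread if mean_spread > 0 else math.inf
```

A low `separation` score for a higher-level cluster would flag it as the Cl-2 case in the figure, where the lower-level split is not well supported.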

Figure 2

Automatic identification of patterns using a city-block distance. (A) An example of the distribution of differential expression values over a slide. Yellow, blue and pink spots have differential expression values in the high (>2), low (<0.5) and intermediate intervals, respectively. Semi-transparent rhombi represent some of the optimal city-block neighborhoods of spots with high differentials. The interconnected union of these rhombi constitutes a ‘pattern zone’. (B) Detected pattern of high differentials: the spots with high differential expression values that are covered by a detected ‘pattern zone’.
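The pattern-zone construction in panel (A) can be sketched on a grid of spot positions. This is a simplified illustration, not the authors' algorithm: it assumes a fixed rhombus radius instead of the optimal neighborhoods mentioned in the caption, treats two high-value spots as interconnected when their rhombi overlap (city-block distance at most twice the radius), and uses a hypothetical `min_spots` cutoff in place of a probabilistic criterion for an abnormal concentration. Only the >2 and <0.5 thresholds come from the caption.

```python
from typing import Dict, List, Set, Tuple

Spot = Tuple[int, int]  # (row, column) position of a spot on the slide


def classify(ratio: float, high: float = 2.0, low: float = 0.5) -> str:
    """Assign a differential expression ratio to the high, low or intermediate interval."""
    if ratio > high:
        return "high"
    if ratio < low:
        return "low"
    return "intermediate"


def pattern_zones(slide: Dict[Spot, float], kind: str = "high",
                  radius: int = 2, min_spots: int = 4) -> List[Set[Spot]]:
    """Group spots of the given kind whose city-block rhombi of `radius` overlap
    into connected components, keeping components with at least `min_spots` spots."""
    unvisited = {s for s, v in slide.items() if classify(v) == kind}
    zones: List[Set[Spot]] = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        while stack:  # depth-first walk over overlapping rhombi
            r, c = stack.pop()
            near = [s for s in unvisited
                    if abs(s[0] - r) + abs(s[1] - c) <= 2 * radius]
            for s in near:
                unvisited.remove(s)
                component.add(s)
                stack.append(s)
        if len(component) >= min_spots:
            zones.append(component)
    return zones
```

Isolated high-value spots form components of size one and are discarded, which mirrors the idea that only a local excess of high (or low) differentials constitutes a pattern.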

Figure 3

Uniformity versus non-uniformity of cluster distribution over the microarray. (A and C) Examples of gene clusters (cluster A and cluster B) detected in the same microarray experiment, which consisted of 15 hybridizations. The probes for hybridizations 1–6 were derived from cells subjected to a time course of ‘treatment 1’. Hybridizations 7–14 relate to a time-course treatment of the same cells with another agent (‘treatment 2’). Hybridization 15 represents untreated control cells. The _x_-axis shows the hybridization experiments and the _y_-axis shows the ln of the differential expression values. (B) Microarray distribution of spots corresponding to the genes included in cluster A. (D) Microarray distribution of spots corresponding to the genes included in cluster B.
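Whether a cluster's spots are spread uniformly over the slide (panel B) or concentrated in one area (panel D) can be quantified with a goodness-of-fit test. Assuming that the KS P-values reported in Fig. 4 come from a Kolmogorov–Smirnov test, one plausible sketch compares the row and column coordinates of a cluster's spots with those of all printed spots and keeps the smaller P-value. The two-sample comparison, the per-axis treatment and the function names are assumptions; the P-value uses the standard asymptotic Kolmogorov approximation.

```python
import math
from typing import Sequence, Tuple

Spot = Tuple[int, int]  # (row, column) position on the slide


def _kolmogorov_q(lam: float) -> float:
    """Asymptotic tail probability Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2)."""
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) <= 1e-8 * abs(total):
            return min(max(total, 0.0), 1.0)
    return 1.0  # series fails to converge only for very small lam, where Q is ~1


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic P-value."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(xs[i], ys[j])
        while i < n1 and xs[i] <= x:  # step past ties in both samples
            i += 1
        while j < n2 and ys[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, _kolmogorov_q(lam)


def cluster_uniformity_p(cluster_spots: Sequence[Spot],
                         all_spots: Sequence[Spot]) -> float:
    """Smaller of the row-wise and column-wise KS P-values comparing a cluster's
    spot positions with the positions of all printed spots."""
    p_row = ks_two_sample([r for r, _ in cluster_spots], [r for r, _ in all_spots])[1]
    p_col = ks_two_sample([c for _, c in cluster_spots], [c for _, c in all_spots])[1]
    return min(p_row, p_col)
```

Comparing against all printed spots rather than a theoretical uniform distribution keeps the sketch valid when the print layout itself is not a complete rectangle.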

Figure 4

Influence of gene sorting according to the KS criterion of biological quality on hierarchical clustering of probes. (A) Hierarchical clustering of probes within the hybridization set according to the gene expression profiles of all 10 000 genes printed on the microarray. (B) Hierarchical clustering of probes within the same hybridization set according to the expression profiles of 6283 genes with a median KS _P_-value ≥0.2, which are predicted to have biological expression profiles. (C) Hierarchical clustering of probes within the same hybridization set according to the expression profiles of 1847 genes with a median KS _P_-value ≤0.001, which are predicted to have artifactual expression profiles. Probes: A1, PDGFβ 1 ng/ml; A2, PDGFβ 10 ng/ml; B1, TGFβ 1 ng/ml; B2, TGFβ 10 ng/ml; C, hypoxia (0.5% O2, 5% CO2). For details, see text.
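The gene triage behind panels (B) and (C) amounts to thresholding a per-gene median P-value. The thresholds (≥0.2 and ≤0.001) come from the caption; what the median is taken over is not stated here, so this hypothetical `triage_genes` helper simply assumes each gene carries a list of KS P-values, for example one per clustering run.

```python
from statistics import median
from typing import Dict, List


def triage_genes(pvalues: Dict[str, List[float]], biological: float = 0.2,
                 artifactual: float = 0.001) -> Dict[str, List[str]]:
    """Sort genes into biological, artifactual or undetermined groups by the
    median of their KS P-values (thresholds taken from the Fig. 4 caption)."""
    groups: Dict[str, List[str]] = {"biological": [], "artifactual": [], "undetermined": []}
    for gene, ps in pvalues.items():
        m = median(ps)
        if m >= biological:
            groups["biological"].append(gene)
        elif m <= artifactual:
            groups["artifactual"].append(gene)
        else:
            groups["undetermined"].append(gene)  # between the two thresholds
    return groups
```

Genes between the two thresholds fall in neither panel (B) nor panel (C), which is consistent with 6283 + 1847 being fewer than the 10 000 printed genes.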

Figure 5

Application of a weighted measure of distance to the clustering of gene expression profiles and to the hierarchical clustering of hybridizations. (A) A cluster obtained with the weighted clustering procedure combines two clusters that are otherwise separate (shown in blue and red). The _x_-axis shows the hybridization probes and the _y_-axis shows the ln of the differential expression values. (B) Microarray distribution of spots corresponding to the genes within the ‘blue’ cluster. (C) Microarray distribution of spots corresponding to the genes within the ‘red’ cluster. (D) Pattern of low differentials on the slide corresponding to probe 8. (E) Hierarchical clustering of probes within the hybridization set according to the gene expression profiles of all 10 000 genes printed on the microarray (same as shown in Fig. 4A). (F) Hierarchical clustering of the same hybridization set using a weighted measure of distance, based on the expression profiles of all 10 000 printed genes. Note that the probe clustering now agrees better with the underlying biological conditions (for details, see the text and the legend to Fig. 4).
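A weighted measure of distance of the kind used in panels (A) and (F) can be sketched by down-weighting values that lie inside detected patterns. The weighting scheme here is an assumption, not the published formula: each value gets weight 1 minus its (hypothetical) pattern probability, a pair of values is weighted by the product of the two weights, and the result is a weighted root-mean-square distance.

```python
import math
from typing import Sequence


def weighted_distance(x: Sequence[float], y: Sequence[float],
                      px: Sequence[float], py: Sequence[float]) -> float:
    """Weighted root-mean-square distance between two ln-ratio profiles.

    px[i] and py[i] are hypothetical probabilities that value i of each profile
    lies inside a detected pattern; a value certainly inside a pattern
    (probability 1) contributes nothing to the distance.
    """
    if not (len(x) == len(y) == len(px) == len(py)):
        raise ValueError("profiles and probability vectors must have equal length")
    num = den = 0.0
    for xi, yi, pa, pb in zip(x, y, px, py):
        w = (1.0 - pa) * (1.0 - pb)  # weight of this hybridization for this pair
        num += w * (xi - yi) ** 2
        den += w
    if den == 0.0:
        raise ValueError("every value is fully down-weighted")
    return math.sqrt(num / den)
```

To cluster hybridizations rather than genes, as in panel (F), the same function can be applied to columns of the gene-by-hybridization matrix instead of rows. In the panel (A) example, down-weighting the probe-8 values covered by the pattern in panel (D) is what would let two otherwise separate clusters merge.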

Figure 6

Coincidence of the microarray distribution of artifactual clusters with the zones of patterns of differential expression values. (A and B) Two gene clusters detected in the same microarray experiment, which comprised 15 hybridizations. The _x_-axis shows the hybridization probes and the _y_-axis shows the ln of the differential expression values. (C and D) Microarray distribution of spots corresponding to the genes included in the clusters shown in (A) and (B), respectively. (E and F) Patterns of differentials appearing on the slides corresponding to hybridizations 10 and 2, respectively. The patterns whose positions coincide with the microarray distribution of the clusters shown in (A) and (B) are colored green. (G and H) Deduced Cy5 microarray images of the slides corresponding to hybridizations 10 and 2, respectively.
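The coincidence shown in panels (E) and (F) can be quantified as the fraction of a cluster's spots that fall inside a pattern zone on a given slide. This hypothetical `zone_coverage` helper assumes a zone is given as a set of spot positions (for example, the output of a pattern detector) and that the zone covers every position within a fixed city-block radius of its spots; a high fraction on one slide would point to that hybridization as the source of the cluster's artifactual profile.

```python
from typing import Iterable, Set, Tuple

Spot = Tuple[int, int]  # (row, column) position on the slide


def zone_coverage(cluster_spots: Iterable[Spot], zone: Set[Spot],
                  radius: int = 2) -> float:
    """Fraction of a cluster's spots that fall inside the union of city-block
    rhombi of `radius` centred on the spots of one pattern zone."""
    spots = list(cluster_spots)
    if not spots:
        return 0.0
    inside = sum(
        1 for r, c in spots
        if any(abs(r - zr) + abs(c - zc) <= radius for zr, zc in zone)
    )
    return inside / len(spots)
```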
