Model-based cluster analysis of microarray gene-expression data

Wei Pan et al. Genome Biol. 2002.

Abstract

Background: Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the large amounts of data they produce. Clustering techniques have been widely applied to microarray gene-expression data. However, cluster analysis based on normal mixture models has not been widely used for such data, despite its solid probabilistic foundation. Here, we introduce it and illustrate its use in detecting differentially expressed genes. In particular, we cluster not the gene-expression patterns themselves but a summary statistic for each gene, the t-statistic.
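
A minimal sketch of the summary statistic being clustered: one two-sample t-statistic per gene, comparing the control arrays with the infected arrays. The pooled-variance form, the sign convention (infected minus control), and the function name `gene_t_statistics` are assumptions for illustration; the exact t-statistic variant used by the authors is not specified here.

```python
import numpy as np

def gene_t_statistics(control, infected):
    """Per-gene two-sample t-statistic (rows = genes, columns = replicate arrays).

    Assumes the pooled-variance form; positive values mean higher expression
    in the infected group than in the control group.
    """
    control = np.asarray(control, dtype=float)
    infected = np.asarray(infected, dtype=float)
    n1, n2 = control.shape[1], infected.shape[1]
    # Unbiased within-group variance of each gene.
    v1 = control.var(axis=1, ddof=1)
    v2 = infected.var(axis=1, ddof=1)
    # Pooled variance and standard error of the difference in means.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (infected.mean(axis=1) - control.mean(axis=1)) / se
```

With the experiment layout described for Figure 2 (experiments 1 and 2 control, 3 to 6 infected), `control` would be a 1,176 × 2 array and `infected` a 1,176 × 4 array of expression values, giving one t-statistic per gene.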

Results: The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found: two of them, together containing more than 95% of the genes, show almost no change in expression levels, whereas the third contains 30 genes with varying degrees of differential expression.
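
Fitting a normal mixture to the t-statistics and choosing the number of clusters is commonly done with the EM algorithm plus a criterion such as BIC. The sketch below shows that generic approach in plain NumPy for a univariate mixture with unequal variances; the quantile-based starting values, the standard-deviation floor, the use of BIC, and the names `fit_normal_mixture` and `mixture_bic` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def _log_weighted_densities(y, weights, means, sds):
    # log(pi_k) + log N(y_i; mu_k, sd_k^2) for every value i and component k, shape (n, k).
    z = (y[:, None] - means) / sds
    return np.log(weights) - np.log(sds) - 0.5 * np.log(2.0 * np.pi) - 0.5 * z ** 2

def _row_logsumexp(a):
    # Numerically stable log(sum(exp(a))) along each row.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def fit_normal_mixture(y, k, n_iter=500, tol=1e-8):
    """Fit a k-component univariate normal mixture (unequal variances) by EM.

    Returns (weights, means, sds, log_likelihood).
    """
    y = np.asarray(y, dtype=float)
    # Deterministic start: means at evenly spaced quantiles, common sd, equal weights.
    means = np.quantile(y, (np.arange(k) + 0.5) / k)
    sds = np.full(k, y.std())
    weights = np.full(k, 1.0 / k)
    sd_floor = 1e-3 * y.std()  # keeps a component from collapsing onto a single point
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities (posterior component probabilities) for each value.
        log_dens = _log_weighted_densities(y, weights, means, sds)
        log_total = _row_logsumexp(log_dens)
        resp = np.exp(log_dens - log_total[:, None])
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * y[:, None]).sum(axis=0) / nk
        sds = np.sqrt((resp * (y[:, None] - means) ** 2).sum(axis=0) / nk)
        sds = np.maximum(sds, sd_floor)
        ll = log_total.sum()  # log-likelihood of the parameters used in this E-step
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    # Log-likelihood of the final parameter estimates.
    ll = _row_logsumexp(_log_weighted_densities(y, weights, means, sds)).sum()
    return weights, means, sds, ll

def mixture_bic(y, k):
    """BIC = -2 log L + p log n with p = 3k - 1 free parameters (lower is better)."""
    ll = fit_normal_mixture(y, k)[3]
    return -2.0 * ll + (3 * k - 1) * np.log(len(y))
```

For example, `min(range(1, 6), key=lambda k: mixture_bic(t_stats, k))` would pick the number of components with the smallest BIC, where `t_stats` is a hypothetical array holding one t-statistic per gene. Some software, such as the R package mclust, reports BIC with the opposite sign, so larger values are better there.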

Conclusions: Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool for detecting differential gene expression in microarray data.

Figures

Figure 1

Histograms of radioactivity intensity levels for the first experiment, a cDNA microarray analysis of 1,176 genes in middle-ear mucosa of healthy (control) rats. (a) Before log-transformation; (b) after log-transformation.

Figure 2

Comparison of the log-transformed, standardized expression data between experiments. Experiments 1 and 2 were conducted using control rats; experiments 3-6 used infected rats.
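
Figure 1 shows the effect of log-transforming the raw intensities, and Figure 2 compares log-transformed, standardized data across experiments. A minimal sketch of that preprocessing, assuming the natural logarithm and a per-experiment z-score (the exact standardization used in the paper may differ); `log_standardize` is a hypothetical helper name.

```python
import numpy as np

def log_standardize(intensities):
    """Log-transform raw intensities, then standardize each experiment (column)
    to mean 0 and standard deviation 1. Rows are genes, columns are experiments.

    Assumes all intensities are strictly positive.
    """
    logged = np.log(np.asarray(intensities, dtype=float))
    return (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)
```

Standardizing each column separately removes overall differences in scale between arrays, so that values from different experiments can be compared directly, as in Figure 2.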

Figure 3

Gene-expression profiles of the four clusters found using the method described. Each line represents a single gene. Clusters 2 and 3 (containing over 95% of genes) show little change in gene-expression levels; cluster 1 (30 genes) and cluster 4 (6 genes) do show changes in gene-expression levels.

Figure 4

Posterior probability of a gene being in each cluster as a function of its t-statistic y, calculated using Equations (1) and (2). A gene is assigned to the cluster for which its posterior probability is largest.
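
The classification rule in Figure 4 can be sketched with the standard mixture-model posterior, tau_k(y) = pi_k phi(y; mu_k, sd_k^2) / sum_j pi_j phi(y; mu_j, sd_j^2), followed by taking the component with the largest posterior. This is a generic sketch assuming a univariate normal mixture with known parameters; the function names and the parameter values below are made up for illustration and are not estimates from the paper.

```python
import numpy as np

def posterior_probabilities(y, weights, means, sds):
    """Posterior probability that each value in y belongs to each normal component."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    weights, means, sds = (np.asarray(a, dtype=float) for a in (weights, means, sds))
    z = (y[:, None] - means) / sds
    # Log numerator; the -0.5*log(2*pi) term is common to all components and cancels.
    log_num = np.log(weights) - np.log(sds) - 0.5 * z ** 2
    log_num -= log_num.max(axis=1, keepdims=True)  # avoid underflow before exponentiating
    num = np.exp(log_num)
    return num / num.sum(axis=1, keepdims=True)

def assign_clusters(y, weights, means, sds):
    """Index of the component with the largest posterior probability for each value."""
    return posterior_probabilities(y, weights, means, sds).argmax(axis=1)

# Illustrative, made-up parameters (not estimates from the paper).
weights = [0.50, 0.47, 0.03]
means = [-0.5, 0.5, 6.0]
sds = [1.0, 1.0, 2.0]
print(assign_clusters([-3.0, 8.0], weights, means, sds))  # prints [0 2]
```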
