A mixture model-based approach to the clustering of microarray expression data
G J McLachlan et al. Bioinformatics. 2002 Mar.
Abstract
Motivation: This paper introduces the software EMMIX-GENE, which has been developed specifically for a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples: mixtures of t distributions are fitted to each gene, and the genes are ranked in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. Imposing a threshold on the likelihood ratio statistic, in conjunction with a threshold on the size of a cluster, allows a relevant set of genes to be selected. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to effectively reduce the dimension of the feature space of genes.
Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues, consistent either with the external classification of the tissues or with background and biological knowledge of these sets.
Availability: EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
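The gene-selection step described under Motivation can be illustrated with a minimal sketch. It is not the EMMIX-GENE implementation: for brevity it fits univariate normal components by EM in place of the t components the abstract describes, and the function names, the variance floor, the median-split initialisation, and the threshold values in the demo are all assumptions made for illustration. For each gene it computes the likelihood ratio statistic for one versus two components and keeps the gene only if the statistic exceeds a threshold and the smaller of the two implied clusters is large enough.

```python
import math


def _logpdf(x, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _mean_var(xs):
    """Maximum likelihood mean and variance (variance floored above zero)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, max(var, 1e-12)


def one_component_loglik(xs):
    mean, var = _mean_var(xs)
    return sum(_logpdf(x, mean, var) for x in xs)


def two_component_fit(xs, n_iter=200):
    """EM for a two-component normal mixture (normal, not t: an assumption).

    Returns (log-likelihood, size of the smaller hard-assigned cluster).
    """
    n = len(xs)
    ordered = sorted(xs)
    _, total_var = _mean_var(xs)
    floor = max(1e-3 * total_var, 1e-12)  # assumed guard against collapsing components
    # Assumed initialisation: split the sorted values at the median.
    means = [_mean_var(ordered[: n // 2])[0], _mean_var(ordered[n // 2:])[0]]
    variances = [total_var, total_var]
    weights = [0.5, 0.5]

    def e_step():
        resp, loglik = [], 0.0
        for x in xs:
            logp = [math.log(weights[k]) + _logpdf(x, means[k], variances[k])
                    for k in range(2)]
            top = max(logp)
            norm = top + math.log(sum(math.exp(v - top) for v in logp))
            loglik += norm
            resp.append([math.exp(v - norm) for v in logp])
        return resp, loglik

    for _ in range(n_iter):
        resp, _ = e_step()
        for k in range(2):  # M-step: weighted updates of each component
            nk = max(sum(r[k] for r in resp), 1e-10)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                floor,
            )
    resp, loglik = e_step()
    size0 = sum(1 for r in resp if r[0] >= r[1])
    return loglik, min(size0, n - size0)


def lr_statistic(xs):
    """Likelihood ratio statistic for one versus two components."""
    ll2, smallest = two_component_fit(xs)
    return 2.0 * (ll2 - one_component_loglik(xs)), smallest


def select_genes(expr, *, lr_threshold, min_cluster_size):
    """Keep genes passing both thresholds; expr maps gene name -> values per tissue."""
    selected = {}
    for gene, xs in expr.items():
        lr, smallest = lr_statistic(xs)
        if lr > lr_threshold and smallest >= min_cluster_size:
            selected[gene] = lr
    return selected


if __name__ == "__main__":
    half = [0.06, 0.19, 0.32, 0.45, 0.60, 0.76, 0.93, 1.15, 1.44, 1.96]
    toy = {  # hypothetical expression values for 20 tissues
        "gene_bimodal": [s + 0.1 * i for s in (-2, 2) for i in range(-5, 5)],
        "gene_unimodal": [-v for v in half] + half,
    }
    # Threshold values here are placeholders, not those used by EMMIX-GENE.
    print(select_genes(toy, lr_threshold=20.0, min_cluster_size=5))
```

The cluster-size threshold matters because a mixture component can shrink onto a few outlying tissues and inflate the likelihood ratio statistic without reflecting a useful grouping; requiring a minimum cluster size filters out such genes.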
Similar articles
- Simultaneous gene clustering and subset selection for sample classification via MDL.
Jörnsten R, Yu B. Bioinformatics. 2003 Jun 12;19(9):1100-9. doi: 10.1093/bioinformatics/btg039. PMID: 12801870
- Bayesian automatic relevance determination algorithms for classifying gene expression data.
Li Y, Campbell C, Tipping M. Bioinformatics. 2002 Oct;18(10):1332-9. doi: 10.1093/bioinformatics/18.10.1332. PMID: 12376377
- An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.
Hsu AL, Tang SL, Halgamuge SK. Bioinformatics. 2003 Nov 1;19(16):2131-40. doi: 10.1093/bioinformatics/btg296. PMID: 14594719
- Microarray data analysis: from hypotheses to conclusions using gene expression data.
Armstrong NJ, van de Wiel MA. Cell Oncol. 2004;26(5-6):279-90. doi: 10.1155/2004/943940. PMID: 15623938 Free PMC article. Review.
- Basic microarray analysis: grouping and feature reduction.
Raychaudhuri S, Sutphin PD, Chang JT, Altman RB. Trends Biotechnol. 2001 May;19(5):189-93. doi: 10.1016/s0167-7799(01)01599-2. PMID: 11301132 Review.
Cited by
- Deterministic and stochastic neuronal contributions to distinct synchronous CA3 network bursts.
Takano H, McCartney M, Ortinski PI, Yue C, Putt ME, Coulter DA. J Neurosci. 2012 Apr 4;32(14):4743-54. doi: 10.1523/JNEUROSCI.4277-11.2012. PMID: 22492030 Free PMC article.
- MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering.
Kim EY, Kim SY, Ashlock D, Nam D. BMC Bioinformatics. 2009 Aug 22;10:260. doi: 10.1186/1471-2105-10-260. PMID: 19698124 Free PMC article.
- NeatMap--non-clustering heat map alternatives in R.
Rajaram S, Oono Y. BMC Bioinformatics. 2010 Jan 22;11:45. doi: 10.1186/1471-2105-11-45. PMID: 20096121 Free PMC article.
- Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset.
Liu X, Sivaganesan S, Yeung KY, Guo J, Bumgarner RE, Medvedovic M. Bioinformatics. 2006 Jul 15;22(14):1737-44. doi: 10.1093/bioinformatics/btl184. Epub 2006 May 18. PMID: 16709591 Free PMC article.
- Cross-species microarray analysis with the OSCAR system suggests an INSR->Pax6->NQO1 neuro-protective pathway in aging and Alzheimer's disease.
Lu Y, He X, Zhong S. Nucleic Acids Res. 2007 Jul;35(Web Server issue):W105-14. doi: 10.1093/nar/gkm408. Epub 2007 Jun 1. PMID: 17545194 Free PMC article.