clusterMaker: a multi-algorithm clustering plugin for Cytoscape
John H Morris et al. BMC Bioinformatics. 2011.
Abstract
Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL.
Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section.
Conclusions: The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.
Figures
Figure 1
Screenshots of clusterMaker visualizations. (A) and (C) show the results of hierarchically clustering (by expression data) the yeast protein-protein interaction network included with Cytoscape (galFiltered.cys). (A) TreeView visualization showing the clustering of both nodes and attributes. (B) The symmetrical TreeView of an EMAP showing a selected cluster. (C) Cytoscape screenshot of the network used to produce (A). The group hierarchy is shown. The groups (and nodes that are part of those groups) are selected as a subtree in the TreeView. (D) The new network resulting from an MCL clustering of the TAP-MS data from Collins et al. [52]. The option to restore inter-cluster edges after the automatic layout was selected.
Figure 2
Gene expression clustering reveals mouse protein interactome modules and fuzzy relationships among mouse cells and tissues. Heat maps showing clusters of mouse gene expression data (GSE10246) identified using (A) hierarchical clustering and (B) AutoSOME clustering. (C) Protein interactome [45] divided into subnetworks corresponding to co-expression clusters identified by AutoSOME. (D) Fuzzy cluster network of cell/tissue types in GSE10246. Nodes represent individual cell/tissue types (labeled with first word of each sample name only), node colors correspond to different clusters, and increasing edge thickness and opacity reflect increasing frequency of co-clustering between any given pair of nodes over all ensemble iterations (see [34]). (E) Expression data of four cell/tissue types from GSE10246 superimposed onto the ten largest subnetworks from panel C (Stomach = GSM258771; Lymph Node = GSM258691; Cerebral Cortex = GSM258635; Embryonic Stem Cell = GSM258658). All expression data are log2 scaled and median centered. In panel B, all clusters are ordered by decreasing cluster size, and the yellow-cyan color scale is identical to panel A. In panels A and B, all arrays (cell/tissue types) are horizontally ordered the same as the GSE10246 data set.
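The edge weighting in panel D can be illustrated with a minimal sketch. It assumes each ensemble iteration yields one cluster label per sample and that co-clustering frequency is the fraction of iterations in which two samples share a label. The function name, the sample names, and the label values are hypothetical and are not taken from AutoSOME or clusterMaker.

```python
from collections import Counter
from itertools import combinations


def co_clustering_frequency(runs):
    """Return {(a, b): fraction of runs in which a and b share a cluster}.

    `runs` is a list of dicts mapping sample name -> cluster label, one dict
    per ensemble iteration. All runs are assumed to cover the same samples.
    Pairs that never co-cluster are omitted from the result.
    """
    if not runs:
        return {}
    counts = Counter()
    for labels in runs:
        # Sorting the names gives every pair a single canonical ordering.
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return {pair: n / len(runs) for pair, n in counts.items()}


# Hypothetical labels from four ensemble iterations.
runs = [
    {"Cortex": 0, "Cerebellum": 0, "Stomach": 1},
    {"Cortex": 0, "Cerebellum": 0, "Stomach": 1},
    {"Cortex": 0, "Cerebellum": 1, "Stomach": 1},
    {"Cortex": 2, "Cerebellum": 2, "Stomach": 0},
]
freq = co_clustering_frequency(runs)
for pair, f in sorted(freq.items()):
    print(pair, f)
```

In a network drawn from such a table, a higher frequency would map to a thicker, more opaque edge, and pairs with no shared iterations would have no edge at all.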
Figure 3
Clustering of yeast protein-protein interaction networks in the context of overlapping yeast genetic interaction data reveals possible pathway interactions between three well-known complexes. (A) The overall results of MCL clustering of the Collins et al. [52] data set showing the largest clusters. Nodes are colored according to cluster. Thick edges represent intra-cluster edges and thin edges are inter-cluster. Three complexes are highlighted: SWR1, SET1, and prefoldin. (B) Closeup of the prefoldin complex from the chromosome biology EMAP (Additional File 3). Note that there is a very strong positive genetic interaction (yellow) between all of the genes in the complex except for GIM3 and GIM4, which is still positive overall. (C) Closeup of the prefoldin complex from the RNA processing EMAP (Additional File 4). The closeup shows the same slightly decreased interaction for GIM3 and GIM4. (D) The section of the chromosome biology EMAP with the prefoldin complex showing the strong negative interaction with SWR1 and positive interaction with SET1.
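The thick versus thin edge distinction in panel A amounts to checking whether both endpoints of an edge fall in the same cluster. The following is a minimal sketch of that check, assuming a simple mapping from node to cluster label. The function name and the node identifiers are placeholders, not clusterMaker's own API or real yeast proteins.

```python
def split_edges_by_cluster(edges, cluster_of):
    """Partition edges into intra-cluster and inter-cluster lists.

    `edges` is an iterable of (node_a, node_b) pairs and `cluster_of` maps
    each node to its cluster label (for example, the output of an MCL run).
    """
    intra, inter = [], []
    for a, b in edges:
        if cluster_of[a] == cluster_of[b]:
            intra.append((a, b))  # would be drawn as a thick edge
        else:
            inter.append((a, b))  # would be drawn as a thin edge
    return intra, inter


# Placeholder nodes and cluster labels, for illustration only.
cluster_of = {"n1": 0, "n2": 0, "n3": 1, "n4": 1}
edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]
intra, inter = split_edges_by_cluster(edges, cluster_of)
print("intra:", intra)
print("inter:", inter)
```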
Figure 4
Protein similarity network clustering indicates possible family membership for uncharacterized proteins. (A) A distribution of edge weights (binned -log(BLAST E-values)) of the VOC superfamily is shown, with a cutoff value of 5.5 indicated by a red vertical line. The cutoff was determined by a heuristic described in [53] and was used for subsequent clustering. (B) MCL clusters for the VOC superfamily are displayed with nodes colored by family assignment. Red nodes represent proteins with unknown function. (See Additional File 6 for TransClust Clusters). (C) Four clusters within the MCL clustering results show only proteins from a single family or proteins of unknown function. (Three of these four clusters also appear in the TransClust results.) Based on this analysis, we hypothesize that the function of the unknowns is the same as that of the other proteins in each cluster. The protein highlighted in blue is BH2212, which was randomly selected for further analysis.
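The edge-weight cutoff in panel A can be sketched as a simple filter applied before clustering. The caption does not state the logarithm base or whether the cutoff is inclusive, so this sketch assumes base 10 and keeps edges whose weight is at least the cutoff. The function names, protein identifiers, and E-values are hypothetical.

```python
import math


def edge_weight(e_value, floor=1e-300):
    """Convert a BLAST E-value to -log10(E); base 10 is an assumption.

    `floor` guards against log(0) when BLAST reports an E-value of 0.
    """
    return -math.log10(max(e_value, floor))


def filter_edges(edges, cutoff=5.5):
    """Keep (a, b, weight) for edges whose weight is at least `cutoff`."""
    kept = []
    for a, b, e_value in edges:
        w = edge_weight(e_value)
        if w >= cutoff:
            kept.append((a, b, w))
    return kept


# Made-up protein pairs and E-values, for illustration only.
edges = [("P1", "P2", 1e-20), ("P2", "P3", 1e-3), ("P1", "P3", 1e-6)]
kept = filter_edges(edges)
for a, b, w in kept:
    print(a, b, round(w, 2))
```

With the cutoff of 5.5, the pair with E-value 1e-3 (weight 3) is dropped, while the pairs with weights 20 and 6 are retained for clustering.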
Similar articles
- clusterMaker2: a major update to clusterMaker, a multi-algorithm clustering app for Cytoscape.
Utriainen M, Morris JH. BMC Bioinformatics. 2023 Apr 5;24(1):134. doi: 10.1186/s12859-023-05225-z. PMID: 37020209 Free PMC article.
- CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li M, Li D, Tang Y, Wu F, Wang J. Int J Mol Sci. 2017 Aug 31;18(9):1880. doi: 10.3390/ijms18091880. PMID: 28858211 Free PMC article.
- SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale.
Nepusz T, Sasidharan R, Paccanaro A. BMC Bioinformatics. 2010 Mar 9;11:120. doi: 10.1186/1471-2105-11-120. PMID: 20214776 Free PMC article.
- NOA: a cytoscape plugin for network ontology analysis.
Zhang C, Wang J, Hanspers K, Xu D, Chen L, Pico AR. Bioinformatics. 2013 Aug 15;29(16):2066-7. doi: 10.1093/bioinformatics/btt334. Epub 2013 Jun 7. PMID: 23749961 Free PMC article.
- Analyzing protein-protein interaction networks.
Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE. J Proteome Res. 2012 Apr 6;11(4):2014-31. doi: 10.1021/pr201211w. Epub 2012 Mar 2. PMID: 22385417 Review.
Cited by
- A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases.
Tisza MJ, Buck CB. Proc Natl Acad Sci U S A. 2021 Jun 8;118(23):e2023202118. doi: 10.1073/pnas.2023202118. Epub 2021 Jun 3. PMID: 34083435 Free PMC article.
- Cellular and extracellular proteomic profiling of paradoxical low-flow low-gradient aortic stenosis myocardium.
Elkenani M, Barallobre-Barreiro J, Schnelle M, Mohamed BA, Beuthner BE, Jacob CF, Paul NB, Yin X, Theofilatos K, Fischer A, Puls M, Zeisberg EM, Shah AM, Mayr M, Hasenfuß G, Toischer K. Front Cardiovasc Med. 2024 Sep 16;11:1398114. doi: 10.3389/fcvm.2024.1398114. eCollection 2024. PMID: 39355352 Free PMC article.
- Cardiac retinoic acid levels decline in heart failure.
Yang N, Parker LE, Yu J, Jones JW, Liu T, Papanicolaou KN, Talbot CC Jr, Margulies KB, O'Rourke B, Kane MA, Foster DB. JCI Insight. 2021 Apr 22;6(8):e137593. doi: 10.1172/jci.insight.137593. PMID: 33724958 Free PMC article.
- Impacts of Anthropogenic Pollutants on Benthic Prokaryotic Communities in Mediterranean Touristic Ports.
Tamburini E, Doni L, Lussu R, Meloni F, Cappai G, Carucci A, Casalone E, Mastromei G, Vitali F. Front Microbiol. 2020 Jun 9;11:1234. doi: 10.3389/fmicb.2020.01234. eCollection 2020. PMID: 32655521 Free PMC article.
- A Molecular Interaction Map of Klebsiella pneumoniae and Its Human Host Reveals Potential Mechanisms of Host Cell Subversion.
Saha D, Kundu S. Front Microbiol. 2021 Feb 18;12:613067. doi: 10.3389/fmicb.2021.613067. eCollection 2021. PMID: 33679637 Free PMC article.
References
- Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF, et al. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005;123(3):507–519. doi: 10.1016/j.cell.2005.08.031.