Bi-k-bi clustering: mining large scale gene expression data using two-level biclustering

Biclustering of High-throughput Gene Expression Data with Bicluster Miner

Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on; pp. 131-138, 2012

In recent years, many biclustering algorithms have been developed for the analysis of gene expression data to complement and expand the capabilities of traditional clustering methods. With biclustering, genes with similar expression profiles can be identified not only over the whole data set but also across subsets of experimental conditions, allowing genes to belong simultaneously to several expression patterns. This property makes biclustering a powerful approach, especially when it is applied to data with a large number of conditions. In spite of the clear theoretical benefit, the full potential of biclustering has not been realized within the gene expression research community, and thus it has never truly become part of standard gene expression data analysis. Possible reasons include a lack of awareness of the various complementary ways in which biclustering can be applied to microarray or next-generation sequencing based gene expression data sets, and the lack of reliable and fast algorithms. In this paper, we first illustrate the various opportunities for applying biclustering within a typical gene expression data analysis pipeline. Then a new biclustering method (BiclusterMiner) is presented that can be applied to all presented cases. The developed method is the first discrete biclustering algorithm that is able to handle both up- and down-regulated genes simultaneously by taking the direction of regulation into account and still discover all possible maximal biclusters. The efficiency of the proposed algorithm is demonstrated on real and synthetic datasets.
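To make the idea of a discrete, direction-aware bicluster concrete, the following is a minimal sketch and not the BiclusterMiner algorithm itself. It assumes the input values are log-ratios, that a symmetric cutoff (the hypothetical `threshold` parameter) separates up-regulated, down-regulated and unchanged genes, and that a bicluster is "consistent" when all of its genes share the same non-zero direction within each of its conditions. The abstract does not confirm any of these choices.

```python
def discretize(matrix, threshold=1.0):
    """Map each value to +1 (up), -1 (down) or 0 (unchanged).

    The symmetric threshold is an illustrative assumption.
    """
    return [
        [1 if v >= threshold else -1 if v <= -threshold else 0 for v in row]
        for row in matrix
    ]


def is_consistent_bicluster(discrete, genes, conditions):
    """Check one possible reading of 'direction-aware' coherence.

    Within every selected condition, all selected genes must carry the
    same non-zero sign (all up or all down).
    """
    for j in conditions:
        signs = {discrete[i][j] for i in genes}
        if len(signs) != 1 or 0 in signs:
            return False
    return True


# Toy log-ratio matrix: rows are genes, columns are conditions.
log_ratios = [
    [1.5, -2.0, 0.2],
    [1.2, -1.1, 0.0],
    [-1.3, -1.8, 2.0],
]
levels = discretize(log_ratios)
```

Here genes 0 and 1 form a consistent bicluster over conditions 0 and 1 (up in the first, down in the second), while adding gene 2 breaks consistency because it is down-regulated in condition 0.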

Biclustering of Gene Expression Data using a Two-Phase Method

International Journal of Computer Applications, 2014

Biclustering is a very useful data mining technique which identifies coherent patterns from microarray gene expression data. A bicluster of a gene expression dataset is a subset of genes which exhibit similar expression patterns along a subset of conditions. Biclustering is a powerful analytical tool for the biologist and has generated considerable interest over the past few decades. Many biclustering algorithms optimize a mean squared residue to discover biclusters from a gene expression dataset. In this paper, a Two-Phase method of finding a bicluster is developed. In the first phase, a modified version of the k-means algorithm is applied to the gene expression data to generate k clusters. In the second phase, an iterative search is performed to check whether more genes and conditions can be removed within the given threshold value of the mean squared residue score. Experimental results on the yeast dataset show that our approach can effectively find high-quality biclusters.
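The second phase can be sketched roughly as follows. This is an illustrative greedy variant, not the paper's exact procedure: it assumes the first phase has already produced a candidate set of genes (for example one k-means cluster), and it repeatedly drops whichever single row or column lowers the mean squared residue the most until the score falls to the threshold `delta`. The function names and the minimum size of two rows and two columns are assumptions made for the example.

```python
def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix data[rows][cols]."""
    n, m = len(rows), len(cols)
    sub = [[data[i][j] for j in cols] for i in rows]
    row_means = [sum(r) / m for r in sub]
    col_means = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    overall = sum(row_means) / n
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n)
        for j in range(m)
    )
    return total / (n * m)


def greedy_node_deletion(data, rows, cols, delta):
    """Remove rows/columns one at a time until the residue is <= delta."""
    rows, cols = list(rows), list(cols)
    while mean_squared_residue(data, rows, cols) > delta:
        candidates = []
        if len(rows) > 2:  # keep at least two genes (assumption)
            for r in rows:
                kept = [x for x in rows if x != r]
                candidates.append((mean_squared_residue(data, kept, cols), kept, cols))
        if len(cols) > 2:  # keep at least two conditions (assumption)
            for c in cols:
                kept = [x for x in cols if x != c]
                candidates.append((mean_squared_residue(data, rows, kept), rows, kept))
        if not candidates:
            break  # cannot shrink further
        _, rows, cols = min(candidates, key=lambda t: t[0])
    return rows, cols


# Three genes follow an additive pattern; the fourth does not.
expression = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [9, 0, 7],
]
genes, conditions = greedy_node_deletion(expression, [0, 1, 2, 3], [0, 1, 2], delta=0.01)
```

On this toy matrix the search drops the incoherent fourth gene and keeps all three conditions, because the remaining rows differ only by a constant shift and therefore have a residue of zero.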

A systematic comparison and evaluation of biclustering methods for gene expression data

Bioinformatics, 2006

Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, allows the identification of sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns within all considered settings. Availability: The datasets used, the outcomes of the biclustering algorithms and the Bimax implementation for the reference model are available at
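The binary reference model can be illustrated with a small brute-force sketch. It assumes the expression matrix has already been binarized (1 where a gene responds in a sample, 0 otherwise) and treats a bicluster as an all-ones submatrix that cannot be extended by any further row or column. This is not the divide-and-conquer Bimax algorithm: the enumeration below is exponential in the number of columns and is only meant to show what the "optimal groupings" look like on a toy input. The function name is hypothetical.

```python
from itertools import combinations


def maximal_ones_biclusters(binary):
    """Enumerate inclusion-maximal all-ones submatrices by brute force."""
    n_cols = len(binary[0])
    found = set()
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # All rows that contain a 1 in every chosen column.
            rows = tuple(
                i for i, row in enumerate(binary) if all(row[j] for j in cols)
            )
            if not rows:
                continue
            # All columns that are 1 in every one of those rows.
            closed = tuple(
                j for j in range(n_cols) if all(binary[i][j] for i in rows)
            )
            # The pair is maximal only if the column set cannot grow.
            if closed == cols:
                found.add((rows, cols))
    return sorted(found)


binary = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
biclusters = maximal_ones_biclusters(binary)
```

Each result is a pair of row indices and column indices. For this matrix there are four maximal biclusters, including the single column 1 shared by all three rows and the single row 1 spanning all three columns.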

IMPROVED BICLUSTERING ALGORITHM FOR GENE EXPRESSION DATA

Biclustering algorithms simultaneously cluster both rows and columns. These types of algorithms are applied to gene expression data analysis to find a subset of genes that exhibit a similar expression pattern under a subset of conditions. Cheng and Church introduced the mean squared residue measure to capture the coherence of a subset of genes over a subset of conditions. They provided a set of heuristic algorithms based primarily on node deletion to find one bicluster or a set of biclusters after masking discovered biclusters with random values. The mean squared residue is a popular measure of bicluster quality. One drawback, however, is that it is biased toward flat biclusters with low row variance. In this paper, we introduce an improved bicluster score that removes this bias and promotes the discovery of the most significant biclusters in the dataset. We employ this score within a new biclustering approach based on the bottom-up search strategy. We believe that the bottom-up search approach better models the underlying functional modules of the gene expression dataset.
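For reference, the mean squared residue of Cheng and Church can be written as below, using notation that is assumed here rather than taken from the abstract: $I$ is the gene subset, $J$ the condition subset, and $a_{iJ}$, $a_{Ij}$, $a_{IJ}$ are the row, column and overall means of the submatrix. The row variance shown alongside it is one common definition and is included only to illustrate the bias; the improved score proposed in the paper is not reproduced.

```latex
H(I,J) = \frac{1}{|I|\,|J|} \sum_{i \in I} \sum_{j \in J}
         \left( a_{ij} - a_{iJ} - a_{Ij} + a_{IJ} \right)^{2},
\qquad
a_{iJ} = \frac{1}{|J|} \sum_{j \in J} a_{ij}, \quad
a_{Ij} = \frac{1}{|I|} \sum_{i \in I} a_{ij}, \quad
a_{IJ} = \frac{1}{|I|\,|J|} \sum_{i \in I} \sum_{j \in J} a_{ij}

\mathrm{Var}_{\mathrm{row}}(I,J) = \frac{1}{|I|\,|J|} \sum_{i \in I} \sum_{j \in J}
         \left( a_{ij} - a_{iJ} \right)^{2}
```

A constant submatrix has $H(I,J) = 0$ and zero row variance, so a search that only minimizes $H$ is free to return flat, uninformative biclusters. This is the bias the abstract refers to.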