Observer-Biased Analysis of Gene Expression Profiles

Gene Expression Data Analysis using Fuzzy C-means Clustering Technique

International Journal of Computer Applications, 2016

A central challenge in microarray analysis is interpreting the large volume of data it produces. Clustering techniques from data mining address this. In hard clustering, such as hierarchical and k-means clustering, the data are divided into distinct clusters and each data element belongs to exactly one cluster, so the outcome is often inaccurate. Fuzzy clustering can address this limitation. Among fuzzy clustering methods, fuzzy c-means (FCM) is the most suitable for microarray gene expression data. The drawback of fuzzy c-means is that the number of clusters to generate for a given dataset must be specified in advance. The main objective of the proposed possibilistic fuzzy c-means (PFCM) method is to determine the precise number of clusters and interpret them efficiently. PFCM is a good clustering algorithm for classification tests because it can give more importance to either typicalities or membership values. PFCM is a hybridization of PCM and FCM that often avoids various problems of PCM, FCM and FPCM. The entire study is based on the sample dataset 'lung'. The available research works already developed in this area are
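To make the soft assignment described in the abstract above concrete, here is a minimal sketch of standard fuzzy c-means in plain Python. It is not the authors' PFCM method or code: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the fixed iteration count, and the toy data in the usage note are all assumptions chosen for illustration. Note that `c`, the number of clusters, has to be supplied up front, which is the limitation the abstract mentions.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means on a list of equal-length tuples.

    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: proportional to distance**(-2/(m-1)).
        for i in range(n):
            inv = []
            for k in range(c):
                dist = sum((points[i][d] - centers[k][d]) ** 2
                           for d in range(dim)) ** 0.5
                dist = max(dist, 1e-12)  # guard against division by zero
                inv.append(dist ** (-2.0 / (m - 1.0)))
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u
```

As a usage example, calling `fuzzy_c_means(points, c=2)` on two tight groups of toy points plus one point midway between them should give the grouped points memberships close to 1 in one cluster, while the midway point receives roughly equal membership in both. Hard clustering cannot express that ambiguity.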

A comparison of fuzzy clustering approaches for quantification of microarray gene expression

Journal of Signal …, 2008

Despite the widespread application of microarray imaging in biomedical imaging research, barriers still exist regarding its reliability for clinical use. A major problem lies in accurate spot segmentation and the quantification of gene expression level (mRNA) from the microarray images. A variety of commercial and research freeware packages are available, but most cannot handle array spots with complex shapes such as donuts and scratches. Clustering approaches such as k-means and mixture models, which use hard labeling of each pixel, were introduced to overcome this difficulty. In this paper, we apply fuzzy clustering approaches to spot segmentation, which provide soft labeling of each pixel. We compare several fuzzy clustering approaches for microarray analysis and provide a comprehensive study of these approaches for spot segmentation. We show that possibilistic c-means clustering (PCM) provides the best performance in terms of a stability criterion when tested on a variety of both simulated and real microarray images. In addition, we compare three statistical criteria for measuring gene expression levels and show that a new asymptotically unbiased statistic is able to quantify the gene expression level more accurately.

An Analysis of Gene Expression Data using Penalized Fuzzy C-Means Approach

2013

With the rapid advances of microarray technologies, large amounts of high-dimensional gene expression data are being generated, which poses significant computational challenges. A first step towards addressing this challenge is the use of clustering techniques, which are essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. A robust gene expression clustering approach to minimize undesirable clustering is proposed. In this paper, the Penalized Fuzzy C-Means (PFCM) clustering algorithm is described and compared with the most representative off-line clustering techniques: K-Means clustering, Rough K-Means clustering and Fuzzy C-Means clustering. These techniques are implemented and tested on a brain tumor gene expression dataset. Analysis of the performance of the proposed approach is presented through qualitative validation experiments. The experimental results show that the Penalized Fuzzy C-Means algorithm has much higher usability than the other projected clustering algorithms used in our comparison study. Significant and promising clustering results are presented using the brain tumor gene expression dataset. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. In these clustering results, we find that the Penalized Fuzzy C-Means algorithm provides useful information as an aid to diagnosis in oncology.

A Computational Approach to Gene Expression Data Extraction and Analysis

The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, 2000

The rapid advancement of DNA microarray technology has revolutionized genetic research in bioscience. Due to the enormous amount of gene expression data generated by such technology, computer processing and analysis of such data has become indispensable. In this paper, we present a computational framework for the extraction, analysis and visualization of gene expression data from microarray experiments. A novel, fully automated spot segmentation algorithm for DNA microarray images, which makes use of adaptive thresholding, morphological processing and statistical intensity modeling, is proposed to: (i) segment the blocks of spots, (ii) generate the grid structure, and (iii) segment the spot within each subregion. For data analysis, we propose a binary hierarchical clustering (BHC) framework for the clustering of gene expression data. The BHC algorithm involves two major steps. First, the fuzzy C-means algorithm and the average linkage hierarchical clustering algorithm are used to split the data into two classes. Second, Fisher linear discriminant analysis is applied to the two classes to assess whether the split is acceptable. The BHC algorithm is applied to the sub-classes recursively and ends when no cluster can be split any further. BHC does not require the number of clusters to be known in advance. It makes no assumption about the number of samples in each cluster or the class distribution. The hierarchical framework naturally leads to a tree structure representation for effective visualization of gene expressions.

Inference from Clustering with Application to Gene-Expression Microarrays

Journal of Computational Biology, 2002

There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
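The evaluation loop described in the abstract above (model each class as a mean plus independent noise, generate points, cluster them, count the misclustered points) can be sketched roughly as follows. This is an illustrative reconstruction, not the toolbox itself. The function names, the Gaussian noise model, the use of a basic k-means as the algorithm under test, and the matching of clusters to classes by trying every label permutation are all assumptions.

```python
import itertools
import random

def simulate(means, sigma, n_per_class, rng):
    """Draw n_per_class points around each class mean with independent Gaussian noise."""
    points, labels = [], []
    for label, mean in enumerate(means):
        for _ in range(n_per_class):
            points.append(tuple(mu + rng.gauss(0.0, sigma) for mu in mean))
            labels.append(label)
    return points, labels

def k_means(points, k, rng, iters=50):
    """Basic Lloyd's k-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign

def clustering_error(true_labels, assign, k):
    """Number of misclustered points under the best cluster-to-class matching."""
    best = len(true_labels)
    for perm in itertools.permutations(range(k)):
        wrong = sum(1 for t, a in zip(true_labels, assign) if perm[a] != t)
        best = min(best, wrong)
    return best
```

A typical use would draw points with `simulate`, run `k_means` on them, and pass the result to `clustering_error`, then repeat for increasing `sigma` or for several random seeds and average the error. Trying every permutation is only practical for a small number of clusters.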

A Modified Rough Fuzzy "Clustering-Classification" Model for Gene Expression Data

International Journal of Innovative Research in Computer and Communication Engineering, 2014

Microarray technology is one of the important biotechnological means that has made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. An important application of microarray data is to elucidate the patterns hidden in gene expression data for an enhanced understanding of functional genomics. A microarray gene expression data set can be represented by an expression table, where each row corresponds to one particular gene, each column to a sample or time point, and each entry of the matrix is the measured expression level of a particular gene in a sample or at a time point, respectively. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which are essential in the pattern recognition process to reveal natural structures. In recent decades, more and more researchers have studied gene expression profile analysis, which provides a more precise and reliable way for disease diagnosis and treatment when compared with traditional cancer diagnosis approaches based on the morphological appearance of cells. Through this research we mainly aim to study and analyse different clustering and classification models for gene expression data, design and develop an efficient method for gene expression data clustering and classification, and finally conduct experimental analysis to evaluate the proposed methodology and demonstrate the significance of the method.

Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data

Machine Learning, 2003

In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides for a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.
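As a rough illustration of the resampling idea in the abstract above, the sketch below builds a consensus matrix: entry (i, j) is the fraction of runs in which items i and j landed in the same cluster, counted over the runs in which both were sampled. It is a simplified sketch under stated assumptions, not the paper's implementation. The names `consensus_matrix` and `two_means_1d`, the 80% subsampling fraction, the number of runs, and the toy one-dimensional two-means clusterer are all hypothetical choices made for the example.

```python
import random

def consensus_matrix(n_items, cluster_fn, n_runs=50, frac=0.8, seed=0):
    """Co-clustering frequency over repeated subsampled clustering runs.

    cluster_fn(indices, rng) must return one cluster label per sampled index.
    """
    rng = random.Random(seed)
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    size = max(2, int(frac * n_items))
    for _ in range(n_runs):
        idx = rng.sample(range(n_items), size)  # subsample without replacement
        labels = cluster_fn(idx, rng)
        for a in range(size):
            for b in range(size):
                i, j = idx[a], idx[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    # Pairs never sampled together get 0.0 by convention in this sketch.
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_items)] for i in range(n_items)]

def two_means_1d(values):
    """Toy base clusterer: two-means on scalar values with random restarts."""
    def cluster(idx, rng):
        centers = rng.sample([values[i] for i in idx], 2)
        labels = [0] * len(idx)
        for _ in range(20):
            labels = [0 if abs(values[i] - centers[0]) <= abs(values[i] - centers[1])
                      else 1 for i in idx]
            for k in (0, 1):
                members = [values[i] for i, lab in zip(idx, labels) if lab == k]
                if members:
                    centers[k] = sum(members) / len(members)
        return labels
    return cluster
```

Entries near 1 or 0 indicate pairs that are consistently grouped together or apart, while intermediate values flag unstable assignments. Any base clustering algorithm can be plugged in through `cluster_fn`, which reflects the abstract's point that the method guides the use of existing algorithms rather than replacing them.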