COMPUTATIONAL ANALYSIS ON GENE EXPRESSION PATTERN: A SURVEY.
Related papers
A Study on Computational Process in Gene Expression Data
This study examines the methods used for disease identification from gene expression data and the challenges they face. The core tasks of these techniques are the classification and categorization of gene expression, expression analysis, pattern recognition, and identification. The paper provides a comprehensive survey of microarray data analysis techniques and proposes a processing component for disease identification. For healthcare providers, maintaining data quality is essential because the data are used to deliver cost-effective treatment to patients. Healthcare administrations retain microarray data, which experts refine and analyze to identify disease. Performing this analysis manually makes identification and classification complicated, because microarray data suffer from problems such as missing information, empty values, and incorrect entries. Without quality information, no valuable results can be obtained. Poor data quality is one of the major obstacles to successful data mining of medical data, so the quality and accuracy of the data must be maintained if data mining is to support effective decisions. The major goal of this survey is to review data mining techniques for developing a prediction model of disease susceptibility from gene expression data. The microarray data are pre-processed so that gene expression can be analyzed and classified as over-expressed or under-expressed. The classified gene data are then clustered, and feature selection is applied to discover a pattern. Finally, association mining is applied to the organized gene expression data to identify the disease. The survey thus presents efficient techniques that replace the manual identification of diseases.
1. Introduction
DNA microarrays make it possible to examine the expression of thousands of genes in a single experiment. One of the most significant applications of microarray technology is disease identification and classification. Through microarray technology, researchers can categorize diseases according to the different expression levels in normal and tumor cells, determine the relationships among genes, and recognize the genes that are critical in the development of disease [1]. The main task of microarray classification is to construct a classifier from historical microarray gene expression data and then use that classifier to categorize future incoming data. Owing to the rapid development of DNA microarray technology, gene selection techniques and classification techniques are being devised to make better use of classification algorithms on microarray gene expression data. The analysis of large gene expression data sets is becoming a challenge in disease classification [2]. Gene selection is therefore one of the critical aspects. Efficient gene selection can considerably ease the computational burden of the subsequent classification task and can yield a much smaller and more compact gene set without loss of classification accuracy. In classifying microarray data, the key objective of gene selection is to search for the genes that retain the greatest amount of information about the class and minimize the classification error [3]. Data mining techniques typically fall into either supervised or unsupervised classes. Microarray technologies provide a powerful tool with which the expression patterns of thousands of genes can be examined concurrently, and their applications range from disease diagnosis to treatment response. Gene expression is the conversion of a DNA sequence into an mRNA sequence by transcription, which is then translated into amino acid sequences called proteins.
The key challenge in classifying gene expression data is the curse of dimensionality: a huge number of genes (features) is measured against a small number of samples [3]. To overcome this, feature selection is used to identify differentially expressed genes and to eliminate irrelevant genes. Gene selection remains a significant task for improving the accuracy and speed of classification systems. In general, feature selection methods fall into three kinds: filter, wrapper, and embedded methods. They are distinguished by how the feature selection method is combined with the construction of the classification model. An extensive body of literature is available on gene selection techniques for building effective classification models. In this paper, we
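As an illustration of the filter category named in this excerpt, the sketch below ranks genes by a signal-to-noise score computed independently of any classifier. It is a minimal sketch under stated assumptions: the score, the function names, and the toy data are illustrative choices and are not taken from the survey.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels):
    """Filter score for one gene: |mean_0 - mean_1| / (sd_0 + sd_1).

    Assumption: two classes labelled 0 and 1. This particular score is an
    illustrative choice; the survey does not prescribe a specific filter.
    """
    class_0 = [v for v, y in zip(values, labels) if y == 0]
    class_1 = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(class_0) + pstdev(class_1)
    gap = abs(mean(class_0) - mean(class_1))
    if spread == 0:
        # No within-class variation: either perfectly separating or constant.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def select_top_genes(expression, labels, k):
    """Return the k gene names with the highest filter score.

    expression maps a gene name to its expression values, one per sample.
    """
    scores = {gene: signal_to_noise(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: three genes measured in six samples.
expression = {
    "gene_a": [1.0, 1.2, 0.9, 5.1, 4.8, 5.3],  # clearly differs between classes
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # same level in both classes
    "gene_c": [3.0, 0.5, 4.0, 3.5, 0.7, 4.2],  # noisy, weak difference
}
labels = [0, 0, 0, 1, 1, 1]
top_genes = select_top_genes(expression, labels, k=2)
```

Because the score never consults a classifier, this is a filter in the sense used above; a wrapper would instead retrain and evaluate a classifier for each candidate gene subset.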
Survey on different Methods for Classifying Gene Expression using Microarray Approach
International Journal of Computer Applications, 2016
Recognizing and pre-detecting genetic mutations has become an important research issue. Various techniques may help in detecting diseases, cancers, and tumors. A microarray is a type of representation of gene expression that may help in the detection process: it represents samples that contain gene expression values. These gene expressions are used to analyze samples, which may be normal or affected, and to help in diagnosis. To realize the benefit of microarrays, machine learning algorithms and gene selection methods must be used to facilitate processing and to overcome the challenges that microarrays face. Among these challenges is the high-dimensional data problem, an important challenge that arises in different datasets, which suffer from redundant, irrelevant, and noisy data. Solving this
A novel methodology for finding the regulation on gene expression data
Progress in Natural Science, 2009
DNA microarray technology is a high-throughput, parallel technique for genomic investigation, owing to its ability to survey features of large-scale, complex biological data simultaneously. This paper aims to find a feature subset with which to build a classifier for gene expression data analysis. First, the K-means clustering algorithm was run on the yeast cell cycle dataset. Based on the Rand calculation, a statistical method was used to pick out the data points (genes) for classifier design. Meanwhile, principal component analysis was applied to help construct the classifier. To validate the classifier and to predict a target subset of genes, discriminant analysis in terms of partial least squares regression and an artificial neural network was also performed.
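The abstract does not say how the Rand calculation was used. As a hedged illustration only, the sketch below computes the standard Rand index, which measures how often two clusterings of the same genes agree on whether a pair of genes belongs together. The function name and the example labels are assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster labels for four genes from two K-means runs.
run_1 = [0, 0, 1, 1]
run_2 = [0, 0, 1, 2]
score = rand_index(run_1, run_2)  # 5 of the 6 pairs agree
```

A value of 1.0 means the two clusterings induce identical groupings, regardless of how the cluster labels themselves are numbered.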
Gene Expression Profiling of DNA Microarray Data using various Data Mining Methodologies
2017
This paper aims to mine gene expression profiles from DNA microarray data using various data mining methodologies on biologically vital sequences, and to visualize data processing methodologies such as classification, clustering, and association rule mining. DNA microarray technology has been used extensively in bioinformatics for exploring genomic organization. It makes it possible to analyze the expression of many genes in a single reaction. The techniques currently employed to analyze microarray expression data are clustering and classification. In this paper, cancer gene expression is analyzed using hierarchical clustering, which identifies groups of genes sharing similar expression profiles, and dendrograms are employed as an efficient means of prediction over the expression. Data discretization is completed by clustering the sequences into two clusters, i.e., up-regulated and down-regulated.
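The two-cluster discretization described at the end of this abstract can be sketched as a one-dimensional two-means clustering. This is a minimal sketch, assuming each gene is summarized by a single log2 expression ratio; the function names and values are hypothetical, and the paper's own procedure may differ.

```python
def two_means_1d(values, max_iterations=20):
    """Split numbers into a low and a high group; return the two centroids."""
    low_center, high_center = min(values), max(values)
    for _ in range(max_iterations):
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        if not high:  # all values identical: nothing to split
            break
        new_centers = (sum(low) / len(low), sum(high) / len(high))
        if new_centers == (low_center, high_center):  # converged
            break
        low_center, high_center = new_centers
    return low_center, high_center


def discretize(log_ratios):
    """Label each gene 'up' or 'down' according to its nearest centroid."""
    low_center, high_center = two_means_1d(list(log_ratios.values()))
    return {
        gene: "up" if abs(v - high_center) < abs(v - low_center) else "down"
        for gene, v in log_ratios.items()
    }


# Hypothetical log2(tumor / normal) ratios for five genes.
log_ratios = {"g1": 2.1, "g2": 1.8, "g3": -1.5, "g4": -2.2, "g5": 1.6}
regulation = discretize(log_ratios)
```

Starting the centroids at the minimum and maximum keeps the result deterministic, which is convenient for a small example but is not required by the method.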
Neural Networks for …, 2002
This manuscript describes a combined approach of unsupervised clustering followed by supervised learning that provides an efficient classification of conditions in DNA array gene expression experiments (different cell lines including some cancer types, in the cases shown). First, the dimensionality of the dataset of gene expression profiles is reduced to a number of non-redundant clusters of co-expressing genes using an unsupervised clustering algorithm, the Self Organizing Tree Algorithm (SOTA), a hierarchical version of Self Organizing Maps (SOM). Then, the average values of these clusters are used to train a perceptron that produces a very efficient classification of the conditions. This way of reducing the dimensionality of the data set seems to perform better than others previously proposed, such as PCA. In addition, the weights that connect the gene clusters to the different experimental conditions can be used to assess the relative importance of the genes in the definition of these classes. Finally, Gene Ontology (GO) terms are used to infer a possible biological role for these groups of genes and to assess the validity of the classification from a biological point of view.
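The second stage described above, training a perceptron on cluster averages, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the gene clusters are assumed to be given (SOTA itself is not implemented here), only two conditions are distinguished, and all names and values are hypothetical.

```python
def cluster_features(expression, clusters):
    """Build one feature vector per sample: the mean expression of each cluster.

    expression maps gene name -> values per sample; clusters is a list of
    gene-name lists (assumed to come from a prior clustering step).
    """
    n_samples = len(next(iter(expression.values())))
    return [
        [sum(expression[g][s] for g in genes) / len(genes) for genes in clusters]
        for s in range(n_samples)
    ]


def activation(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(features, labels, epochs=50, learning_rate=0.1):
    """Classic perceptron rule for two classes labelled 0 and 1."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = y - (1 if activation(weights, bias, x) > 0 else 0)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical data: four genes, four samples, two precomputed clusters.
expression = {
    "g1": [1.0, 1.2, 3.0, 3.1],
    "g2": [0.8, 1.0, 2.8, 3.3],
    "g3": [2.0, 2.1, 0.5, 0.4],
    "g4": [2.2, 1.9, 0.3, 0.6],
}
clusters = [["g1", "g2"], ["g3", "g4"]]
conditions = [0, 0, 1, 1]

features = cluster_features(expression, clusters)
weights, bias = train_perceptron(features, conditions)
predictions = [1 if activation(weights, bias, x) > 0 else 0 for x in features]
```

In this toy case the learned weight is positive for the first cluster and negative for the second, which mirrors the abstract's point that the weights indicate how much each gene cluster contributes to separating the conditions.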
Cancer Identification and Gene Classification using DNA Microarray Gene Expression Patterns
2011
DNA microarray gene expression patterns of several model organisms provide a fascinating opportunity to explore important abnormal biological phenomena. The development of cancer is a multi-step process in which several genes and other environmental and hormonal factors play an important role. In this paper, a new algorithm is proposed to analyze DNA microarray gene expression patterns efficiently for large amounts of DNA microarray data. For better visibility and understanding, experimental results of DNA microarray gene pattern analysis are represented graphically. The shape of each graph corresponding to a DNA microarray gene expression pattern is determined by using an eight-directional chain code sequence, which is invariant to translation, scaling, and rotation. Cancer development is identified based on the variations of DNA microarray gene expression patterns of the same organism by simultaneously monitoring the expression of thousands of genes. At the end, classification o...
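For readers unfamiliar with eight-directional chain codes, the sketch below encodes a path on an integer grid with the standard Freeman scheme. It is a minimal sketch under stated assumptions: the point list is hypothetical, and how the paper maps an expression graph onto a grid or obtains scale invariance is not stated in the abstract and is not modeled here.

```python
# Freeman codes: 0 = east, then counter-clockwise in 45-degree steps.
DIRECTION_CODES = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode consecutive unit steps between grid points as codes 0..7."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTION_CODES:
            raise ValueError(f"not a unit step to a neighbouring cell: {step}")
        codes.append(DIRECTION_CODES[step])
    return codes


def first_difference(codes):
    """Direction changes between successive codes, modulo 8.

    This form is unchanged when the path is rotated by multiples of 45 degrees.
    """
    return [(b - a) % 8 for a, b in zip(codes, codes[1:])]


# Hypothetical expression curve sampled on a grid: up, flat, down, flat.
path = [(0, 0), (1, 1), (2, 1), (3, 0), (4, 0)]
shifted = [(x + 5, y + 3) for x, y in path]

codes = chain_code(path)              # [1, 0, 7, 0]
turns = first_difference(codes)       # [7, 7, 1]
```

Because only the steps between points are encoded, shifting the whole path leaves the code unchanged, which is the translation invariance mentioned in the abstract.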
Pattern Recognition Methods for Gene Expression Analysis
2008
Computer pattern recognition is a very important field of human knowledge with applications in complex and diversified areas. In bioinformatics in particular, pattern recognition is applied to address a myriad of problems, such as identification of important regions in DNA sequences, biological classification (diagnosis) of a tissue based on mRNA expression of thousands of genes, functional identification of proteins through three-dimensional structure analysis, and identification of interaction networks between genes or proteins. This chapter focuses on the application of pattern recognition techniques in gene expression analysis, where the data are obtained from microarray technology. The images obtained from this technique must be preprocessed with image analysis methods, which are also discussed here. Gene expression analysis involves a huge number of genes (thousands) and few experiments or samples (a few tens). Because of this, dimensionality reduction techniques, with a focus on feature selection methods, are essential in this context to give a reduced subset of genes responsible for some biological phenomenon (e.g., a tissue tumor or the activation/inhibition of genes, proteins, or metabolic pathways). Two feature selection applications are highlighted. The first tries to discover subsets of genes that serve as markers of a biological phenomenon. The second involves the identification of gene network regulation.
Classification of genes based on gene expression analysis
Physics of Atomic Nuclei, 2008
Systems biology and bioinformatics are now major fields for productive research. DNA microarrays and other array technologies, together with genome sequencing, have advanced to the point that it is now possible to monitor gene expression on a genomic scale. Gene expression analysis is discussed and some important clustering techniques are considered. The patterns identified in the data suggest similarities in gene behavior, which provide useful information about gene function. We discuss measures for investigating the homogeneity of gene expression data in order to optimize the clustering process. We contribute to the knowledge of functional roles and regulation of E. coli genes by proposing a classification of these genes based on consistently correlated genes in expression data and similarities of gene expression patterns. A new visualization tool for targeted projection pursuit and dimensionality reduction of gene expression data is demonstrated.
Smart Innovation, Systems and Technologies, 2015
Machine learning is a burgeoning technology used to extract knowledge from an ocean of data. It is closely bound to optimization and artificial intelligence, which supply theory, methodologies, and application domains to the fields of statistics and computer science. Machine learning tasks are broadly classified into two groups, namely supervised learning and unsupervised learning. The analysis of unsupervised data requires thorough computation using different clustering algorithms. Microarray gene expression data are considered in order to cluster regulating genes apart from non-regulating genes. In our work, an optimization technique (Cat Swarm Optimization) is used to minimize the number of clusters by evaluating the Euclidean distance among the centroids. A comparative study is carried out by clustering the regulating genes before and after optimization. In our work, principal component analysis (PCA) is incorporated for dimensionality reduction of the vast dataset to ensure qualitative cluster analysis.
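To make the idea of reducing the number of clusters from centroid distances concrete, the sketch below greedily merges the closest pair of centroids until no pair is nearer than a threshold. This is a simple stand-in, not Cat Swarm Optimization, and the threshold, the unweighted midpoint merge, and the example centroids are all assumptions.

```python
import math


def merge_close_centroids(centroids, threshold):
    """Greedily merge the closest centroid pair while it is nearer than threshold.

    Simplification: a merged centroid is the unweighted midpoint of the pair,
    so cluster sizes are ignored.
    """
    merged = [list(c) for c in centroids]
    while len(merged) > 1:
        # Closest pair by Euclidean distance.
        distance, i, j = min(
            (math.dist(merged[i], merged[j]), i, j)
            for i in range(len(merged))
            for j in range(i + 1, len(merged))
        )
        if distance >= threshold:
            break
        midpoint = [(a + b) / 2 for a, b in zip(merged[i], merged[j])]
        merged = [c for k, c in enumerate(merged) if k not in (i, j)] + [midpoint]
    return merged


# Hypothetical centroids in a two-dimensional (for example PCA-reduced) space.
centroids = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1)]
reduced = merge_close_centroids(centroids, threshold=1.0)
```

Here the four starting centroids collapse to two, because each nearby pair lies within the threshold while the two resulting groups are far apart.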