A graph-based representation of Gene Expression profiles in DNA microarrays

Differential gene expression graphs: A data structure for classification in DNA Microarrays

8th IEEE International Conference on BioInformatics and BioEngineering, BIBE 2008, 2008

This paper proposes an innovative data structure to be used as a backbone in designing microarray phenotype sample classifiers. The data structure is based on graphs and it is built from a differential analysis of the expression levels of healthy and diseased tissue samples in a microarray dataset. The proposed data structure is built in such a way that, by construction, it shows a number of properties that are perfectly suited to address several problems like feature extraction, clustering, and classification.

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
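The abstract above describes a graph whose vertices are genes and whose edges are gene expression relationships, but it does not give the construction rules. The following is only a minimal sketch under stated assumptions, not the authors' data structure: a gene is flagged in a diseased sample when its log2 fold change against the mean healthy level reaches a hypothetical `threshold`, and the weight of an edge counts the diseased samples in which both genes are flagged. The function name `build_gene_graph` and the toy data are illustrative.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_gene_graph(healthy, diseased, threshold=1.0):
    """Build an undirected weighted gene graph (illustrative assumption only).

    healthy, diseased: dict mapping gene name -> list of positive expression
    levels, one value per sample. An edge weight is the number of diseased
    samples in which both genes are flagged as differentially expressed.
    """
    # Baseline: mean healthy expression per gene.
    baseline = {g: sum(v) / len(v) for g, v in healthy.items()}
    n_samples = len(next(iter(diseased.values())))
    vertices = set()
    edges = defaultdict(int)
    for s in range(n_samples):
        # Genes whose |log2 fold change| vs. the baseline reaches the threshold.
        flagged = sorted(
            g for g, levels in diseased.items()
            if abs(math.log2(levels[s] / baseline[g])) >= threshold
        )
        vertices.update(flagged)
        # Connect every pair of genes flagged in the same sample.
        for a, b in combinations(flagged, 2):
            edges[(a, b)] += 1
    return vertices, dict(edges)

# Toy data: three genes, two healthy and two diseased samples.
healthy = {"G1": [10.0, 12.0], "G2": [8.0, 8.0], "G3": [5.0, 5.0]}
diseased = {"G1": [44.0, 40.0], "G2": [2.0, 8.5], "G3": [5.0, 20.0]}
vertices, edges = build_gene_graph(healthy, diseased)
print(vertices, edges)
```

Storing edges in a dictionary keyed by sorted gene pairs keeps the graph sparse, which matters when thousands of genes are measured but few change together.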

Classification of microarray data using gene networks

BMC …, 2007

Background: Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks in order to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation.

Comparative evaluation of microarray-based gene expression databases

2003

Microarrays make it possible to monitor the expression of thousands of genes in parallel, thus generating huge amounts of data. So far, several databases have been developed for managing and analyzing this kind of data, but the state of the art in this field is still at an early stage. In this paper, we comprehensively analyze the requirements for microarray data management. We consider the various kinds of data involved as well as data preparation, integration and analysis needs. The identified requirements are then used to comparatively evaluate eight existing microarray databases described in the literature. In addition to providing an overview of the current state of the art, we identify problems that should be addressed in the future to obtain better solutions for managing and analyzing microarray data.

Classification Techniques in Gene Expression Microarray Data

IJCSMC, 2018

Cancer is a common and heterogeneous disease that affects people of all ages. Gene expression data can help in understanding cancer and other types of disease. Building a classification system from a gene expression dataset that can properly classify new samples is a challenging task, because such data usually consist of dozens of samples characterized by thousands of genes. This paper sheds light on different methods used to classify gene expression data, including SVM, NB, C4.5 and some state-of-the-art techniques.

Classification of Microarray Gene Expression Data

Classification of yeast genes based on their expression levels obtained from microarray hybridization experiments is an important and challenging application domain in data mining and knowledge discovery. Over the past decade, neural networks and support vector machines (SVMs) have achieved good results for gene classification. This paper presents a methodology which uses two neural networks to classify unseen genes based on their expression levels. In order to remove some of the noise and deal with the imbalanced class distribution of the dataset, data pre-processing is performed before classification; it consists of data cleaning, data transformation and data over-sampling using the SMOTE algorithm. Thereafter, two neural networks with different architectures are trained using Scaled Conjugate Gradient in two different ways: 1) the training-validation-testing approach and 2) 10-fold cross-validation. Experimental results show that this methodology outperforms the previous best-performing SVM for this problem and 8 other classifiers: 3 SVMs, C4.5, Bayesian network, Naive Bayes, K-NN and JRip.
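The over-sampling step mentioned in this abstract can be illustrated with a simplified, pure-Python version of the SMOTE idea: each synthetic minority sample is placed at a random position on the segment between a minority point and one of its nearest minority neighbours. This is a sketch only; the function name `smote`, the value of `k`, the seed, and the toy data are assumptions, and the abstract does not state the settings the authors used.

```python
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolation (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Toy minority class with two features per sample.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. Over-sampling of this kind is normally applied to the training folds only, so that synthetic points do not leak into the test data.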

A Study on Computational Process in Gene Expression Data

This paper examines the methods used for disease identification from gene expression data and the challenges they face. The core tasks of these techniques are the classification and categorization of gene expression, expression analysis, pattern recognition, and identification. The paper provides a comprehensive survey of microarray data analysis techniques and proposes a processing component for disease identification. For healthcare providers it is essential to maintain data quality, because the data are used to provide cost-effective treatment to patients. Healthcare administrations retain microarray data, which experts refine and analyze to identify disease. Analyzing microarray data manually makes identification and classification complicated, because the data suffer from problems such as missing information, empty values, and incorrect entries. Without quality information there are no valuable results. Poor-quality health data is one of the major obstacles to successful data mining of medical data, so maintaining data quality and accuracy is essential for effective decision-making. The major goal of this survey is to review data mining techniques for developing a prediction model of disease susceptibility from gene expression data. The microarray data are pre-processed so that gene expression can be classified as over-expressed or under-expressed. The classified gene data are then clustered, and feature selection is applied to discover a pattern. Finally, association mining is applied to the organized gene expression data to identify the disease. The paper presents efficient techniques that avoid the manual identification of diseases.

1. Introduction

DNA microarrays make it possible to examine the expression of thousands of genes in a single experiment, and one of the significant applications of microarray technology is disease identification and classification. Through microarray technology, researchers can classify specific diseases according to different expression levels in normal and tumor cells, determine the relationships among genes, and recognize the genes that are critical in the development of disease [1]. The main task of microarray classification is to construct a classifier from historical microarray gene expression data and then use that classifier to categorize future incoming data. Owing to the rapid development of DNA microarray technology, gene selection techniques and classification techniques are being developed to make better use of classification algorithms on microarray gene expression data. The analysis of large gene expression data sets is becoming a challenge in disease classification [2]. Gene selection is therefore one of the critical aspects. Efficient gene selection can considerably ease the computational burden of the subsequent classification task and can yield a much smaller and more compact gene set without degrading classification. In classifying microarray data, the key objective of gene selection is to search for the genes that retain the greatest amount of information about the class and minimize the classification error [3]. Data mining techniques typically fall into either supervised or unsupervised classes. Microarray technologies provide a powerful tool by which the expression patterns of thousands of genes can be examined simultaneously, with applications ranging from disease diagnosis to treatment response. Gene expression is the conversion of a DNA sequence into an mRNA sequence by transcription, which is then translated into amino acid sequences called proteins.
The key challenge in classifying gene expression data is the curse of dimensionality: a huge number of genes (features) is measured on a small number of samples [3]. To overcome this, feature selection is used to identify differentially expressed genes and to eliminate irrelevant genes. Gene selection remains a significant task for improving the accuracy and speed of classification systems. In general, feature selection methods can be organized into three kinds: filter, wrapper, and embedded methods. They are distinguished by how the feature selection method is combined with the construction of the classification model. An extensive body of literature is available on gene selection techniques for constructing useful classification models. In this paper, we
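As an illustration of the filter family described in this introduction, a filter method scores every gene independently of any classifier and keeps the highest-ranked ones. The sketch below assumes a Welch-style t statistic as the score and binary class labels; the function names `t_like_score` and `select_top_genes` and the toy data are hypothetical and are not taken from the surveyed paper.

```python
import math

def t_like_score(a, b):
    """Absolute Welch-style t statistic between two groups of expression values."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(var_a / len(a) + var_b / len(b))
    # A zero denominator (no variance in either group) is scored as 0 here.
    return abs(mean_a - mean_b) / denom if denom > 0 else 0.0

def select_top_genes(expression, labels, k):
    """Rank genes by score and return the names of the k best.

    expression: dict gene -> list of values, one per sample.
    labels: list of 0/1 class labels aligned with the samples.
    """
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = t_like_score(group0, group1)
    return sorted(scores, key=scores.get, reverse=True)[:k]

labels = [0, 0, 0, 1, 1, 1]
expression = {
    "G1": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # strongly differential
    "G2": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # essentially unchanged
    "G3": [2.0, 2.5, 1.5, 3.0, 3.5, 2.5],  # mildly differential
}
top = select_top_genes(expression, labels, k=2)
print(top)
```

Wrapper and embedded methods differ in that the gene subset is evaluated by, or chosen during, the training of the classifier itself, which is more expensive when thousands of genes are involved.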

Analysis of Microarray Gene Expression Data

Current Bioinformatics, 2006

This article reviews the methods utilized in processing and analysis of gene expression data generated using DNA microarrays. This type of experiment makes it possible to determine relative levels of mRNA abundance in a set of tissues or cell populations for thousands of genes simultaneously. Naturally, such an experiment requires computational and statistical analysis techniques. At the outset of the processing pipeline, the computational procedures are largely determined by the technology and experimental setup that are used. Subsequently, as more reliable intensity values for genes emerge, pattern discovery methods come into play. The most striking peculiarity of this kind of data is that one usually obtains measurements for thousands of genes for only a much smaller number of conditions. This is at the root of several of the statistical questions discussed here.

Gene expression databases and data mining

…, 2003

DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. This paper surveys several databases which are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.