A cascading support vector machines system for gene expression data classification (original) (raw)

Support Vector Machine Classification of Microarray Gene Expression Data

Proceedings of the …, 2000

We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of sup-port vector machines (SVMs). We describe SVMs that use different similarity metrics includ-ing a ...

A HYBRID OF GENETIC ALGORITHM AND SUPPORT VECTOR MACHINE FOR FEATURES SELECTION AND CLASSIFICATION OF GENE EXPRESSION MICROARRAY

Constantly improving gene expression technology offer the ability to measure the expression levels of thousand of genes in parallel. Gene expression data is expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. Key issue that needs to be addressed is the selection of small number of genes that contribute to a disease from the thousands of genes measured on microarrays that are inherently noisy. This work deals with finding the small subset of informative genes from gene expression microarray data which maximize the classification accuracy. This paper introduces a new algorithm of hybrid Genetic Algorithm and Support Vector Machine for genes selection and classification task. We show that the classification accuracy of the proposed algorithm is superior to a number of current state-of-the-art methods of two widely used benchmark datasets. The informative genes from the best subset are validated and verified by comparing them with the biological results produced from biologist and computer scientist researches in order to explore the biological plausibility.

Classification Techniques in Gene Expression Microarray Data

IJCSMC, 2018

Cancer nowadays is a common and heterogeneous disease affecting all people of all ages. Gene expression data can serve to understand cancer or other types of disease well. Building classification system using gene expression dataset that can properly classify new samples is a challenging task due to the nature of gene expression data that is usually composed of dozens of samples characterized by thousands of genes. This paper put a light on different classification methods used in classifying gene expression data including SVM, NB, C4.5 and some of the state-of-the-art techniques.

Classification of Microarray Gene Expression Data Using a New Binary Support Vector System

2005 International Conference on Neural Networks and Brain, 2005

Classification of yeast genes based on their expression levels obtained from micro array hybridization experiments is an important and challenging application domain in data mining and knowledge discovery. Over the past decade, neural networks and support vector machines (SVMs) have achieved good results for genes classification. This paper presents a methodology which uses two neural networks to classify unseen genes based on their expression levels. In order to remove some of the noise and deal with the imbalanced class distribution of the dataset, data pre-processing is firstly performed before data classification in which data cleaning, data transformation and data over-sampling using SMOTE algorithm are undertaken. Thereafter, two neural networks with different architectures are trained using Scaled Conjugate Gradient in two different ways: 1) the training-validation-testing approach and 2) 10-fold crossvalidation. Experimental results show that this methodology outperforms the previous best-performing SVM for this problem and 8 other classifiers: 3 SVMs, C4.5, Bayesian network, Naive Bayes, K-NN and JRip.

Knowledge-based Analysis of Microarray Gene Expression Data By Using Support Vector Machines

Proceedings of The National Academy of Sciences, 2000

We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.

A comparative study of different machine learning methods on microarray gene expression data

2008

Background: Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forrest methods have been used in recent studies. The accuracy of these methods has been calculated with validation methods such as v-fold validation. However there is lack of comparison between these methods to find a better framework for classification, clustering and analysis of microarray gene expression results. Results: In this study, we compared the efficiency of the classification methods including; SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forrest methods. The v-fold cross validation was used to calculate the accuracy of the classifiers. Some of the common clustering methods including K-means, DBC, and EM clustering were applied to the datasets and the efficiency of these methods have been analysed. Further the efficiency of the feature selection methods including support vector machine recursive feature elimination (SVM-RFE), Chi Squared, and CSF were compared. In each case these methods were applied to eight different binary (two class) microarray datasets. We evaluated the class prediction efficiency of each gene list in training and test cross-validation using supervised classifiers. Conclusions: We presented a study in which we compared some of the common used classification, clustering, and feature selection methods. We applied these methods to eight publicly available datasets, and compared how these methods performed in class prediction of test datasets. We reported that the choice of feature selection methods, the number of genes in the gene list, the number of cases (samples) substantially influence classification success. Based on features chosen by these methods, error rates and accuracy of several classification algorithms were obtained. Results revealed the importance of feature selection in accurately classifying new samples and how an integrated feature selection and classification algorithm is performing and is capable of identifying significant genes.

Classification of Microarray Gene Expression Data

Classification of yeast genes based on their expression levels obtained from micro array hybridization experiments is an important and challenging application domain in data mining and knowledge discovery. Over the past decade, neural networks and support vector machines (SVMs) have achieved good results for genes classification. This paper presents a methodology which uses two neural networks to classify unseen genes based on their expression levels. In order to remove some of the noise and deal with the imbalanced class distribution of the dataset, data pre-processing is firstly performed before data classification in which data cleaning, data transformation and data over-sampling using SMOTE algorithm are undertaken. Thereafter, two neural networks with different architectures are trained using Scaled Conjugate Gradient in two different ways: 1) the training-validation-testing approach and 2) 10-fold crossvalidation. Experimental results show that this methodology outperforms the previous best-performing SVM for this problem and 8 other classifiers: 3 SVMs, C4.5, Bayesian network, Naive Bayes, K-NN and JRip.

Gene Expression Data Classification using Support Vector Machine and Mutual Information-based Gene Selection

DNA microarray technology can monitor the expression levels of thousands of genes simultaneously during important biological processes and across collections of related samples. Knowledge gained through microarray data analysis is increasingly important as they are useful for phenotype classification of diseases. This paper presents an effective method for gene classification using Support Vector Machine (SVM). SVM is a supervised learning algorithm capable of solving complex classification problems. Mutual information (MI) between the genes and the class label is used for identifying the informative genes. The selected genes are utilized for training the SVM classifier and the testing ability is evaluated using Leave-one-Out Cross Validation (LOOCV) method. The performance of the proposed approach is evaluated using two cancer microarray datasets. From the simulation study it is observed that the proposed approach reduces the dimension of the input features by identifying the most informative gene subset and improve classification accuracy when compared to other approaches.

A Review of Cancer Classification Software for Gene Expression Data

International Journal of Bio-Science and Bio-Technology, 2015

Microarray technology provides a way for researchers to measure the expression level of thousands of genes simultaneously in a single experiment. Due to the increasing amount of microarray data, the field of microarray data analysis has become a major topic among researchers. One of the examples of microarray data analysis is classification. Classification is the process of determining the classes for samples. The goal of classification is to identify the differentially expressed genes so that these genes can be used to predict the classes for new samples. In order to perform the tasks of classification of microarray data, classification software is required for effective classification and analysis of large-scale data. This paper reviews numerous classification software applications for gene expression data. In this paper, the reviewed software can be categorized into six supervised classification methods: Support

CASCADING SVMS AS A TOOL FOR MEDICAL DIAGNOSIS USING MULTI-CLASS GENE EXPRESSION DATA

International Journal on Artificial Intelligence Tools, 2006

In this paper we propose a novel Support Vector Machines-based architecture for medical diagnosis using multi-class gene expression data. It consists of a pre-processing unit and N − 1 sequentially ordered blocks capable of classifying N classes in a cascading manner. Each block embodies both a gene selection and a classification module. It offers the flexibility of constructing block-specific gene expression spaces and hypersurfaces for the discrimination of the different classes. The proposed architecture was applied for medical diagnostic tasks including prostate and lung cancer diagnosis. Its performance was evaluated by using a leave-one-out cross validation approach which avoids the bias introduced by the gene selection process. The results show that it provides high accuracy which in most cases exceeds the accuracy achieved by the popular one-vs-one and onevs-all SVM combination schemes and Nearest-Neighbor classifiers. The cascading SVMs can be successfully applied as a medical diagnostic tool.