A model for gene selection and classification of gene expression data

Gene expression data classification using genetic algorithm-based feature selection

TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, 2021

In this study, hybrid methods are proposed for feature selection and classification of gene expression datasets. In the proposed genetic algorithm/support vector machine (GA-SVM) and genetic algorithm/k-nearest neighbor (GA-KNN) hybrid methods, the genetic algorithm is improved using Pearson's correlation coefficient, Relief-F, or mutual information. The crossover and selection operations of the genetic algorithm are specialized. Eight different gene expression datasets are used for the classification process. The classification performances of the proposed methods are compared with the traditional GA-KNN and GA-SVM wrapper methods and with other studies in the literature. The classification results demonstrate that the proposed methods achieve higher accuracy rates than the other methods on all datasets.
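The abstract does not say how the filter scores enter the genetic algorithm. As a minimal sketch of the filter step alone, the snippet below ranks genes by the absolute Pearson correlation between each gene's expression values and the class label. The function names, the toy data, and the idea of using such a ranking to bias the GA are assumptions for illustration, not the authors' procedure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene (or label) carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_genes(samples, labels):
    """Return gene indices sorted by |correlation with the label|, best first."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in samples]
        scores.append((abs(pearson(column, labels)), g))
    # Sort by descending score; break ties by gene index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores]

# Hypothetical toy data: 4 samples x 3 genes; gene 1 tracks the label closely.
samples = [
    [0.5, 1.0, 0.2],
    [0.4, 1.1, 0.9],
    [0.6, 3.0, 0.3],
    [0.5, 3.2, 0.8],
]
labels = [0, 0, 1, 1]
ranking = rank_genes(samples, labels)
```

A ranking like this could, for example, be used to seed the initial population with high-scoring genes, but that use is a guess rather than something the abstract states.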

Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method

Bioinformatics, 2001

Motivation: We recently introduced a multivariate approach that selects a subset of predictive genes jointly for sample classification based on expression data. We tested the algorithm on colon and leukemia data sets. As an extension to our earlier work, we systematically examine the sensitivity, reproducibility and stability of gene selection/sample classification to the choice of parameters of the algorithm. Methods: Our approach combines a Genetic Algorithm (GA) and the k-Nearest Neighbor (KNN) method to identify genes that can jointly discriminate between different classes of samples (e.g. normal versus tumor). The GA/KNN method is a stochastic supervised pattern recognition method. The genes identified are subsequently used to classify independent test set samples. Results: The GA/KNN method is capable of selecting a subset of predictive genes from a large noisy data set for sample classification. It is a multivariate approach that can capture the correlated structure in the data. We find that for a given data set gene selection is highly repeatable in independent runs using the GA/KNN method. In general, however, gene selection may be less robust than classification.
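The GA/KNN idea described above can be sketched as follows: each chromosome is a small subset of gene indices, and its fitness is the leave-one-out accuracy of a k-nearest-neighbor classifier restricted to those genes. This is a simplified illustration, not the published algorithm: the truncation selection, the single-gene mutation, the parameter values, the synthetic data, and all function names are assumptions.

```python
import random

def knn_loo_accuracy(samples, labels, genes, k=3):
    """Leave-one-out accuracy of a majority-vote k-NN using only `genes`."""
    correct = 0
    for i, row in enumerate(samples):
        dists = []
        for j, other in enumerate(samples):
            if i == j:
                continue  # hold out sample i
            d = sum((row[g] - other[g]) ** 2 for g in genes)
            dists.append((d, labels[j]))
        dists.sort(key=lambda t: t[0])
        votes = [lab for _, lab in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == labels[i]:
            correct += 1
    return correct / len(samples)

def mutate(chromosome, n_genes, rng):
    """Swap one gene in the subset for a gene not currently in it."""
    child = list(chromosome)
    pos = rng.randrange(len(child))
    candidates = [g for g in range(n_genes) if g not in child]
    child[pos] = rng.choice(candidates)
    return child

def ga_knn(samples, labels, subset_size=3, pop_size=20, generations=30, seed=0):
    """Search for a gene subset with high leave-one-out k-NN accuracy."""
    rng = random.Random(seed)
    n_genes = len(samples[0])
    population = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scored = sorted(
            ((knn_loo_accuracy(samples, labels, c), c) for c in population),
            key=lambda t: -t[0],
        )
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        if best_fit == 1.0:
            break  # cannot improve on perfect leave-one-out accuracy
        # Keep the fitter half, refill with mutated copies of survivors.
        survivors = [c for _, c in scored[: pop_size // 2]]
        children = [mutate(rng.choice(survivors), n_genes, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return sorted(best), best_fit

def make_toy_data(n_samples=20, n_genes=30, seed=1):
    """Synthetic data: genes 0 and 1 are shifted for class 1, the rest are noise."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] += 10.0 * label
        row[1] += 10.0 * label
        samples.append(row)
        labels.append(label)
    return samples, labels

samples, labels = make_toy_data()
best_genes, best_accuracy = ga_knn(samples, labels)
```

Because the search is stochastic, rerunning `ga_knn` with different `seed` values and counting how often each gene is selected is one way to probe the reproducibility question the abstract raises.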

A hybrid approach for gene selection and classification using support vector machine

Int. Arab J. Inf. Technol., 2015

Deoxyribonucleic acid (DNA) microarray technology allows us to measure thousands of gene expression levels on a single chip. Analyzing gene expression data plays a vital role in understanding diseases and discovering medicines. Classification of cancer based on gene expression data is a promising research area in the field of bioinformatics and data mining. Not all genes contribute to efficient classification of samples. Hence, a robust feature selection method is required to identify the relevant genes that help classify samples effectively. Most of the existing feature selection methods are computationally expensive. Redundancy in gene expression data leads to poor classification accuracy and also degrades multi-class classification. This paper proposes an ensemble feature selection technique which is a combination of Recursive Feature Elimination (RFE) and Based Bayes error Filter (BBF) for gene selection and Support Vector Machine (SVM) algorithm for classificat...

A fuzzy intelligent approach to the classification problem in gene expression data analysis

Knowledge-Based Systems, 2012

Classification is an important data mining task that is widely used in many real-world applications. In microarray analysis, classification techniques are applied to discriminate diseases or to predict outcomes based on gene expression patterns, and perhaps even to identify the best treatment for a given genetic signature. The most important challenge in gene expression data analysis lies in how to deal with its unique "high dimension, small sample" characteristic, which makes many traditional classification techniques inapplicable or inefficient; hence, more dedicated techniques are needed to approach this problem. Fuzzy logic has recently been shown to be a powerful and suitable soft computing tool for handling complex problems under incomplete data conditions. In this paper, a new hybrid model is proposed that combines artificial intelligence with fuzzy logic in order to benefit from the unique advantages of both fuzzy logic and the classification power of artificial neural networks (ANNs), and to construct an efficient and accurate hybrid classifier for situations where little data is available. Because the proposed model uses fuzzy parameters instead of crisp parameters, it needs less training data than traditional non-fuzzy neural networks or, with the same training sample, can learn better and hence yield more accurate results than traditional neural networks. In addition to the theoretical evidence for using fuzzy logic, empirical results of gene expression classification indicate that the proposed model improves classification accuracy in comparison with traditional ANNs and with some other well-known statistical and intelligent classification models such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighbor (KNN), and support vector machines (SVMs). Therefore, the proposed model can be applied as an appropriate alternative approach for solving problems with scant data, such as gene expression data classification, specifically when higher classification accuracy is needed.

Class prediction based on gene expression: Applying neural networks via a genetic algorithm wrapper

2001

This project focuses on applying neural networks to the classification of biological state based on gene expression data. In order to take advantage of the non-linear classification abilities of neural networks, a genetic algorithm is employed as a "wrapper" feature selector. Results indicate that the genetic algorithm effectively identifies features that allow successful neural network training. In addition, it is shown that ensembles created by combining neural networks from multiple runs of the genetic algorithm consistently outperform single networks.
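The abstract does not state how the networks from multiple genetic algorithm runs are combined. One common choice, assumed here purely for illustration, is plurality voting over the per-sample predictions of each network. The prediction lists below are made-up placeholders, not results from the project.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by plurality."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical outputs of three networks, each trained on the gene subset
# found by a separate GA run.
run_a = ["tumor", "normal", "tumor", "normal"]
run_b = ["tumor", "tumor", "tumor", "normal"]
run_c = ["normal", "normal", "tumor", "normal"]
ensemble = majority_vote([run_a, run_b, run_c])
# ensemble == ["tumor", "normal", "tumor", "normal"]
```

With an odd number of networks and two classes, ties cannot occur; other settings would need an explicit tie-breaking rule.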

A hybrid of genetic algorithm and support vector machine for features selection and classification of gene expression microarray

Constantly improving gene expression technology offers the ability to measure the expression levels of thousands of genes in parallel. Gene expression data is expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. A key issue that needs to be addressed is the selection of a small number of genes that contribute to a disease from the thousands of genes measured on microarrays, which are inherently noisy. This work deals with finding the small subset of informative genes from gene expression microarray data that maximizes classification accuracy. This paper introduces a new hybrid Genetic Algorithm and Support Vector Machine algorithm for the gene selection and classification task. We show that the classification accuracy of the proposed algorithm is superior to that of a number of current state-of-the-art methods on two widely used benchmark datasets. The informative genes from the best subset are validated by comparing them with biological results reported by biologists and computer scientists, in order to explore their biological plausibility.