Redundant Gene Selection Based On Genetic And Quick-Reduct Algorithms

A hybrid method of feature selection for microarray gene expression data

Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. Compared to the number of genes involved, available training data sets generally have a fairly small sample size in cancer type classification. These training data limitations constitute a challenge to certain classification methodologies. A reliable selection method for genes relevant for sample classification is needed in order to speed up the processing rate, decrease the predictive error rate, and to avoid incomprehensibility due to the large number of genes investigated. In this study, we combined information gain and an improved binary particle swarm optimization as a hybrid method to implement feature selection, and the K-nearest neighbor (K-NN) method serves as an evaluator for gene expression data classification problems. Experimental results show that this method effectively simplifies feature selection and reduces the total number of featur...

A Two-Stage Feature Selection Method for Gene Expression Data

Omics A Journal of Integrative Biology, 2009

Microarray data referencing gene expression profiles provide valuable answers to a variety of problems and contribute to advances in clinical medicine. Gene expression data typically have a high dimension and a small sample size. Generally, only a relatively small number of genes are strongly correlated with a certain phenotype. To analyze gene expression profiles correctly, feature (gene) selection is crucial for classification. Feature (gene) selection has certain advantages, such as effective extraction of genes that influence classification accuracy, elimination of irrelevant genes, and improvement of the classification accuracy calculation. In this paper, we propose a two-stage feature selection method, which uses information gain to implement a gene-ranking process, and combines an improved particle swarm optimization with the K-nearest neighbor method and support vector machine classifiers to calculate the classification accuracy. The experimental results show that the proposed method can effectively select relevant gene subsets, and achieves higher classification accuracy than previous studies.
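The gene-ranking stage mentioned above can be illustrated with a minimal sketch of information gain. This is not the authors' implementation: the function names, the data layout (`expression[i][j]` is gene j in sample i), and the choice to binarize each gene at its mean are assumptions made only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Information gain of splitting the samples by `value <= threshold`."""
    low = [y for v, y in zip(values, labels) if v <= threshold]
    high = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy remaining after the split (empty parts contribute 0).
    remainder = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder

def rank_genes(expression, labels):
    """Return gene indices sorted by descending information gain.

    Assumption for this sketch: each gene is binarized at its mean value.
    """
    n_genes = len(expression[0])
    scores = []
    for j in range(n_genes):
        column = [row[j] for row in expression]
        threshold = sum(column) / len(column)
        scores.append(information_gain(column, labels, threshold))
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)

# Toy data: gene 0 separates the two classes, gene 1 does not.
expression = [[1.0, 5.0], [2.0, 1.0], [8.0, 5.2], [9.0, 0.9]]
labels = [0, 0, 1, 1]
ranking = rank_genes(expression, labels)
```

In a two-stage scheme of this kind, only the top-ranked genes would be passed on to the more expensive wrapper search.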

A Novel Feature Selection for Gene Expression Data

Proceedings of the 9th Joint Conference on Information Sciences (JCIS), 2006

The feature selection process can be considered a problem of global combinatorial optimization in machine learning, which reduces the number of features, removes irrelevant, noisy and redundant data, and results in an acceptable classification accuracy. Therefore, a good feature selection method based on the number of features investigated for sample classification is needed in order to speed up processing, improve predictive accuracy, and avoid incomprehensibility. In this paper, particle swarm optimization (PSO) is used to implement feature selection, and the K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) serves as an evaluator of PSO. Support vector machines (SVMs) with the one-versus-rest method serve as a classifier for the classification problem. Experimental results show that our method simplifies features effectively and obtains a higher classification accuracy compared to the other classification methods from the literature.
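A hedged sketch of how K-NN with LOOCV can score a candidate feature subset: each sample is held out in turn, classified by its nearest neighbours using only the selected features, and the fraction of correct predictions is returned. The function name, the binary `mask` encoding, and the Euclidean distance are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter

def loocv_knn_accuracy(samples, labels, mask, k=1):
    """Leave-one-out accuracy of K-NN using only features where mask[j] == 1."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, held_out in enumerate(samples):
        query = [held_out[j] for j in selected]
        # Distance from the held-out sample to every other sample.
        neighbours = sorted(
            ((math.dist(query, [other[j] for j in selected]), labels[t])
             for t, other in enumerate(samples) if t != i),
            key=lambda pair: pair[0],
        )
        # Majority vote among the k nearest neighbours.
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy data: feature 0 is informative, feature 1 is misleading.
samples = [[0.0, 5.0], [0.2, -5.0], [1.0, 5.1], [1.2, -5.1]]
labels = [0, 0, 1, 1]
good = loocv_knn_accuracy(samples, labels, [1, 0])
bad = loocv_knn_accuracy(samples, labels, [0, 1])
```

A search procedure such as PSO would call a function like this as its fitness evaluator for each candidate mask.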

The selection of relevant and non-redundant features to improve classification performance of microarray gene expression data

Feature selection in the field of tissue classification based on microarray gene expression data primarily focuses on the individual selection of genes. Ignoring relationships between genes allows for the selection of genes that are highly redundant with respect to each other. In this paper we present two feature selection methods that strive towards the selection of a relevant and non-redundant feature subset. Experiments conducted on an artificial dataset as well as on microarray gene expression datasets have indicated that emphasizing the discriminatory power of the entire group, instead of the discriminatory power of individual features, generally leads to better performance using a smaller feature subset.

A Hybrid BPSO-CGA Approach for Gene Selection and Classification of Microarray Data

Journal of Computational Biology, 2012

Microarray analysis promises to detect variations in gene expressions, and changes in the transcription rates of an entire genome in vivo. Microarray gene expression profiles indicate the relative abundance of mRNA corresponding to the genes. The selection of relevant genes from microarray data poses a formidable challenge to researchers due to the high dimensionality of features, the multiclass categories involved, and the usually small sample size. A classification process is often employed which decreases the dimensionality of the microarray data. In order to correctly analyze microarray data, the goal is to find an optimal subset of features (genes) which adequately represents the original set of features. A hybrid method of binary particle swarm optimization (BPSO) and a combat genetic algorithm (CGA) is used to perform the microarray data selection. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as a classifier. The proposed BPSO-CGA approach is evaluated on ten microarray data sets from the literature. The experimental results indicate that the proposed method not only effectively reduces the number of gene expression levels, but also achieves a low classification error rate.

A hybrid feature selection method for DNA microarray data

Computers in Biology and Medicine, 2011

Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. In cancer classification, available training data sets are generally of a fairly small sample size compared to the number of genes involved. Along with training data limitations, this constitutes a challenge to certain classification methods. Feature (gene) selection can be used to successfully extract those genes that directly influence classification accuracy and to eliminate genes which have no influence on it. This significantly improves calculation performance and classification accuracy. In this paper, correlation-based feature selection (CFS) and the Taguchi-genetic algorithm (TGA) method were combined into a hybrid method, and the K-nearest neighbor (KNN) method with leave-one-out cross-validation (LOOCV) served as a classifier for eleven classification profiles to calculate the classification accuracy. Experimental results show that the proposed method reduced redundant features effectively and achieved superior classification accuracy. The classification accuracy obtained by the proposed method was higher in ten out of the eleven gene expression data set test problems when compared to other classification methods from the literature.

A novel feature selection method to improve classification of gene expression data

2004

This paper introduces a novel method for selecting a minimum number of genes (features) for a classification problem based on gene expression data, with an objective function that maximises the classification accuracy. The method uses a hybrid of Pearson correlation coefficient (PCC) and signal-to-noise ratio (SNR) methods combined with an evolving classification function (ECF). First, the correlation coefficients between genes in a set of thousands are calculated. Genes that are highly correlated across samples are considered either dependent or coregulated and form a group (a cluster). The signal-to-noise ratio (SNR) method is applied to rank the correlated genes in this group according to their discriminative power towards the classes. Genes with the highest SNR are used in a preliminary feature set as representatives of each group.
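The grouping-then-ranking idea can be sketched as follows. Several details here are assumptions, since the abstract does not specify them: the greedy grouping against each group's first member, the 0.9 correlation threshold, the two-class SNR formula |mean1 - mean0| / (sd1 + sd0), and all function names. The sketch also assumes no gene is constant and each class has at least two samples.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def snr(values, labels):
    """Two-class signal-to-noise ratio (assumed form, labels are 0 or 1)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread

def select_representatives(genes, labels, threshold=0.9):
    """Group correlated genes greedily, then keep the top-SNR gene per group.

    `genes[j]` is the expression vector of gene j across all samples.
    """
    groups = []
    for j, vector in enumerate(genes):
        for group in groups:
            # Compare against the first member of each existing group.
            if pearson(vector, genes[group[0]]) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return [max(group, key=lambda j: snr(genes[j], labels)) for group in groups]

labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # correlated with gene 1, SNR 1.5
    [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # correlated with gene 0, SNR 3.0
    [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],  # weakly correlated with both
]
representatives = select_representatives(genes, labels)
```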

Microarray gene-expression data classification using less gene expressions by combining feature selection methods and classifiers

Microarray data, often characterised by high dimensionality and small sample sizes, are used for cancer classification problems that classify the given (tissue) samples as diseased or healthy on the basis of gene expression profile analysis. The goal of feature selection is to search for the most relevant features among thousands of related features in a particular problem domain. The focus of this study is a method that relaxes the maximum-accuracy criterion for feature selection and selects the combination of feature selection method and classifier that, using a small subset of features, obtains an accuracy not statistically significantly different from the maximum accuracy. By selecting a classifier that employs a small number of features while retaining good accuracy, the risk of overfitting (bias) is reduced. This has been corroborated empirically using some common attribute selection methods (ReliefF, SVM-RFE, FCBF, and Gain Ratio) and classifiers (3 Nearest Neighbour, Naive Bayes and SVM) applied to 6 different microarray cancer data sets. We use hypothesis testing to compare several configurations and select particular configurations that perform well with few genes on these data sets.

Review On Feature Selection Approaches Using Gene Expression Data

Feature selection has become an elementary tool for processing high-dimensional data. DNA microarray technology is used to study a large number of genes simultaneously, which helps in determining the expression levels of the genes. Gene selection from high-dimensional gene expression data is essential for the prediction and classification of disease. Gene expression data can be represented as a matrix and usually contain irrelevant, redundant and noisy data, which makes their study and analysis difficult. The prime purpose of feature selection approaches is to mitigate the curse of dimensionality and to improve the performance and accuracy of classification and clustering algorithms by eliminating these irrelevant features and reducing noise. This paper explains the taxonomy of feature selection methods, stating their respective pros and cons. It also presents a review of a few feature selection approaches, mainly those that have been proposed over the past few years.

A Hybrid Both Filter and Wrapper Feature Selection Method for Microarray Classification

arXiv (Cornell University), 2016

Gene expression data is widely used in disease analysis and cancer diagnosis. However, since gene expression data could contain thousands of genes simultaneously, successful microarray classification is rather difficult. Feature selection is an important pre-treatment for any classification process. Selecting a useful gene subset as a classifier not only decreases the computational time and cost, but also increases classification accuracy. In this study, we applied the information gain method as a filter approach, and an improved binary particle swarm optimization as a wrapper approach to implement feature selection; selected gene subsets were used to evaluate the performance of classification. Experimental results show that by employing the proposed method fewer gene subsets needed to be selected and better classification accuracy could be obtained.
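For orientation, a single update step of standard binary particle swarm optimization can be sketched as below. This is the textbook sigmoid-based BPSO rule, not the "improved" variant the abstract refers to, whose details are not given here. The function name and the parameter values (`w`, `c1`, `c2`, `v_max`) are illustrative assumptions.

```python
import math
import random

def bpso_step(position, velocity, personal_best, global_best,
              rng, w=0.9, c1=2.0, c2=2.0, v_max=6.0):
    """One standard BPSO update for a single particle.

    Each bit of `position` marks whether the corresponding gene is selected.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Pull the velocity towards the personal and global best bits.
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))  # clamp to keep probabilities away from 0 and 1
        prob_one = 1.0 / (1.0 + math.exp(-v))  # sigmoid transfer function
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity

rng = random.Random(0)  # seeded for reproducibility
pos, vel = bpso_step([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng)
```

In a filter-plus-wrapper pipeline, the bit vector would typically cover only the genes retained by the filter stage, and each candidate subset would be scored by a classifier-based fitness function.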