Redundant Gene Selection Based On Genetic And Quick-Reduct Algorithms

Microarray gene expression profiles have yielded valuable results for a variety of problems and have contributed to advances in clinical medicine. Microarray data characteristically have high dimensionality and a small sample size, which makes it difficult for a general classification method to classify samples accurately. However, not every gene is relevant for distinguishing the sample class. Thus, to analyze gene expression profiles correctly, feature (gene) selection is crucial for the classification process, and an effective gene extraction method is necessary for eliminating irrelevant genes and decreasing the classification error rate. The purpose of gene expression analysis is to discriminate between classes of samples and to predict the relative importance of each gene for sample classification. In this paper, irrelevant genes are eliminated in two stages. The first stage employs correlation-based feature selection (CFS) as an evaluator and binary particle swarm optimization (BPSO) as a search technique; the second stage combines this CFS+PSO step with the Quick-Reduct algorithm to form an integrated filter method. Because the data contain a large number of redundant features, an initial redundancy reduction of the genes is performed to enable faster convergence. Rough set theory is then employed to generate reducts, which represent the minimal sets of non-redundant genes capable of discerning between all objects, in a multi-objective framework. The effectiveness of the proposed approach was verified on two different multi-class microarray datasets using K-nearest neighbour (K-NN), Naïve Bayes (NB), and J48 classifiers with leave-one-out cross-validation (LOOCV).
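
To make the reduct-generation stage concrete, the following is a minimal Python sketch of a greedy Quick-Reduct search driven by the rough-set dependency degree. It is an illustration under stated assumptions, not the implementation used in this paper: expression values are assumed to be already discretized, the function names `dependency` and `quick_reduct` and the toy data are hypothetical, and the tie-breaking rule (lowest gene index wins, and the best candidate is added even when it brings no gain) is an assumption added so that the loop always terminates.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree of the class labels on `attrs`.

    It is the fraction of samples that fall into an equivalence class
    (samples with identical values on `attrs`) containing a single label.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes[key].append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedy forward selection of genes until the dependency degree of the
    selected subset matches that of the full gene set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in all_attrs if a not in reduct]
        scored = [(dependency(rows, labels, reduct + [a]), a) for a in candidates]
        # Assumption: ties go to the lowest gene index, and the best candidate
        # is added even without a gain, which guarantees termination.
        best, chosen = max(scored, key=lambda s: (s[0], -s[1]))
        reduct.append(chosen)
    return reduct


# Hypothetical discretized expression levels for four genes (g0..g3);
# g1 is the complement of g0 and is therefore redundant.
rows = [
    (1, 0, 0, 1),
    (1, 0, 1, 0),
    (0, 1, 0, 1),
    (0, 1, 1, 0),
    (1, 0, 0, 0),
    (0, 1, 1, 1),
]
labels = ["ALL", "AML", "ALL", "MLL", "ALL", "MLL"]

print(quick_reduct(rows, labels))  # [2, 0]
```

On this toy table, gene g2 alone separates half of the samples (dependency 0.5), and adding g0 raises the dependency to 1.0, so the redundant g1 and the uninformative g3 are left out. A greedy search of this kind returns a reduct but not necessarily the smallest one.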