A genetic algorithms-based approach for selecting the most relevant input variables in classification tasks
Related papers
Variable selection is an important task in machine learning and data mining applications. In many real-world problems a huge volume of data is available, corresponding to a large number of variables. When developing a model for classification, clustering or other applications, the search for an optimal subset of relevant input variables is crucial. In this paper an automatic variable selection method, which combines Genetic Algorithms and Self Organizing Maps, is proposed for classification purposes. The Genetic Algorithm is used to select the most relevant input variables and to set some relevant parameters of a classifier implemented through a Self Organizing Map. This method has been tested with several datasets from the UCI repository. The results of the tests are presented and discussed in this paper. The proposed approach provides good classification accuracy and contributes to the comprehension of the phenomenon under consideration.
Variable selection through genetic algorithms for classification purposes
In many real-world classification tasks, the variables to be used for the development of any kind of classifier must first be selected. This necessity normally arises from the large number of variables that could potentially be included in the input set, combined with the lack of a priori knowledge to support the selection process.
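A common way to pose this selection problem for a GA is to encode each candidate subset as a binary mask over the input variables and use classifier accuracy as the fitness. The sketch below is a minimal illustration on hypothetical synthetic data; the helper names, the nearest-centroid fitness, and all parameter values are assumptions for illustration, not the paper's SOM-based classifier:

```python
import random

random.seed(0)

def make_data(n=60):
    # Hypothetical toy data: 2 informative features plus 3 pure-noise features.
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [label + random.gauss(0, 0.3) for _ in range(2)]
        noise = [random.gauss(0, 1.0) for _ in range(3)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def accuracy(mask, X, y):
    # Fitness: accuracy of a nearest-centroid classifier restricted
    # to the features selected by the binary mask.
    sel = [j for j, bit in enumerate(mask) if bit]
    if not sel:
        return 0.0
    centroids = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(p[j] for p in pts) / len(pts) for j in sel]
    correct = 0
    for x, t in zip(X, y):
        pred = min(centroids, key=lambda c: sum((x[j] - cj) ** 2
                                                for j, cj in zip(sel, centroids[c])))
        correct += int(pred == t)
    return correct / len(y)

def evolve(X, y, n_feats=5, pop_size=20, gens=30):
    pop = [[random.randint(0, 1) for _ in range(n_feats)] for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted(pop, key=lambda m: accuracy(m, X, y), reverse=True)
        pop = [m[:] for m in scored[:2]]                # elitism: keep the two best
        while len(pop) < pop_size:
            p1, p2 = random.sample(scored[:10], 2)      # truncation selection
            cut = random.randrange(1, n_feats)
            child = p1[:cut] + p2[cut:]                 # one-point crossover
            if random.random() < 0.2:                   # bit-flip mutation
                child[random.randrange(n_feats)] ^= 1
            pop.append(child)
    return max(pop, key=lambda m: accuracy(m, X, y))

X, y = make_data()
best = evolve(X, y)
```

On data of this shape the evolved mask tends to retain the informative features and drop the noise, which is the behaviour the selection process is after.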
Genetic algorithms as a strategy for feature selection
Journal of Chemometrics, 1992
Genetic algorithms were created as an optimization strategy for cases where complex response surfaces do not allow the use of better-known methods (simplex, experimental design techniques, etc.). This paper shows that these algorithms, suitably modified, can also be a valuable tool for solving the feature selection problem. The subsets of variables selected by genetic algorithms are generally more efficient than those obtained by classical feature selection methods, since they can produce a better result using fewer features.
Ingeniería e Investigación
Pattern recognition performance depends on variations during the extraction, selection and classification stages. This paper presents an approach to feature selection using genetic algorithms, applied to digital image recognition and quality control. Error rate and the kappa coefficient were used for evaluating the genetic algorithm approach. Neural networks were used for classification, taking as input the features selected by the genetic algorithms. The neural network approach was compared to a K-nearest neighbour classifier. The proposed approach performed better than the other methods.
Genetic Support Vector Classification and Feature Selection
2008
This article considers an important issue in the design of support vector machines (SVMs): the fine tuning of their parameters. The problem is tackled using a self-adaptive genetic algorithm (GA), and the same GA is used for feature selection. We validate our results with statistical tests on single-domain benchmark data sets, which are used for comparison with other traditional methods, one of which is commonly used for the selection of parameters in SVMs.
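Using one GA for both tasks typically means a mixed chromosome: binary genes for the feature mask plus real-coded genes for the SVM hyperparameters. The sketch below shows one possible encoding and its decoding step; the gene layout, parameter ranges, and names are illustrative assumptions, not the article's actual scheme:

```python
import random

random.seed(1)

N_FEATURES = 8
# Assumed search ranges for the SVM hyperparameters, on a log10 scale.
C_RANGE = (-3, 3)       # C     = 10 ** g, g in [-3, 3]
GAMMA_RANGE = (-4, 1)   # gamma = 10 ** g, g in [-4, 1]

def random_chromosome():
    # First N_FEATURES genes: binary feature mask.
    # Last two genes: real-coded values in [0, 1], later mapped to (C, gamma).
    return [random.randint(0, 1) for _ in range(N_FEATURES)] + \
           [random.random(), random.random()]

def decode(chrom):
    # Split the chromosome into the feature mask and the two hyperparameters.
    mask = chrom[:N_FEATURES]
    lo, hi = C_RANGE
    C = 10 ** (lo + chrom[N_FEATURES] * (hi - lo))
    lo, hi = GAMMA_RANGE
    gamma = 10 ** (lo + chrom[N_FEATURES + 1] * (hi - lo))
    return mask, C, gamma
```

The fitness of a chromosome would then be the cross-validated accuracy of an SVM trained with the decoded (C, gamma) on the masked features, so that selection pressure acts on both choices at once.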
An empirical study of feature selection for classification using genetic algorithm
International Journal of Advanced Intelligence Paradigms, 2018
Feature selection is one of the most important pre-processing steps for data mining, pattern recognition and machine learning problems. Features are eliminated because they are either irrelevant or redundant. According to the literature, most approaches combine these objectives into a single numeric measure. In this paper, by contrast, the problem of finding an optimal feature subset is formulated as a multi-objective problem. The concept of redundancy is further refined with a threshold value, and an objective of maximising entropy is added. An extensive empirical study was set up using 33 publicly available datasets. A 12% improvement in classification accuracy is reported in the multi-objective setup, and the other suggested refinements are also shown to improve the performance measure. The performance improvement is statistically significant, as found by pairwise t-tests and Friedman's test.
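A multi-objective formulation of this kind rests on Pareto dominance between candidate subsets, for example maximising accuracy while minimising the number of selected features. The sketch below shows the dominance test and the resulting non-dominated front; the (accuracy, subset size) tuple layout is an assumption for illustration:

```python
def dominates(a, b):
    # a, b: (accuracy, n_selected) tuples; maximise accuracy, minimise size.
    # a dominates b if it is no worse on both objectives and strictly
    # better on at least one.
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(solutions):
    # Keep every solution that no other solution dominates.
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

Instead of collapsing the objectives into one score, a multi-objective GA such as NSGA-II would rank the population by fronts like this one and return the whole accuracy/size trade-off curve.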
Genetic Algorithms for Classification and Feature Extraction
Annual Meeting: Classification …, 1995
Min Pei (1,2), Erik D. Goodman (2), William F. Punch III (3) and Ying Ding (2). (1) Beijing Union University, Beijing, China; (2) Case Center for Computer-Aided Engineering and Manufacturing; (3) Intelligent Systems Laboratory, Department of Computer Science
Information Sciences, 2001
The inductive learning of a fuzzy rule-based classification system (FRBCS) is made difficult by the presence of a large number of features that increases the dimensionality of the problem being solved. The difficulty comes from the exponential growth of the fuzzy rule search space with the increase in the number of features considered in the learning process. In this work, we present a genetic feature selection process that can be integrated in a multistage genetic learning method to obtain, in a more efficient way, FRBCSs composed of a set of comprehensible fuzzy rules with high classification ability. The proposed process fixes, a priori, the number of selected features, and therefore, the size of the search space of candidate fuzzy rules. The experimentation carried out, using the Sonar example base, shows a significant improvement in simplicity, precision and efficiency achieved by adding the proposed feature selection processes to the multistage genetic learning method or to other learning methods.
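Fixing the number of selected features a priori is often implemented by encoding each chromosome as a fixed-size set of distinct feature indices, so that every candidate has exactly the prescribed subset size. The sketch below illustrates one such encoding with a size-preserving mutation; the names, sizes and operator choices are assumptions, not the paper's method:

```python
import random

random.seed(2)

N_FEATURES = 20
K = 5  # number of selected features, fixed a priori

def random_individual():
    # Chromosome: K distinct feature indices, so every candidate
    # encodes a subset of exactly K features.
    return random.sample(range(N_FEATURES), K)

def mutate(ind):
    # Swap one selected index for an unused one, preserving the subset size.
    out = ind[:]
    unused = [j for j in range(N_FEATURES) if j not in out]
    out[random.randrange(K)] = random.choice(unused)
    return out
```

Because the search space shrinks from all 2^N subsets to only the C(N, K) subsets of size K, the fuzzy-rule search space built on top of the selected features is fixed in size as well, which is the efficiency gain the abstract describes.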