Feature Selection Methods: Genetic Algorithms vs. Greedy-like Search

Genetic algorithms as a strategy for feature selection

Journal of Chemometrics, 1992

Genetic algorithms were developed as an optimization strategy for situations in which complex response surfaces preclude the use of better-known methods (simplex, experimental design techniques, etc.). This paper shows that these algorithms, suitably modified, can also be a valuable tool for the feature selection problem. The subsets of variables selected by genetic algorithms are generally more efficient than those obtained by classical feature selection methods, since they can produce a better result using fewer features.
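The GA approach described above can be sketched in a few lines: chromosomes are bitmasks over the features, and a fitness function scores each candidate subset. The sketch below uses truncation selection, one-point crossover, and bit-flip mutation with a toy fitness; all names, parameter values, and the fitness itself are illustrative assumptions, not taken from the paper.

```python
import random

def ga_feature_select(n_features, fitness, pop_size=30, generations=40,
                      cx_rate=0.8, mut_rate=0.02, seed=0):
    """Toy GA for feature selection: chromosomes are 0/1 bitmasks over
    the features; higher fitness is better. Illustrative sketch only."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]           # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            point = rng.randrange(1, n_features)  # one-point crossover
            child = a[:point] + b[point:] if rng.random() < cx_rate else a[:]
            child = [bit ^ (rng.random() < mut_rate) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy fitness: features 0-4 are "informative"; every selected feature costs 0.01,
# so smaller subsets are preferred at equal relevance.
def toy_fitness(mask):
    return sum(mask[:5]) - 0.01 * sum(mask)

best = ga_feature_select(20, toy_fitness)
```

In a wrapper setting, `toy_fitness` would be replaced by cross-validated accuracy of the classifier of interest on the masked feature set.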

A genetic algorithm-based method for feature subset selection

Soft Computing, 2007

As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables to build models describing data. By removing redundant, irrelevant, or noisy features, feature selection can improve the predictive accuracy and the comprehensibility of the predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers; however, no single criterion has proven best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for a particular inductive learning algorithm of interest. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach is robust and effective at finding subsets of features with higher classification accuracy and/or smaller size than each individual feature selection algorithm.
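A framework of this kind might fold several per-feature criterion scores into a single GA fitness. The weighted-sum scheme below is a hypothetical illustration of that idea, not the paper's actual formulation; the weights and size penalty are assumed values.

```python
def combined_fitness(mask, criteria, weights, size_penalty=0.01):
    """Score a feature bitmask by a weighted sum of several per-feature
    criterion scores (e.g. from different filter methods), minus a small
    penalty on subset size. Illustrative sketch only."""
    selected = [i for i, bit in enumerate(mask) if bit]
    score = sum(w * sum(c[i] for i in selected)
                for c, w in zip(criteria, weights))
    return score - size_penalty * len(selected)

# Two hypothetical criteria scoring three features, combined 1:2.
criteria = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
fitness = combined_fitness([1, 1, 0], criteria, weights=[1.0, 2.0])
```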

Genetic algorithms for feature selection and weighting, a review and study

Proceedings of Sixth International Conference on Document Analysis and Recognition, 2001

The main purpose of feature selection is to reduce the number of features, by eliminating irrelevant and redundant ones, while maintaining or enhancing classification accuracy. Many search algorithms have been used for feature selection. Among these, GAs have proven to be an effective computational method, especially in situations where the search space is uncharacterized (mathematically), not fully understood, and/or high-dimensional.

Classifying Different Feature Selection Algorithms Based on the Search Strategies

Data mining is an essential step in knowledge discovery, helping to uncover hidden and useful patterns in data. The number of stored attributes for each entity in databases is growing rapidly, yet not all of these attributes (features) are useful for data mining; irrelevant attributes make the results more complex and less understandable. In databases with so many features, feature selection algorithms therefore retain the more valuable and relevant features and discard the redundant and irrelevant ones. Feature selection reduces the dimensionality of databases and makes the results more useful. Feature subset selection algorithms can be divided into two categories: the filter approach and the wrapper approach. From another point of view, feature selection algorithms, regardless of whether they follow the filter or the wrapper approach, can be categorized into four groups: complete search, heuristic search, meta-heuristic methods, and methods that use artificial neural networks. The aim of this article is to classify recently presented methods into these groups, review some of them, and compare the groups with each other.
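For contrast with the meta-heuristic methods surveyed above, the classic heuristic-search baseline (the "greedy-like search" of this collection's title) is sequential forward selection: start with the empty set and repeatedly add the single feature that most improves the score. A minimal sketch, where the score function stands in for any wrapper or filter criterion:

```python
def forward_select(n_features, score, k):
    """Greedy sequential forward selection: grow the subset one feature
    at a time, always adding the feature that most improves the score;
    stop at k features or when no single addition helps."""
    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break  # no single feature improves the score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: the value of a subset is the sum of fixed per-feature gains.
gains = [3.0, 1.0, 2.0, 0.5]
chosen = forward_select(4, lambda s: sum(gains[i] for i in s), k=2)
```

Unlike a GA, this search considers only one addition at a time, so it can miss feature combinations that are useful jointly but weak individually.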

An empirical study of feature selection for classification using genetic algorithm

International Journal of Advanced Intelligence Paradigms, 2018

Feature selection is one of the most important pre-processing steps for a data mining, pattern recognition, or machine learning problem. Features are eliminated because they are either irrelevant or redundant. Most approaches in the literature combine these objectives into a single numeric measure. In this paper, by contrast, the problem of finding an optimal feature subset is formulated as a multi-objective problem. The concept of redundancy is further refined with a threshold value, and an objective of maximizing entropy is added. An extensive empirical study uses 33 publicly available datasets. A 12% improvement in classification accuracy is reported in the multi-objective setup, and the other suggested refinements also improve the performance measure. The improvement is statistically significant, as found by pairwise t-tests and Friedman's test.
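In a multi-objective setup like the one described, candidate subsets are typically compared by Pareto dominance over their objective vectors (e.g. accuracy to maximize, subset size to minimize, encoded here as negated size so all objectives are maximized). The encoding is an assumption for illustration:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives
    maximized): a is no worse on every objective and strictly better
    on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical candidates as (accuracy, -subset_size) pairs.
front = pareto_front([(0.9, -3), (0.8, -2), (0.7, -5), (0.9, -4)])
```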

Genetic algorithm as an attributes selection tool for learning algorithms

2004

Abstract. Learning algorithms such as neural networks (NN) or C4.5 require adequate sets of examples. In this paper we examine the usability of genetic algorithms for selecting significant features. The fitness of individuals is calculated from the classification quality achieved by the NN or C4.5 algorithm. The results confirm that significant features selected by the GA for C4.5 are also useful for the NN algorithm, but, interestingly, not in the opposite direction.

A New Evaluation Measure for Feature Subset Selection with Genetic Algorithm

International Journal of Intelligent Systems and Applications, 2015

Feature selection is one of the most important preprocessing steps for a data mining, pattern recognition, or machine learning problem. Finding an optimal subset of features among all combinations is an NP-complete problem. Much research has been done on feature selection; however, as dataset sizes grow and optimality is a subjective notion, further research is needed to find better techniques. In this paper, a genetic algorithm-based feature subset selection method is proposed with a novel feature evaluation measure as the fitness function. The measure differs in three primary ways: (a) it considers the information content of the features in addition to their relevance with respect to the target; (b) redundancy is considered only when it exceeds a threshold value; (c) the cardinality of the subset is penalized less heavily. As the measure accepts values for a few parameters, it can be tuned to the needs of a particular problem domain. Experiments conducted over 21 well-known, publicly available datasets reveal superior performance, and hypothesis testing shows the accuracy improvement to be statistically significant.
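The three properties of the proposed measure can be sketched as a fitness of the following general shape. The symbols, weights, and threshold below are illustrative assumptions, not the paper's exact formula:

```python
def subset_measure(mask, relevance, entropy, redundancy,
                   red_threshold=0.5, w_rel=1.0, w_ent=0.5, w_size=0.1):
    """Illustrative subset score: reward per-feature relevance to the
    target and information content (entropy), penalize pairwise
    redundancy only where it exceeds a threshold, and penalize
    cardinality lightly. All weights are assumed values."""
    sel = [i for i, bit in enumerate(mask) if bit]
    rel = sum(relevance[i] for i in sel)
    ent = sum(entropy[i] for i in sel)
    red = sum(max(0.0, redundancy[i][j] - red_threshold)
              for k, i in enumerate(sel) for j in sel[k + 1:])
    return w_rel * rel + w_ent * ent - red - w_size * len(sel)
```

A measure of this shape would serve directly as the GA fitness over bitmask chromosomes, with `red_threshold` and the weights exposed as the tunable parameters the abstract mentions.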