Feature selection, extraction and construction

Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes

Applied Sciences, 2018

Machine Learning (ML) requires a certain number of features (i.e., attributes) to train the model. One of the main challenges is to determine the right number and type of such features out of the given dataset's attributes. It is not uncommon for the ML process to use a dataset of available features without computing the predictive value of each. Such an approach makes the process vulnerable to overfitting, predictive errors, bias, and poor generalization. Each feature in the dataset is either uniquely predictive, redundant, or irrelevant. The key to better accuracy and fit in ML, however, is to identify the optimum set (i.e., grouping) of features whose values best match the learning task. This paper proposes a novel approach to enhance the Feature Engineering and Selection (eFES) optimization process in ML. eFES is built using a unique scheme to regulate error bounds and to parallelize the addition and removal of a feature during training. eFES also introduces local gain (LG) and global gain (GG) functions, using 3D visualization techniques to assist the feature grouping function (FGF). FGF scores and optimizes the participating features, so the ML process can evolve into deciding which features to accept or reject for improved generalization of the model. To support the proposed model, this paper presents mathematical models, illustrations, algorithms, and experimental results. Miscellaneous datasets are used to validate the model-building process in Python, C#, and R. Results show the promise of eFES compared to the traditional feature selection process.
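The abstract does not spell out the LG/GG formulas, but the add/remove loop it describes resembles a floating forward selection. Below is a minimal sketch of that loop, assuming cross-validated accuracy as a stand-in gain score and a scikit-learn classifier; it illustrates the idea, not the authors' actual eFES functions.

```python
# Sketch of an add/remove feature-grouping loop in the spirit of the
# abstract. CV accuracy stands in for the paper's (unspecified) LG/GG
# gain functions; this is not the authors' eFES implementation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

def gain(features):
    """Stand-in gain score: mean 5-fold CV accuracy of the subset."""
    if not features:
        return 0.0
    return cross_val_score(model, X[:, features], y, cv=5).mean()

selected, score = [], 0.0
candidates = list(range(X.shape[1]))
improved = True
while improved:
    improved = False
    # Addition step: try each remaining feature, keep the best improver.
    best_f, best_s = None, score
    for f in candidates:
        s = gain(selected + [f])
        if s > best_s:
            best_f, best_s = f, s
    if best_f is not None:
        selected.append(best_f)
        candidates.remove(best_f)
        score, improved = best_s, True
    # Removal step: drop any kept feature whose absence raises the score.
    for f in list(selected):
        reduced = [g for g in selected if g != f]
        s = gain(reduced)
        if s > score:
            selected, score, improved = reduced, s, True
            candidates.append(f)

print(f"kept {len(selected)} of {X.shape[1]} features, CV accuracy {score:.3f}")
```

Because the score strictly increases on every accepted move and is bounded, the loop terminates; the removal step is what distinguishes this from plain forward selection.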

A comparison of feature extraction and selection techniques

… on Artificial Neural …, 2003

We have applied several dimensionality reduction techniques to data modelling using neural network architectures for classification on a number of data sets. The reduction methods considered include both linear and non-linear forms of principal components analysis, genetic algorithms, and sensitivity analysis. The results of each were used as inputs to several types of neural network architecture; specifically, we compared the performance of multi-layer perceptrons (MLPs), radial basis function networks (RBFs), and generalised regression neural networks. Our results suggest that considerable improvements in accuracy can be achieved by simple network sensitivity analysis, compared with genetic algorithms and both forms of principal components analysis.
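The paper's exact sensitivity-analysis procedure is not given in the abstract; a rough sketch of the comparison it describes, assuming permutation importance as a stand-in for network sensitivity analysis and scikit-learn's MLPClassifier for the MLP, might look like this:

```python
# PCA-extracted inputs versus sensitivity-ranked raw inputs, each feeding
# an MLP. Permutation importance is an assumed proxy for the paper's
# network sensitivity analysis.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
k = 5  # number of components / features to keep

# Route 1: linear PCA as the feature-extraction step.
pca_clf = make_pipeline(StandardScaler(), PCA(n_components=k),
                        MLPClassifier(max_iter=2000, random_state=0))
pca_acc = cross_val_score(pca_clf, X, y, cv=5).mean()

# Route 2: rank raw features by a simple sensitivity probe, keep the top k.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = make_pipeline(StandardScaler(),
                      MLPClassifier(max_iter=2000, random_state=0)).fit(X_tr, y_tr)
imp = permutation_importance(probe, X_te, y_te, n_repeats=10, random_state=0)
top = imp.importances_mean.argsort()[::-1][:k]

sens_clf = make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=2000, random_state=0))
sens_acc = cross_val_score(sens_clf, X[:, top], y, cv=5).mean()

print(f"PCA({k}) accuracy: {pca_acc:.3f}  sensitivity top-{k}: {sens_acc:.3f}")
```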

A Survey on Various Feature Selection Methodologies

Selecting features is an important process in machine learning: it is the method of choosing a subset of relevant or significant variables and features. Feature selection is applicable in many areas where high-dimensional data is generated, such as anomaly detection, bioinformatics, and image processing; analysis and classification of such big data is time-consuming. Feature subset selection is generally used to simplify the model and dataset, reduce overfitting, and increase classifier efficiency. In this paper, we analyze various techniques for feature extraction and feature subset selection. The main objective of this research was to find an efficient algorithm for feature extraction and feature subset selection. Accordingly, several methods for the extraction and selection of features have been suggested to attain the highest relevance.
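For concreteness, the simplest family the survey covers is the filter approach: score each feature independently and keep the top k. A minimal example using scikit-learn's mutual-information scorer (one common concrete instance, not a method proposed by the survey itself):

```python
# Filter-style subset selection: score features by mutual information
# with the label and keep the k highest-scoring ones.
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_digits(return_X_y=True)
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
print("kept feature indices:", selector.get_support(indices=True))
```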

How dependencies affect the capability of several feature selection approaches to extract the key features

The goal of our research is to determine how dependencies affect the capability of several feature selection approaches to extract the relevant features for classification. The hypothesis is that more dependencies, and higher-level dependencies, mean a more complex task. In our experiments, we set out to discover limitations of several feature selection approaches by altering the degree of dependency in the test datasets. We propose a new method using pre-designed Bayesian networks to generate test datasets with an easily tunable level of complexity for feature selection tests. Relief, CFS, NB-GA, NB-BOA, SVM-GA, SVM-BOA, and SVM-mBOA are the filter- or wrapper-model feature selection approaches we used and evaluated in our experiments. For these approaches, we found that a higher level of dependency among the relevant features greatly affects the capability to find the relevant features for classification. For Relief, SVM-BOA, and SVM-mBOA, if the dependencies among the irrelevant features are altered, the performance of these feature selection approaches changes as well. Moreover, a multiobjective optimization method is used to keep the populations diverse in each generation of the BOA search algorithm, improving the overall quality of solutions in our experiments.
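An illustrative version of the data-generation idea, with a tiny hand-built Bayesian network whose dependency strength is a single tunable knob. The network below is invented for illustration; the paper's pre-designed networks and its Relief / CFS / GA / BOA pipelines are not reproduced here, with univariate mutual information standing in as the filter score.

```python
# Sample from a small hand-built Bayesian network and see whether a
# univariate filter score can still find the relevant features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n, dep = 5000, 0.9  # dep controls how deterministic each edge is

x0 = rng.integers(0, 2, n)                       # root node
x1 = np.where(rng.random(n) < dep, x0, 1 - x0)   # noisy child of x0
xor = x0 ^ x1
y = np.where(rng.random(n) < dep, xor, 1 - xor)  # label needs BOTH parents
noise = rng.integers(0, 2, (n, 3))               # irrelevant features

X = np.column_stack([x0, x1, noise])
mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
# With the XOR coupling, x0 and x1 are individually near-independent of y,
# so their univariate scores sit at noise level: the failure mode the
# paper probes by raising the level of dependency among relevant features.
print(np.round(mi, 4))
```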

Feature Selection: A Literature Review

The Smart Computing Review, 2014

Relevant feature identification has become an essential task for applying data mining algorithms effectively in real-world scenarios. Many feature selection methods have therefore been proposed in the literature to obtain the relevant features or feature subsets for classification and clustering. This paper introduces the concepts of feature relevance, general procedures, evaluation criteria, and the characteristics of feature selection. A comprehensive overview, categorization, and comparison of existing feature selection methods is also given, and guidelines are provided so that users can select a feature selection algorithm without detailed knowledge of each algorithm. We conclude this work with real-world applications, challenges, and future research directions of feature selection.

Current And Future Trends In Feature Selection And Extraction For Classification Problems

International Journal of Pattern Recognition and Artificial Intelligence, 2005

In this article, we describe some of the important methods currently used for solving classification problems, focusing on feature selection and extraction as parts of the overall classification task. We then go on to discuss likely future directions for research in this area, in the context of the other articles in this special issue. We propose that the next major step is the elaboration of a theory of how the methods of selection and extraction interact during the classification process for particular problem domains, along with any learning that may be part of the algorithms. Preferably, this theory should be tested on a set of well-established benchmark challenge problems. Using this theory, we will be better able to identify the specific combinations that will achieve the best classification performance for new tasks.