A Survey on Various Feature Selection Methodologies

Survey on Feature Selection

Feature selection plays an important role in the data mining process. It is needed to deal with an excessive number of features, which can become a computational burden on learning algorithms. It is also valuable even when computational resources are not scarce, since it improves the accuracy of machine learning tasks, as we will see in the upcoming sections. In this review, we discuss the different feature selection approaches and the relation between them and the various machine learning algorithms, and we compare the existing feature selection approaches. I wrote this report as part of my MSc degree in the data mining program at the University of East Anglia.

Towards a better feature subset selection approach

2010

The selection of an optimal feature subset and the subsequent classification have become important issues in the data mining field. We propose a feature selection scheme based on the slicing technique originally proposed for programming languages. The proposed approach is called the Case Slicing Technique (CST). Slicing means automatically obtaining the portion of features of a case that is responsible for a specific part of the case's solution. We argue that the goal should be to reduce the number of features by removing irrelevant ones. Choosing a subset of the features may increase accuracy and reduce the complexity of the acquired knowledge. Our experimental results indicate that the performance of CST as a feature subset selection method is better than that of the other approaches commonly used for feature selection: RELIEF with the base learning algorithm C4.5, RELIEF with K-Nearest Neighbour (K-NN), RELIEF with the decision tree induction algorithm ID3, and RELIEF with Naïve Bayes (NB).
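The abstract does not spell out CST itself, but since RELIEF is the baseline in all four comparisons, a minimal sketch of the classic RELIEF weighting loop may be useful; the function name, sampling scheme, and parameters below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def relief_weights(X, y, n_samples=100, seed=0):
    """Classic RELIEF: one relevance weight per feature.

    X : (n, d) array of numeric features, scaled to [0, 1]
    y : (n,) array of class labels
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_samples):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)      # L1 distance to every instance
        dists[i] = np.inf                          # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest other-class
        # penalize features that differ on the hit, reward those that differ on the miss
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_samples
    return w
```

Features are then ranked by weight, and the top-ranked subset is handed to whichever learner (C4.5, K-NN, ID3, or NB) is being evaluated.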

Machine Learning Based Supervised Feature Selection Algorithm for Data Mining

International Journal of Innovative Technology and Exploring Engineering, 2019

Data scientists focus on high-dimensional data to predict and reveal interesting patterns as well as useful information to the modern world. Feature selection is a preprocessing technique that improves the accuracy and efficiency of mining algorithms. Numerous feature selection algorithms exist, but most fail to give good mining results as the scale increases. In this paper, we consider feature selection for supervised algorithms in data mining and give an overview of existing machine learning algorithms for supervised feature selection. We introduce an enhanced supervised feature selection algorithm that selects the best feature subset by eliminating irrelevant features using distance correlation and redundant features using symmetric uncertainty. The experimental results show that the proposed algorithm provides better classification accuracy and selects a minimal number of features.
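As a rough illustration of the two filters this abstract combines, the sketch below scores relevance with distance correlation and redundancy with symmetric uncertainty; the thresholds, function names, and greedy redundancy pass are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def distance_correlation(x, y):
    """Distance correlation between two 1-D samples (0 iff independent)."""
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(np.asarray(x, float)), centered(np.asarray(y, float))
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max((A * B).mean(), 0.0) / denom) if denom > 0 else 0.0

def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 I(a; b) / (H(a) + H(b)) for discrete (binned) features."""
    def entropy(v):
        _, counts = np.unique(v, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log(p)).sum())
    h = entropy(a) + entropy(b)
    return 2 * mutual_info_score(a, b) / h if h > 0 else 0.0

def select_features(X, y, rel_thresh=0.1, red_thresh=0.8):
    # step 1 -- relevance: keep features with enough distance correlation to y
    relevant = [j for j in range(X.shape[1])
                if distance_correlation(X[:, j], y) >= rel_thresh]
    # step 2 -- redundancy: drop a feature too similar to one already kept
    # (continuous features should be discretized before computing SU)
    kept = []
    for j in relevant:
        if all(symmetric_uncertainty(X[:, j], X[:, k]) < red_thresh for k in kept):
            kept.append(j)
    return kept
```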

Efficient Feature Subset Selection Algorithm for High Dimensional Data

International Journal of Electrical and Computer Engineering (IJECE)

The feature selection approach addresses the dimensionality problem by removing irrelevant and redundant features. Existing feature selection algorithms take considerable time to obtain a feature subset for high-dimensional data. This paper proposes a feature selection algorithm for high-dimensional data based on information gain measures, termed IFSA (Information gain based Feature Selection Algorithm), to produce an optimal feature subset in efficient time and improve the computational performance of learning algorithms. The IFSA algorithm works in two stages: first, a filter is applied to the dataset; second, a small feature subset is produced using the information gain measure. Extensive experiments compare the proposed algorithm with other methods using two different classifiers (Naive Bayes and IBk) on microarray and text datasets. The results demonstrate that IFSA not only produces a compact feature subset in efficient time but also improves classifier performance.
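A minimal sketch of a two-stage, information-gain-based filter in the spirit of IFSA is shown below; the zero-variance prefilter and the top-k parameter are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def ifsa_like_select(X, y, k=50):
    # stage 1: cheap prefilter -- drop constant (zero-variance) features
    varying = np.flatnonzero(X.var(axis=0) > 0)
    # stage 2: rank survivors by information gain with respect to the class
    gains = mutual_info_classif(X[:, varying], y, random_state=0)
    return varying[np.argsort(gains)[::-1][:k]]
```

The returned column indices can then be used to train any downstream classifier, such as the Naive Bayes and IBk learners used in the paper's experiments.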

An Analysis of Feature Selection Algorithm and their Optimization - A Scrutiny

Webology, 2021

Extracting correlated evidence from the enormous volume of a data repository is designated as data mining. The discriminative knowledge associated with this approach is captured as features, and the process of refining that knowledge is delineated as the feature selection mechanism. Feature selection yields the subset of features that carries the most information. Before data mining, feature selection is essential to trim down high-dimensional data: without feature selection as a preprocessing technique, classification requires interminable calculation time and can become intractable. The foremost intention of this analysis is to afford a summary of the feature selection approaches adopted to evaluate extremely extensive feature sets.

A Survey of Feature Selection Techniques

Encyclopedia of Data Warehousing and Mining, Second Edition

Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason for this is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the “curse of dimensionality” (Bellman, 1961). The objective of feature selection is to identify the important features in the data set and discard all other features as irrelevant and redundant information. Since feature selection reduces the dimensionality of the data, data mining algorithms can operate faster and more effectively. In some cases, the performance of the data mining method improves as a result of feature selection, mainly because of a more compact, easily interpreted representation of the target concept. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed.
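The filter idea described here can be illustrated in a few lines: features are scored by a statistic computed from the data alone, before any mining algorithm is chosen. The synthetic dataset and the ANOVA F-test scorer below are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# score each feature against the class with an F-test; no classifier is
# involved until after the subset has been chosen
X, y = make_classification(n_samples=200, n_features=40, n_informative=5,
                           random_state=0)
X_reduced = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)
print(X_reduced.shape)  # (200, 10): any mining algorithm can now run on 10 features
```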

Feature Selection for Classification

Intelligent Data Analysis, 1997

Feature selection has been a focus of interest for quite some time and much work has been done on it. With the creation of huge databases and the consequent requirements for good machine learning techniques, new problems arise and novel approaches to feature selection are in demand. This survey is a comprehensive overview of many existing methods from the 1970s to the present. It identifies the four steps of a typical feature selection method, categorizes the existing methods in terms of generation procedures and evaluation functions, and reveals hitherto unattempted combinations of generation procedures and evaluation functions. Representative methods are chosen from each category for detailed explanation and discussion via example. Benchmark datasets with different characteristics are used for a comparative study. The strengths and weaknesses of different methods are explained. Guidelines for applying feature selection methods are given based on data types and domain characteristics. This survey identifies future research areas in feature selection, introduces newcomers to this field, and paves the way for practitioners searching for suitable methods to solve domain-specific real-world applications.
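The four-step template this survey describes (a generation procedure, an evaluation function, a stopping criterion, and validation) can be made concrete with a greedy forward search; the Naive Bayes evaluator and cross-validation settings below are illustrative assumptions, not the survey's own.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def forward_select(X, y, clf=None):
    clf = clf if clf is not None else GaussianNB()
    remaining = set(range(X.shape[1]))
    selected, best = [], -np.inf
    while remaining:                                  # generation: grow the subset
        score, j = max((cross_val_score(clf, X[:, selected + [j]], y, cv=5).mean(), j)
                       for j in remaining)            # evaluation: CV accuracy
        if score <= best:                             # stopping criterion: no gain
            break
        best = score
        remaining.remove(j)
        selected.append(j)
    return selected, best                             # then validate on held-out data
```

Swapping the generation procedure (e.g., backward elimination or random search) or the evaluation function (e.g., a distance or consistency measure instead of accuracy) yields the different method families the survey categorizes.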

Feature Selection and Classification Techniques in Data Mining

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Feature selection is one of the important techniques in data mining: it selects the relevant features and removes the redundant features in a dataset. Classification is a technique used for discovering the classes of unknown data. Since the classification task benefits from a reduced feature space, the feature selection process is used to choose a relevant subset from a large set of features. This paper surveys various feature selection methods.

A Comparative Study Between Feature Selection Algorithms

Data Mining and Big Data, 2018

In this paper, we present a comparative study of four algorithms used in feature selection: decision trees, an entropy measure for ranking features, estimation of distribution algorithms, and the bootstrapping algorithm. Feature selection is highlighted as the most representative task in the elimination of noise, improving the quality of the dataset. Each algorithm is described so that the reader understands its function, and the algorithms are then applied to different data sets to obtain the selection results. Finally, the conclusions of this investigation are presented.
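As one concrete example of the first technique in this comparison, the sketch below ranks features by a decision tree's impurity-based importances; the top-k cutoff and single-tree choice are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_rank(X, y, k=10):
    # fit one tree and read off how much each feature reduced impurity
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    order = np.argsort(tree.feature_importances_)[::-1]  # most important first
    return order[:k]
```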