Towards a better feature subset selection approach
Related papers
A Survey on Various Feature Selection Methodologies
Feature selection is an important process in machine learning: it is the method of selecting a subset of relevant, significant variables and features. It is applicable in multiple areas where high-dimensional data is generated, such as anomaly detection, bioinformatics, and image processing. Analysis and classification of such big data is time-consuming. Feature subset selection is generally used to simplify the model and dataset, reduce overfitting, and increase the efficiency of the classifier. In this paper we analyze various techniques for feature extraction and feature subset selection. The main objective of this research was to find a more efficient algorithm for feature extraction and feature subset selection. Accordingly, several methods for the extraction and selection of features have been proposed to attain the highest relevance.
FEATURE SELECTION AND CLASSIFICATION TECHNIQUES IN DATA MINING
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Feature selection is one of the important techniques in data mining: it is used for selecting the relevant features and removing the redundant features in a dataset. Classification is a technique used for discovering classes of unknown data. Because the classification task benefits from a reduced feature space, feature selection is used to choose a small subset from a large set of features. This paper surveys various feature selection methods.
Data mining is a part of the process of knowledge discovery from data (KDD). The performance of data mining algorithms mainly depends on the effectiveness of preprocessing algorithms, and dimensionality reduction plays an important role in preprocessing. Many methods have been proposed for dimensionality reduction, among which feature subset selection and feature-ranking methods show significant achievement by removing irrelevant and redundant features in high-dimensional data. This improves the prediction accuracy of the classifier, reduces the false prediction ratio, and reduces the time and space complexity of building the prediction model. This paper presents an empirical study of the feature subset evaluators CFS, Consistency, and Filtered, and the feature rankers Chi-squared and Information-gain. The performance of these methods is analyzed with a focus on dimensionality reduction and improvement of classification accuracy, using a wide range of test datasets and classification algorithms, namely probability-based Naive Bayes, tree-based C4.5 (J48), and instance-based IB1.
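The filter-style pipeline the abstract describes (rank features by chi-squared or information gain, then train a classifier on the reduced set) can be sketched with scikit-learn. This is a minimal illustration, not the paper's implementation: the iris dataset, `k=2`, and `mutual_info_classif` (an information-gain analogue) are our own illustrative choices.

```python
# Hedged sketch: chi-squared / mutual-information feature ranking
# followed by Naive Bayes on the selected subset.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Rank all features by chi-squared score (inputs must be non-negative).
chi2_scores, _ = chi2(X, y)
print("chi2 ranking (best first):", chi2_scores.argsort()[::-1])

# Keep the top-2 features by mutual information, then evaluate Naive Bayes.
X_reduced = SelectKBest(mutual_info_classif, k=2).fit_transform(X, y)
acc = cross_val_score(GaussianNB(), X_reduced, y, cv=5).mean()
print(f"Naive Bayes accuracy on 2 selected features: {acc:.3f}")
```

Swapping the score function (`chi2` vs. `mutual_info_classif`) is all it takes to compare the two rankers the paper studies.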
Machine Learning Based Supervised Feature Selection Algorithm for Data Mining
International Journal of Innovative Technology and Exploring Engineering, 2019
Data scientists focus on high-dimensional data to predict and reveal interesting patterns as well as the most useful information to the modern world. Feature selection is a preprocessing technique which improves the accuracy and efficiency of mining algorithms. Numerous feature selection algorithms exist, but most fail to give better mining results as the scale increases. In this paper, feature selection for supervised data mining algorithms is considered, and an overview of existing machine learning algorithms for supervised feature selection is given. The paper introduces an enhanced supervised feature selection algorithm which selects the best feature subset by eliminating irrelevant features using distance correlation and redundant features using symmetric uncertainty. The experimental results show that the proposed algorithm provides better classification accuracy and selects a minimal number of features.
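Symmetric uncertainty, the redundancy measure named above, is the mutual information of two variables normalized by their entropies: SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), ranging from 0 (independent) to 1 (fully redundant). A minimal sketch for discrete features, assuming nothing beyond NumPy (the function names are our own):

```python
# Hedged sketch of symmetric uncertainty for discrete variables.
import numpy as np

def entropy(x):
    """Shannon entropy (in bits) of a discrete array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    # Joint entropy, obtained by pairing the two discrete variables.
    joint = entropy(np.array([f"{a},{b}" for a, b in zip(x, y)]))
    mi = hx + hy - joint  # mutual information
    return 2.0 * mi / (hx + hy) if hx + hy > 0 else 0.0

a = np.array([0, 0, 1, 1, 2, 2])
b = np.array([0, 1, 0, 1, 0, 1])  # carries no information about a here
su_same = symmetric_uncertainty(a, a)  # identical features -> 1.0
su_indep = symmetric_uncertainty(a, b)  # independent features -> 0.0
print(su_same, su_indep)
```

A redundancy-elimination pass would drop any feature whose SU with an already-selected feature exceeds its SU with the class label.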
Feature Selection as an Improving Step for Decision Tree Construction
2009
The removal of irrelevant or redundant attributes helps us make decisions and analyze data efficiently. Feature selection is one of the most important and frequently used techniques in data preprocessing for data mining. In this paper, special attention is paid to feature selection for classification with labeled data. An algorithm is used that ranks attributes by their importance using two independent criteria; the ranked attributes are then used as input to a simple and powerful algorithm for constructing a decision tree (Oblivious Tree). Results indicate that a decision tree built on features selected by the proposed algorithm outperforms a decision tree without feature selection. From the experimental results, it is observed that this method generates a smaller tree with acceptable accuracy.
An Analysis of Feature Selection Algorithm and their Optimization - A Scrutiny
Webology, 2021
Data mining is the extraction of relevant knowledge from very large databases, and discriminative knowledge is extracted through features of the data. Feature selection identifies a subset of the features that carries the most information. Before data mining, feature selection is essential to trim down high-dimensional data: without this preprocessing step, classification can require prohibitive computation time and complexity. The main intention of this analysis is to provide a summary of the feature selection approaches adopted to evaluate extremely large feature sets.
Introduction to feature subset selection method
Data mining is a computational process to discover patterns in large data sets. It comprises various important techniques, one of which is classification, which has recently been receiving great attention in the database community. Classification techniques can solve problems in different fields such as medicine, industry, business, and science. Particle Swarm Optimization (PSO) is an optimization method based on social behaviour. Feature Selection (FS) involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool that deals with the uncertainty and vagueness of decision systems.
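A common way to apply PSO to feature selection is the binary variant: each particle is a bit mask over the features, velocities are passed through a sigmoid to give per-bit probabilities, and a fitness function scores each mask. The sketch below uses a toy fitness (reward three "relevant" features, penalize the rest) purely for illustration; in practice the fitness would be a classifier's accuracy or a rough-set dependency measure, and all names and parameter values here are our own assumptions.

```python
# Hedged sketch of binary PSO for feature subset selection.
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 10
RELEVANT = np.zeros(N_FEATURES, dtype=bool)
RELEVANT[[1, 4, 7]] = True  # ground truth for the toy fitness only

def fitness(mask):
    # Toy stand-in: +1 per relevant feature kept, -1 per irrelevant one.
    return int((mask & RELEVANT).sum()) - int((mask & ~RELEVANT).sum())

def binary_pso(n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5):
    pos = (rng.random((n_particles, N_FEATURES)) > 0.5).astype(int)
    vel = rng.uniform(-1, 1, (n_particles, N_FEATURES))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        # Sigmoid transfer turns velocities into bit probabilities.
        pos = (rng.random(vel.shape) < 1 / (1 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest

best = binary_pso()
print("selected features:", np.flatnonzero(best))
```

The swarm converges toward masks that keep the rewarded features; replacing `fitness` with a cross-validated accuracy turns this into a wrapper-style selector.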
A comparative study on the effect of feature selection on classification accuracy
Procedia Technology, 2012
Feature selection has become of interest to many research areas that deal with machine learning and data mining, because it enables classifiers to be fast, cost-effective, and more accurate. This paper presents the effect of feature selection on the accuracy of Naïve Bayes, Artificial Neural Network (as a Multilayer Perceptron), and J48 decision tree classifiers. The classifiers are compared on fifteen real datasets pre-processed with feature selection methods. Up to 15.55% improvement in classification accuracy is observed, and the Multilayer Perceptron appears to be the classifier most sensitive to feature selection.
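The kind of comparison described above — the same classifiers evaluated with and without a filter-style selection step — can be reproduced in miniature with scikit-learn. This is an illustrative sketch, not the paper's experiment: the dataset, the ANOVA F-test filter, and `k=10` are our own choices, and the MLP is omitted for brevity.

```python
# Hedged sketch: classifier accuracy with all features vs. a selected subset.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

results = {}
for name, clf in [("NaiveBayes", GaussianNB()),
                  ("DecisionTree", DecisionTreeClassifier(random_state=0))]:
    full = cross_val_score(clf, X, y, cv=5).mean()
    selected = cross_val_score(
        make_pipeline(SelectKBest(f_classif, k=10), clf), X, y, cv=5).mean()
    results[name] = (full, selected)
    print(f"{name}: all 30 features {full:.3f} -> top 10 features {selected:.3f}")
```

Putting the selector inside the pipeline matters: it is refit on each training fold, so the test folds never leak into the feature ranking.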
A Novel Feature Selection and Extraction Technique for Classification
Pattern recognition is a vast field which has seen significant advances over the years. As the datasets under consideration grow larger and more comprehensive, using efficient techniques to process them becomes increasingly important. We present a versatile technique for the purpose of feature selection and extraction: Class-Dependent Features (CDFs). CDFs identify the features innate to a class and extract them accordingly. The features thus extracted are relevant to the entire class and not just to the individual data item. This paper focuses on using CDFs to improve the accuracy of classification and at the same time control computational expense by tackling the curse of dimensionality. In order to demonstrate the generality of this technique, it is applied to two problem statements which have very little in common with each other: handwritten digit recognition and text categorization. It is found that for both problem statements, the accuracy is comparable to state-of-the-art results and the speed of the operation is considerably greater. Results are presented for the Reuters-21578 and Web-KB datasets relating to text categorization, and the MNIST and USPS datasets for handwritten digit recognition.
Empirical study of feature selection methods in classification
2008
The use of feature selection can improve the accuracy, efficiency, applicability, and understandability of a learning process and the resulting learner. For this reason, many methods of automatic feature selection have been developed. By modularizing the feature selection process, this paper evaluates a wide spectrum of these methods, together with additional ones created by combining different search and measure modules.