Feature selection for classification: A review
Related papers
Feature selection, extraction and construction
2002
Abstract Feature selection is a process that chooses a subset of features from the original features so that the feature space is optimally reduced according to a certain criterion. Feature extraction/construction is a process through which a set of new features is created. The two are used either in isolation or in combination, and both aim to improve performance measures such as estimated accuracy, visualization, and the comprehensibility of learned knowledge. Basic approaches to selection, extraction, and construction are reviewed, with pointers to references for further study.
Interdisciplinary Publishing Academia, 2020
Due to sharp increases in data dimensions, every data mining or machine learning (ML) task requires more efficient techniques to get the desired results. Therefore, in recent years, researchers have proposed and developed many methods and techniques to reduce the high dimensions of data while attaining the required accuracy. To improve the accuracy of learned features as well as to decrease training time, dimensionality reduction is used as a pre-processing step, which can eliminate irrelevant data, noise, and redundant features. Dimensionality reduction (DR) is performed by two main families of methods: feature selection (FS) and feature extraction (FE). FS is considered an important method because data is generated continuously at an ever-increasing rate; it addresses several serious dimensionality problems, such as effectively decreasing redundancy, eliminating irrelevant data, and improving result comprehensibility. FE, in turn, deals with the problem of finding the most distinctive, informative, and reduced set of features to improve the efficiency of both the processing and the storage of data. This paper offers a comprehensive review of FS and FE within the scope of DR. The details of each paper, such as the algorithms/approaches used, datasets, classifiers, and achieved results, are comprehensively analyzed and summarized. In addition, a systematic discussion of all the reviewed methods highlights authors' trends, identifies the method(s) that significantly reduce computational time, and singles out the most accurate classifiers. Finally, the different types of both methods are discussed and their findings analyzed.
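The FS/FE split described in this abstract can be sketched in a few lines. Everything below is an illustrative assumption, not taken from the paper: FS is represented by variance-based column ranking, and FE by a fixed linear projection standing in for PCA-style extraction.

```python
# Hedged sketch: feature selection (FS) keeps a subset of the original
# columns, while feature extraction (FE) builds new features from all
# of them. Variance scoring and the fixed projection are illustrative.

def column_variance(col):
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)

def select_features(X, k):
    """FS: keep the k original columns with the highest variance."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)),
                    key=lambda j: column_variance(cols[j]), reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in X], keep

def extract_features(X, weights):
    """FE: each new feature is a linear combination of every original."""
    return [[sum(w * v for w, v in zip(wrow, row)) for wrow in weights]
            for row in X]

X = [[1, 5, 0],
     [1, 9, 2],
     [1, 1, 4]]          # column 0 is constant, so FS should drop it
reduced, kept = select_features(X, 2)
projected = extract_features(X, [[1.0, 1.0, 1.0]])  # one summed feature
```

The key contrast: `reduced` still contains original column values, while `projected` contains values that exist in no original column.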
A comparison of feature extraction and selection techniques
… on Artificial Neural …, 2003
We have applied several dimensionality reduction techniques to data modelling using neural network architectures for classification on a number of data sets. The reduction methods considered include both linear and non-linear forms of principal component analysis, genetic algorithms, and sensitivity analysis. The results of each were used as inputs to several types of neural network architecture, specifically multi-layer perceptrons (MLPs), radial basis function networks (RBFs), and generalised regression neural networks. Our results suggest that considerable improvements in accuracy can be achieved by the use of simple network sensitivity analysis, compared to genetic algorithms and both forms of principal component analysis.
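The "network sensitivity analysis" this abstract credits can be sketched as a finite-difference perturbation of a trained model's inputs. The linear stand-in model and toy data below are assumptions for illustration, not the networks (MLP/RBF/GRNN) or data sets used in the paper.

```python
# Hedged sketch of perturbation-based sensitivity analysis: rank each
# input feature by how strongly a trained model's output responds to a
# small nudge of that input, averaged over the data.

def sensitivity(model, X, eps=1e-3):
    n_features = len(X[0])
    scores = [0.0] * n_features
    for row in X:
        base = model(row)
        for j in range(n_features):
            bumped = list(row)
            bumped[j] += eps
            scores[j] += abs(model(bumped) - base) / eps
    return [s / len(X) for s in scores]

# Stand-in for a trained network: a fixed linear function, whose true
# sensitivities are simply the absolute coefficients (3.0, 0.5, 2.0).
trained = lambda x: 3.0 * x[0] + 0.5 * x[1] - 2.0 * x[2]
X = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
scores = sensitivity(trained, X)
```

Features can then be ranked by `scores` and the lowest-scoring ones discarded before retraining, which is the selection step the abstract compares against genetic algorithms and PCA.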
A Survey on Various Feature Selection Methodologies
The process of selecting features is an important step in machine learning: it is the method of choosing a subset of relevant/significant variables and features. Feature selection is applicable in areas such as anomaly detection, bioinformatics, and image processing, where high-dimensional data is generated; the analysis and classification of such big data is time-consuming. Feature subset selection is generally used to simplify the model, reduce overfitting, and increase classifier efficiency. In this paper we analyze various techniques for feature extraction and feature subset selection. The main objective of this research was to find a more efficient algorithm for feature extraction and feature subset selection. Accordingly, several methods for the extraction and selection of features have been suggested to attain the highest relevance.
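A minimal example of the filter-style scoring that such surveys cover: rank each feature by its absolute Pearson correlation with the class label. Both the toy data and the choice of correlation as the relevance score are illustrative assumptions, not from the paper.

```python
# Hedged sketch of a filter-style feature ranking: features are scored
# individually against the target, then ranked, without training any
# classifier. Correlation is one of the simplest relevance scores.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features(X, y):
    """Return feature indices, most label-relevant first."""
    cols = list(zip(*X))
    return sorted(range(len(cols)),
                  key=lambda j: abs(pearson(cols[j], y)), reverse=True)

# Column 0 tracks the label exactly; column 1 is noise-like.
X = [[1, 3], [2, 1], [3, 2], [4, 5]]
y = [1, 2, 3, 4]
order = rank_features(X, y)
```

Keeping only the top-k indices of `order` is the "simplify the model, reduce overfitting" step the abstract describes.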
FEATURE EXTRACTION AND CLASSIFICATION SC
AAU, 2011
Dr. Million Meshesha, whose encouragement, guidance and support from the initial to the final level enabled me to develop an understanding of the subject. His sage advice, insightful criticisms, and patient encouragement aided the writing of this thesis in innumerable ways. Beyond his expertise, I really appreciate his patience and thank him for his concern and perceptive advice throughout my thesis work as well as the course of my Master's program. My greatest gratitude also goes to Mekelle University (MU) for granting me study leave with the necessary benefits, without which I would not have been able to join my M.Sc. study here at AAU. I also extend my sincere thanks to the academic and administrative staff of the Department of Computer Science (MU).
Analysis and Evaluation of Feature Selection and Feature Extraction Methods
International Journal of Computational Intelligence Systems
Hand gestures are widely used in human-to-human and human-to-machine communication. Therefore, hand gesture recognition is a topic of great interest. Hand gesture recognition is closely related to pattern recognition, where overfitting can occur when there are many predictors relative to the size of the training set. Therefore, it is necessary to reduce the dimensionality of the feature vectors through feature selection techniques. In addition, the need for portability in hand gesture recognition systems limits the use of deep learning algorithms. In this sense, a study of feature selection and extraction methods is proposed for the use of traditional machine learning algorithms. The feature selection methods analyzed are: maximum relevance and minimum redundancy (MRMR), Sequential, neighbor component analysis without parameters (NCAsp), neighbor component analysis with parameters (NCAp), Relief-F, and decision tree (DT). We also analyze the behavior of feature selection methods usi...
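As a rough illustration of the Relief-family scorers this abstract lists, here is a minimal two-class Relief sketch (a simplification of Relief-F, which additionally handles multiple classes and k neighbors). The toy data are an assumption, not the hand-gesture features from the paper.

```python
# Hedged sketch of basic two-class Relief: for every instance, reward a
# feature that differs on the nearest instance of the other class
# (near-miss) and penalize one that differs on the nearest instance of
# the same class (near-hit).

def relief(X, y):
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for i in range(n):
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        near_hit = min(hits, key=lambda h: sq_dist(X[i], h))
        near_miss = min(misses, key=lambda m: sq_dist(X[i], m))
        for j in range(d):
            weights[j] += (abs(X[i][j] - near_miss[j])
                           - abs(X[i][j] - near_hit[j]))
    return [w / n for w in weights]

# Feature 0 separates the classes; feature 1 is noise.
X = [[0, 0], [0, 1], [5, 0], [5, 1]]
y = [0, 0, 1, 1]
scores = relief(X, y)
```

Discriminative features get large positive weights, noisy ones near-zero or negative, which is why Relief-style scores suit the overfitting-prone, small-training-set setting the abstract describes.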
A Review on Dimensionality Reduction Techniques
International Journal of Computer Applications
Progress in digital data acquisition and storage technology has resulted in exponential growth in high-dimensional data. Removing redundant and irrelevant features from this high-dimensional data helps in improving mining performance and comprehensibility and in increasing learning accuracy. Feature selection and feature extraction techniques are used as a preprocessing step for reducing data dimensionality. This paper analyses some existing popular feature selection and feature extraction techniques and addresses the benefits and challenges of these algorithms, which would be beneficial for beginners.
A Novel Feature Selection and Extraction Technique for Classification
Pattern recognition is a vast field which has seen significant advances over the years. As the datasets under consideration grow larger and more comprehensive, using efficient techniques to process them becomes increasingly important. We present a versatile technique for the purpose of feature selection and extraction: Class Dependent Features (CDFs). CDFs identify the features innate to a class and extract them accordingly. The features thus extracted are relevant to the entire class and not just to the individual data item. This paper focuses on using CDFs to improve the accuracy of classification and at the same time control computational expense by tackling the curse of dimensionality. In order to demonstrate the generality of this technique, it is applied to two problem statements which have very little in common with each other: handwritten digit recognition and text categorization. It is found that for both problem statements, the accuracy is comparable to state-of-the-art results and the speed of the operation is considerably greater. Results are presented for the Reuters-21578 and Web-KB datasets relating to text categorization and the MNIST and USPS datasets for handwritten digit recognition.
A Framework To Integrate Feature Selection Algorithm For Classification Of High Dimensional Data
The explosive usage of social media produces huge quantities of unlabeled, high-dimensional data. Techniques suited to this setting have proven powerful for effective learning and data mining on high-dimensional data. Here, unsupervised feature selection remains a difficult task because of the absence of the label information on which feature relevance is usually assessed. The distinctive characteristics of social media data further complicate the problem: linked data violate the independent and identically distributed assumption that conventional feature selection algorithms make, bringing new challenges to unsupervised feature selection. In this paper, we first pose the problem of feature selection for social media data in an unsupervised scenario. Next, we analyze the differences between social media data and traditional attribute-value data, examining how the relations extracted from linked data can be exploited to select relevant features. Finally, we propose a novel unsupervised feature selection framework, WSLA (Web Server Log Analyzer), for linked social media data, and systematically design and conduct experiments to evaluate the proposed framework on datasets from real-world social media websites. The empirical study reveals that unsupervised feature selection in this setting is more powerful and can be extended to other unlabeled data with additional information. Keywords: feature selection; classification; web mining; high-dimensional data; data preprocessing; WSLA (Web Server Log Analyzer)