A Study of Feature Subset Selection Methods for Dimension Reduction
Related papers
Dimension Reduction and Feature Selection
Data Mining and Knowledge Discovery Handbook
Data mining algorithms search for meaningful patterns in raw data sets. The data mining process incurs a high computational cost when dealing with large data sets. Reducing dimensionality (the number of attributes or the number of records) can effectively cut this cost. This chapter focuses on a pre-processing step that removes dimensions from a given data set before it is fed to a data mining algorithm. This work explains how it is often possible to reduce dimensionality with minimal loss of information. A clear dimension reduction taxonomy is described, and techniques for dimension reduction are presented theoretically.
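A minimal sketch of this pre-processing idea, assuming scikit-learn and one of its bundled datasets (neither is prescribed by the chapter): the attribute dimension is cut from 30 to 10 before a simple mining algorithm sees the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)  # 569 records, 30 attributes

# Reduce the attribute dimension from 30 to 10 before the mining step.
pipeline = make_pipeline(
    SelectKBest(score_func=mutual_info_classif, k=10),
    GaussianNB(),
)
print(cross_val_score(pipeline, X, y, cv=5).mean())
```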
A Review on Dimensionality Reduction Techniques in Data Mining
2019
Data mining is a form of knowledge discovery essential for solving problems in a specific domain. Classification is a technique used for discovering the classes of unknown data. Various classification methods exist, such as Bayesian methods, decision trees, rule-based methods, and neural networks. Before applying any mining technique, irrelevant attributes need to be filtered out. Filtering is done using different feature selection techniques, namely wrapper, filter, and embedded methods. Feature selection plays an important role in data mining and machine learning. It helps to reduce the dimensionality of data and increase the performance of classification algorithms. A variety of feature selection methods have been presented in the state-of-the-art literature to resolve feature selection problems such as the large search space in high-dimensional datasets, as in microarray data. However, it is a challenging task to identify the best feature selection method that suits a specific scenario or situation. Dimensionali...
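As a rough illustration of the three families named in this abstract (filter, wrapper, embedded), the sketch below maps each to a common scikit-learn counterpart; the survey itself does not prescribe these particular implementations.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: scores each feature independently of any learner.
filt = SelectKBest(chi2, k=10).fit(X, y)

# Wrapper: repeatedly refits a learner while searching the feature space.
wrap = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

# Embedded: selection happens inside the learner's own training
# (here, features above the forest's mean importance are kept).
emb = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X, y)

for name, sel in [("filter", filt), ("wrapper", wrap), ("embedded", emb)]:
    print(name, np.flatnonzero(sel.get_support()))
```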
Dimension Reduction Methodology using Group Feature Selection
Feature selection has become a notable research topic in recent years. It is an efficient methodology for tackling high-dimensional data. Previous feature selection techniques neglect the underlying structure of the features and evaluate each feature individually. With this in mind, we focus on the setting where the features possess some group structure. To address this problem, we use a group feature selection technique that performs selection at the group level. Its objective is to evaluate features both within groups and between groups, selecting discriminative features and removing redundant ones to obtain an optimal subset. We demonstrate our technique on benchmark data sets and evaluate the classification accuracy it attains.
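A loose, hypothetical sketch of the two-level idea (selection within groups, then between groups); the grouping, scoring function, and cut-off below are stand-ins of mine, not the authors' procedure.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
groups = np.array_split(np.arange(X.shape[1]), 6)  # hypothetical group structure

scores = mutual_info_classif(X, y, random_state=0)

# Within-group step: keep the most discriminative member of each group,
# discarding its redundant group-mates.
reps = [g[np.argmax(scores[g])] for g in groups]

# Between-group step: rank the representatives and keep the strongest groups.
subset = sorted(reps, key=lambda i: scores[i], reverse=True)[:4]
print("selected features:", subset)
```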
Data mining is a part of the process of knowledge discovery from data (KDD). The performance of data mining algorithms mainly depends on the effectiveness of preprocessing algorithms, and dimensionality reduction plays an important role in preprocessing. Many methods have been proposed for dimensionality reduction; among them, feature subset selection and feature-ranking methods show significant achievements by removing irrelevant and redundant features from high-dimensional data. This improves the prediction accuracy of the classifier, reduces the false prediction ratio, and reduces the time and space complexity of building the prediction model. This paper presents an empirical study of the feature subset evaluators CFS, Consistency, and Filtered, and the feature rankers Chi-squared and Information Gain. The performance of these methods is analyzed with a focus on dimensionality reduction and improvement of classification accuracy, using a wide range of test datasets and classification algorithms, namely the probability-based Naive Bayes, the tree-based C4.5 (J48), and the instance-based IB1.
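The study's evaluators and classifiers are WEKA components; the sketch below is only a rough scikit-learn analogue of its ranker-classifier grid, pairing chi-squared and information-gain rankers against Naive Bayes, a C4.5-style decision tree, and a 1-NN learner standing in for IB1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rankers = {"chi-squared": chi2, "info-gain": mutual_info_classif}
learners = {
    "NaiveBayes": GaussianNB(),
    "C4.5-style tree": DecisionTreeClassifier(random_state=0),
    "IB1 (1-NN)": KNeighborsClassifier(n_neighbors=1),
}

# Cross-validated accuracy for every ranker/classifier combination.
for rname, score_fn in rankers.items():
    for lname, clf in learners.items():
        pipe = make_pipeline(SelectKBest(score_fn, k=10), clf)
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{rname} + {lname}: {acc:.3f}")
```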
A Review on Dimension Reduction Techniques in Data Mining
2018
Real-world data such as images and speech signals is high-dimensional, with multiple dimensions used to represent each instance. Higher-dimensional data makes it more complex to detect and exploit the relationships among terms. Dimensionality reduction is a technique for reducing the complexity of analyzing high-dimensional data. Many methodologies are used to find the critical dimensions of a dataset, significantly reducing the number of dimensions relative to the original input data. Dimensionality reduction methods are of two types: feature extraction and feature selection. Feature extraction is a distinct form of dimensionality reduction that derives important new features from the input dataset. Two different approaches are available for dimensionality reduction: supervised and unsupervised. One exclusive purpose of this survey is to provide an adequate comprehension of the different dimensionality reduction techniq...
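A minimal sketch of the two families this survey distinguishes: feature extraction derives new composite dimensions, while feature selection keeps a subset of the original ones. The library and dataset choices here are assumptions of mine, not the survey's.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_digits(return_X_y=True)  # 64 original dimensions per image

# Feature extraction (unsupervised here): 10 new composite dimensions.
X_ext = PCA(n_components=10).fit_transform(X)

# Feature selection (supervised here): 10 of the original 64 pixels.
X_sel = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

print(X_ext.shape, X_sel.shape)  # both reduced to (1797, 10)
```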
A Survey on Various Feature Selection Methodologies
Feature selection is an important process in machine learning: the selection of a subset of relevant, significant variables or features. It is applicable in areas such as anomaly detection, bioinformatics, and image processing, where high-dimensional data is generated and the analysis and classification of such big data is time-consuming. Feature subset selection is generally used to simplify the model, reduce overfitting, and increase classifier efficiency. In this paper we analyze various techniques for feature extraction and feature subset selection. The main objective of this research was to find a more efficient algorithm for feature extraction and feature subset selection. Several methods for the extraction and selection of features have been proposed to attain the highest relevance.
A Review on Dimensionality Reduction Techniques
International Journal of Computer Applications
Progress in digital data acquisition and storage technology has resulted in exponential growth in high-dimensional data. Removing redundant and irrelevant features from this high-dimensional data helps improve mining performance and comprehensibility and increase learning accuracy. Feature selection and feature extraction techniques are used as a preprocessing step to reduce data dimensionality. This paper analyses some popular existing feature selection and feature extraction techniques and addresses the benefits and challenges of these algorithms, which would be beneficial for beginners.
Efficient feature subset selection model for high dimensional data
International Journal on Cybernetics & Informatics, 2016
This paper proposes a new method that aims to reduce the size of a high-dimensional dataset by identifying and removing irrelevant and redundant features. Dataset reduction is important in machine learning and data mining. A measure of dependence is used to evaluate the relationship between a feature and the target concept, or between features, for irrelevant and redundant feature removal. The proposed work first removes all the irrelevant features, and then a minimum spanning tree of the relevant features is constructed using Prim's algorithm. Splitting the minimum spanning tree based on the dependency between features leads to the generation of forests. A representative feature from each of the forests is taken to form the final feature subset.
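An illustrative sketch of the pipeline described above (relevance filter, MST over features, split into forests, one representative each). Mutual information and correlation stand in for the paper's dependence measure, scipy's MST routine stands in for the explicit Prim's construction, and the thresholds are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# 1. Remove irrelevant features (weak dependence on the target concept).
rel = mutual_info_classif(X, y, random_state=0)
keep = np.flatnonzero(rel > np.median(rel))

# 2. Minimum spanning tree over the relevant features; edge weight
#    1 - |correlation| joins strongly dependent features with light edges.
corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
mst = minimum_spanning_tree(1.0 - corr).toarray()

# 3. Split the tree into forests: cut edges between weakly dependent features.
mst[mst > 0.5] = 0.0  # hypothetical cut threshold
n_forests, labels = connected_components(mst, directed=False)

# 4. Take one representative (the most relevant member) from each forest.
subset = [int(keep[labels == f][np.argmax(rel[keep[labels == f]])])
          for f in range(n_forests)]
print("final subset:", sorted(subset))
```

Because the edge weights encode dependence, cutting the heavy edges leaves forests of mutually dependent features, so keeping one representative per forest preserves relevance while discarding redundancy.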
Interdisciplinary Publishing Academia, 2020
Due to sharp increases in data dimensions, every data mining or machine learning (ML) task requires more efficient techniques to get the desired results. Therefore, in recent years, researchers have proposed and developed many methods and techniques to reduce the high dimensionality of data while attaining the required accuracy. To improve the accuracy of learned features and to decrease training time, dimensionality reduction is used as a pre-processing step that can eliminate irrelevant data, noise, and redundant features. Dimensionality reduction (DR) is performed with two main families of methods: feature selection (FS) and feature extraction (FE). FS is considered an important method because data is generated continuously at an ever-increasing rate; it can mitigate serious dimensionality problems by effectively decreasing redundancy, eliminating irrelevant data, and improving result comprehensibility. FE, in turn, deals with the problem of finding the most distinctive, informative, and reduced set of features to improve the efficiency of both the processing and storage of data. This paper offers a comprehensive review of FS and FE within the scope of DR. The details of each reviewed paper, such as the algorithms/approaches used, datasets, classifiers, and achieved results, are comprehensively analyzed and summarized. In addition, all of the reviewed methods are systematically discussed to highlight the authors' trends, to determine which method(s) significantly reduce computational time, and to identify the most accurate classifiers. Finally, the different types of both methods are discussed and the findings analyzed.
Efficient Feature Subset Selection Algorithm for High Dimensional Data
International Journal of Electrical and Computer Engineering (IJECE)
The feature selection approach addresses the dimensionality problem by removing irrelevant and redundant features. Existing feature selection algorithms take considerable time to obtain a feature subset for high-dimensional data. This paper proposes a feature selection algorithm based on information gain measures for high-dimensional data, termed IFSA (Information gain based Feature Selection Algorithm), to produce an optimal feature subset in efficient time and improve the computational performance of learning algorithms. The IFSA algorithm works in two stages: first, it applies a filter to the dataset; second, it produces a small feature subset using the information gain measure. Extensive experiments are carried out to compare the proposed algorithm with other methods using two different classifiers (Naive Bayes and IBk) on microarray and text datasets. The results demonstrate that IFSA not only produces a compact feature subset in efficient time but also improves classifier performance.
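A minimal two-stage sketch in the spirit of IFSA as summarized above: a cheap filter pass, then an information-gain ranking to form the final subset. The variance filter, the mutual-information scorer, and the choice of k are assumptions of mine, not the published algorithm.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Stage 1: a cheap filter pass removes obviously uninformative features.
vt = VarianceThreshold(threshold=0.0).fit(X)
X_f = vt.transform(X)

# Stage 2: rank the survivors by information gain (mutual information
# here) and keep the top k as the final subset.
gain = mutual_info_classif(X_f, y, random_state=0)
k = 10
subset = np.flatnonzero(vt.get_support())[np.argsort(gain)[::-1][:k]]
print("IFSA-style subset:", sorted(subset.tolist()))
```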