An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data
Related papers
Unsupervised feature selection using feature similarity
IEEE Transactions on Pattern Analysis …, 2002
Abstract: In this article, we describe an unsupervised feature selection algorithm suitable for data sets large in both dimension and size. The method is based on measuring similarity between features, whereby redundant features are removed. This does not need any search and, ...
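The general idea described above can be sketched briefly. The snippet below uses plain absolute Pearson correlation as the feature-similarity measure, which is an assumption for illustration only (the paper defines its own similarity index); a feature is kept only if it is not too similar to any feature already kept.

```python
import numpy as np

def select_by_feature_similarity(X, threshold=0.9):
    """Greedy redundancy removal: keep a feature only if its absolute
    Pearson correlation with every already-kept feature stays below
    `threshold`. Illustrative only -- the paper uses its own
    similarity measure, not plain correlation."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature similarity
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: feature 1 is a scaled copy of feature 0; feature 2 is independent.
rng = np.random.default_rng(0)
f0 = rng.normal(size=200)
X = np.column_stack([f0, 2.0 * f0 + 0.01 * rng.normal(size=200),
                     rng.normal(size=200)])
print(select_by_feature_similarity(X))  # feature 1 is dropped as redundant
```

Note that, as the abstract says, no search over feature subsets is required: one pass over the pairwise similarity matrix suffices.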
A new unsupervised feature selection algorithm using similarity‐based feature clustering
Computational Intelligence, 2018
Unsupervised feature selection is an important problem, especially for high‐dimensional data. However, until now, it has been scarcely studied and the existing algorithms cannot provide satisfying performance. Thus, in this paper, we propose a new unsupervised feature selection algorithm using similarity‐based feature clustering, Feature Selection‐based Feature Clustering (FSFC). FSFC removes redundant features according to the results of feature clustering based on feature similarity. First, it clusters the features according to their similarity, using a new feature clustering algorithm that overcomes the shortcomings of K‐means. Second, it selects a representative feature from each cluster, the one that carries the most information about the features in that cluster. The efficiency and effectiveness of FSFC are tested on real‐world data sets and compared with two representative unsupervised feature selection algorithms, Feature Selection Using Similarity (FSUS) and Multi‐Clust...
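The two-step structure of this approach (cluster features by similarity, then keep one representative per cluster) can be sketched as follows. This is a hypothetical illustration, not FSFC's actual clustering algorithm: it greedily groups features whose absolute correlation exceeds a threshold and picks as representative the member most similar to the rest of its cluster.

```python
import numpy as np

def cluster_and_select(X, sim_threshold=0.8):
    """Sketch of similarity-based feature clustering: greedily group
    features whose |correlation| exceeds `sim_threshold`, then keep one
    representative per cluster (the member most similar to the rest of
    its cluster). FSFC's actual algorithm is more elaborate; this only
    shows the two-step structure."""
    sim = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(X.shape[1]))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if sim[seed, j] >= sim_threshold]
        unassigned = [j for j in unassigned if j not in members]
        clusters.append(members)
    # Representative: member with the highest mean similarity within its cluster.
    reps = [max(c, key=lambda j: sim[j, c].mean()) for c in clusters]
    return clusters, reps

# Toy data: features 0 and 1 are highly correlated; feature 2 stands alone.
rng = np.random.default_rng(1)
f0 = rng.normal(size=200)
X = np.column_stack([f0, f0 + 0.05 * rng.normal(size=200),
                     rng.normal(size=200)])
clusters, reps = cluster_and_select(X)
print(clusters, reps)  # two clusters; one representative each
```

The redundant pair collapses into one cluster, so only one of the correlated features survives selection.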
Literature Review on Feature Selection Methods for High-Dimensional Data
Feature selection plays a significant role in improving the performance of machine learning algorithms, both by reducing the time needed to build the learning model and by increasing accuracy in the learning process. Researchers therefore pay close attention to feature selection as a way to enhance machine learning performance. Identifying a suitable feature selection method is essential for a given machine learning task with high-dimensional data. Hence, a study of the various feature selection methods is needed for the research community, especially for those working to develop suitable feature selection methods for high-dimensional data. To fulfill this objective, this paper provides a complete literature review of the various feature selection methods for high-dimensional data.
Feature selection using nearest attributes
2012
Feature selection is an important problem in high-dimensional data analysis and classification. Conventional feature selection approaches focus on detecting features based on a redundancy criterion using learning and feature-searching schemes. In contrast, we present an approach that selects features based on their discriminatory ability among classes. The area of overlap between inter-class and intra-class distances, obtained from feature-to-feature comparison of an attribute, is used as a measure of the discriminatory ability of the feature. A set of nearest attributes in a pattern having the lowest area of overlap, within a degree of tolerance defined by a selection threshold, is selected to represent the best available discriminable features. State-of-the-art recognition results are reported for pattern classification problems by using the proposed feature selection scheme with the nearest neighbour classifier. These results are reported on benchmark databases with high-dimensional feature vectors, in problems involving images and microarray data.
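The overlap idea in this abstract can be illustrated with a small sketch. The measure below is an assumption for illustration, not the paper's exact formulation: for one feature, it collects pairwise absolute differences, splits them into intra-class and inter-class groups, histograms both over a common range, and returns the shared area. A lower score suggests the feature separates the classes better.

```python
import numpy as np

def overlap_score(feature, labels, bins=20):
    """Illustrative overlap measure: area shared by the intra-class and
    inter-class pairwise-difference distributions of one feature.
    Lower overlap = better class separability for that feature."""
    intra, inter = [], []
    n = len(feature)
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(feature[i] - feature[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    hi = max(max(intra), max(inter))
    h_intra, edges = np.histogram(intra, bins=bins, range=(0.0, hi), density=True)
    h_inter, _ = np.histogram(inter, bins=bins, range=(0.0, hi), density=True)
    return float(np.minimum(h_intra, h_inter).sum() * (edges[1] - edges[0]))

# A feature whose class means are far apart overlaps far less than pure noise.
rng = np.random.default_rng(2)
labels = np.array([0] * 30 + [1] * 30)
good = np.concatenate([rng.normal(0, 1, 30), rng.normal(10, 1, 30)])
noise = rng.normal(0, 1, 60)
print(overlap_score(good, labels) < overlap_score(noise, labels))  # True
```

Features would then be ranked by this score and those below a selection threshold retained, matching the thresholding step described in the abstract.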
A proposal to improve the performance of feature selection methods with low-sample-size data
2019
Feature selection is a critical preprocessing step in machine learning that removes irrelevant and redundant data. Feature selection methods usually require sufficient samples to select a reliable feature subset, especially in the presence of outliers. However, sufficient samples cannot always be ensured in several real-world applications (e.g. neuroimaging, bioinformatics, psychology, as well as sport sciences). In this study, a method to improve the performance of feature selection methods with low-sample-size data was proposed, named Feature Selection Based on Data Quality and Variable Training Samples (QVT). Given that none of the considered feature selection methods performs optimally in all scenarios, QVT is primarily characterized by its versatility, because it can be combined with any feature selection method. An experiment was performed using 20 benchmark datasets, three feature selection methods and three classifiers to verify the feasib...
Deliverable D14.1, IST Project MiningMart, 2002
The problem of feature selection is fundamental in a number of different tasks such as classification, data mining, image processing, and conceptual learning. In recent times, the growing importance of knowledge discovery and data-mining approaches in practical applications has made feature selection a quite hot topic, especially when considering the mining of knowledge from real-world databases or warehouses, containing not only a huge amount of records, but also a significant number of features not always ...