Enhancing the Performance of Classifier Using Particle Swarm Optimization (PSO)-based Dimensionality Reduction
Related papers
Performance Analysis of Particle Swarm Optimization for Feature Selection
FUOYE Journal of Engineering and Technology, 2019
One of the key tasks in data mining is the selection of relevant features from datasets with high dimensionality. This is expected to reduce the time and space complexity and, consequently, improve the performance of data mining algorithms on tasks such as classification. This study presents an empirical study of the effect of particle swarm optimization as a feature selection technique on the performance of classification algorithms. Two datasets from different domains were used: an SMS spam detection dataset and sentiment analysis datasets. Particle swarm optimization is applied to the datasets for feature selection. Both the reduced and raw datasets are separately classified using a C4.5 decision tree, k-nearest neighbour, and support vector machine. The results of the analysis showed that the improvement in classifier performance is case-dependent; some significant improvements are noticed in the sentiment analysis datasets but not in the SMS spam dataset. Although some marginal effects are observ...
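The wrapper setup these studies share, PSO searching over binary feature masks that a downstream evaluator scores, can be sketched roughly as follows. This is an illustrative toy rather than any paper's code: the fitness function, the "relevant feature" ground truth, and all parameter values (`w`, `c1`, `c2`, swarm size) are assumptions, and the bit-flipping step uses the standard sigmoid transfer function of binary PSO.

```python
# Minimal sketch of binary PSO for feature selection (illustrative, not from
# the cited papers). Each particle is a bit mask over features; velocities are
# squashed through a sigmoid and sampled to set bits.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 20
RELEVANT = set(range(5))          # toy ground truth: first 5 features matter

def fitness(mask):
    """Toy objective: reward selected relevant features, penalize subset size."""
    hits = sum(1 for i in range(N_FEATURES) if mask[i] and i in RELEVANT)
    return hits - 0.1 * mask.sum()

def binary_pso(n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5):
    pos = rng.integers(0, 2, size=(n_particles, N_FEATURES))
    vel = rng.uniform(-1, 1, size=(n_particles, N_FEATURES))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1 = rng.random(vel.shape)
        r2 = rng.random(vel.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))        # sigmoid transfer function
        pos = (rng.random(pos.shape) < prob).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if pbest_fit.max() > fitness(gbest):
            gbest = pbest[pbest_fit.argmax()].copy()
    return gbest

best = binary_pso()
print("selected features:", np.flatnonzero(best))
```

In a real wrapper, `fitness` would train and validate a classifier on the masked columns; the toy objective here just stands in for that expensive evaluation.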
International Journal of Electrical and Computer Engineering (IJECE), 2021
Machine learning has been extensively examined, with data classification as the most popularly researched subject. The accuracy of prediction is affected by the data provided to the classification algorithm. Meanwhile, utilizing a large amount of data may incur costs, especially in data collection and preprocessing. Studies on feature selection have mainly aimed to establish techniques that decrease the number of features (attributes) used in classification while still using data that generates accurate predictions. Hence, a particle swarm optimization (PSO) algorithm is suggested in the current article for selecting the ideal set of features. The PSO algorithm has proven superior in various domains at exploring the search space, while local search algorithms are good at exploiting search regions. Thus, we propose a hybridized PSO algorithm with an adaptive local search technique that works based on the current PSO search state and is used for accepting candidate solutions. This combination balances the local intensification and the global diversification of the search process. Hence, the suggested algorithm surpasses the original PSO algorithm and other comparable approaches in terms of performance.
Feature Selection on Classification of Medical Datasets based on Particle Swarm Optimization
International Journal of Computer Applications, 2014
Classification analysis is widely adopted in healthcare applications to support medical diagnostic decisions, improve the quality of patient care, etc. A subset of the extensive amounts of data stored in medical databases is selected for training. If the training dataset contains irrelevant features, classification analysis may produce less accurate and less understandable results. Feature subset selection is one of the data preprocessing steps, and it is of immense importance in the field of data mining. This paper proposes filter and wrapper approaches with Particle Swarm Optimization (PSO) as feature selection methods for medical data. The performance of the proposed methods is compared with that of another feature selection algorithm based on a genetic approach. The two algorithms are applied to three medical datasets. The results show that the feature subsets identified by the proposed PSO, when given as input to five classifiers, namely decision tree, naïve Bayes, Bayesian, radial basis function, and k-nearest neighbour classifiers, showed enhanced classification accuracy over all given types of classification methods.
Feature selection using PSO-SVM
The feature selection process can be considered a problem of global combinatorial optimization in machine learning: it reduces the number of features, removes irrelevant, noisy, and redundant data, and yields acceptable classification accuracy. Feature selection is of great importance in pattern classification, medical data processing, machine learning, and data mining applications. Therefore, a good feature selection method, based on the number of features investigated for sample classification, is needed to speed up the processing rate, improve predictive accuracy, and avoid incomprehensibility. In this paper, particle swarm optimization (PSO) is used to implement feature selection, and support vector machines (SVMs) with the one-versus-rest method serve as the fitness function of PSO for the classification problem. The proposed method is applied to five classification problems from the literature. Experimental results show that our method simplifies features effectivel...
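A wrapper fitness of the kind this abstract describes is typically a weighted sum of classification accuracy and subset compactness. The sketch below is a hedged stand-in: a tiny 1-NN classifier replaces the SVM so the example stays dependency-free, and the weight `alpha` and the toy data are illustrative assumptions, not values from the paper.

```python
# Illustrative wrapper-style fitness: accuracy of a classifier on the selected
# columns, traded off against subset size. A 1-NN classifier stands in for the
# paper's SVM; alpha and the synthetic data are assumed for the example.
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 2 informative features out of 6; labels depend on features 0 and 1.
X = rng.normal(size=(80, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

def knn_accuracy(mask):
    """Evaluate the fixed classifier on only the selected feature columns."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    Xtr, Xte = X_train[:, cols], X_test[:, cols]
    # 1-NN: each test point takes the label of its closest training point
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[d.argmin(axis=1)]
    return (pred == y_test).mean()

def wrapper_fitness(mask, alpha=0.9):
    """alpha trades accuracy against subset size (assumed value)."""
    size_term = 1.0 - mask.sum() / len(mask)
    return alpha * knn_accuracy(mask) + (1 - alpha) * size_term

full = np.ones(6, dtype=int)
informative = np.array([1, 1, 0, 0, 0, 0])
print("all features:", round(wrapper_fitness(full), 3))
print("informative only:", round(wrapper_fitness(informative), 3))
```

In the paper's setting, PSO would maximize this fitness over masks, with the one-versus-rest SVM supplying the accuracy term.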
European Journal of Operational Research, 2010
This paper investigates the feature subset selection problem for binary classification using a logistic regression model. We developed a modified discrete particle swarm optimization (PSO) algorithm for the feature subset selection problem. This approach embodies an adaptive feature selection procedure which dynamically accounts for the relevance and dependence of the features included in the feature subset. We compare the proposed methodology with tabu search and scatter search algorithms using publicly available datasets. The results show that the proposed discrete PSO algorithm is competitive in terms of both classification accuracy and computational performance.
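The "relevance and dependence" idea behind such adaptive procedures can be illustrated outside of PSO: score each candidate feature by its association with the class (relevance) minus its association with features already selected (dependence/redundancy). The greedy loop and correlation-based scores below are a hypothetical illustration of that trade-off, not the paper's discrete PSO.

```python
# Hedged sketch of relevance-vs-dependence scoring: pick features with high
# correlation to the target but low correlation to already-selected features.
# The data, scores, and greedy strategy are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)   # feature 3 duplicates 0
y = X[:, 0] + X[:, 1]

def abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

def greedy_select(k):
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            relevance = abs_corr(X[:, j], y)
            dependence = (np.mean([abs_corr(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            score = relevance - dependence
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

print(greedy_select(2))
```

Because features 0 and 3 are near-duplicates, the dependence term prevents both from being chosen, which is exactly the behaviour a subset-quality-aware search is after.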
Improved PSO for Feature Selection on High-Dimensional Datasets
Classification on high-dimensional (i.e. thousands of dimensions) data typically requires feature selection (FS) as a pre-processing step to reduce the dimensionality. However, FS is a challenging task even on datasets with hundreds of features. This paper proposes a new particle swarm optimisation (PSO) based FS approach to classification problems with thousands or tens of thousands of features. The proposed algorithm is examined and compared with three other PSO based methods on five high-dimensional problems of varying difficulty. The results show that the proposed algorithm can successfully select a much smaller number of features and significantly increase the classification accuracy over using all features. The proposed algorithm outperforms the other three PSO methods in terms of both the classification performance and the number of features. Meanwhile, the proposed algorithm is computationally more efficient than the other three PSO methods because it selects a smaller number of features and employs a new fitness evaluation strategy.
Variable-Length Particle Swarm Optimization for Feature Selection on High-Dimensional Classification
IEEE Transactions on Evolutionary Computation, 2018
With a global search mechanism, particle swarm optimization (PSO) has shown promise in feature selection (FS). However, most current PSO-based FS methods use a fixed-length representation, which is inflexible and limits the performance of PSO for FS. When applying these methods to high-dimensional data, they not only consume a significant amount of memory but also incur a high computational cost. Overcoming this limitation enables PSO to work on data with much higher dimensionality, which has become more and more common with the advance of data collection technologies. In this paper, we propose the first variable-length PSO representation for FS, enabling particles to have different and shorter lengths, which defines a smaller search space and therefore improves the performance of PSO. By rearranging features in descending order of their relevance, we help particles with shorter lengths achieve better classification performance. Furthermore, using the proposed length-changing mechanism, PSO can jump out of local optima, further narrow the search space, and focus its search on smaller and more fruitful areas. These strategies enable PSO to reach better solutions in a shorter time. Results on ten high-dimensional datasets of varying difficulty show that the proposed variable-length PSO can achieve much smaller feature subsets with significantly higher classification performance in much shorter time than fixed-length PSO methods. The proposed method also outperformed the compared non-PSO FS methods in most cases.
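The core representation trick, ranking features by relevance so that a shorter particle covers only the most promising ones, can be shown in a few lines. This is a rough illustration of the idea, not the paper's implementation: the relevance score (absolute correlation), the toy data, and the `decode` helper are all assumptions made for the example.

```python
# Illustrative sketch of a variable-length representation: rank features by a
# relevance score (here |correlation| with the target, an assumed choice), then
# let a particle of length L search only the top-L ranked features.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 8))
y = 2 * X[:, 5] + X[:, 2] + 0.1 * rng.normal(size=150)

relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(8)])
ranking = np.argsort(-relevance)          # most relevant feature first

def decode(particle_bits):
    """A length-L particle selects only among the top-L ranked features."""
    L = len(particle_bits)
    return [ranking[i] for i in range(L) if particle_bits[i]]

short = decode([1, 1])    # a length-2 particle: a 2**2 space over 2 features
print("short particle selects:", short)
```

A length-2 particle here searches a space of 4 subsets instead of 2**8, which is the memory and runtime saving the abstract argues for.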
Applications of Evolutionary Computation, 2016
Feature selection and discretisation have shown their effectiveness for data preprocessing, especially for high-dimensional data with many irrelevant features. While feature selection selects only relevant features, feature discretisation finds a discrete representation of the data that contains enough information while ignoring minor fluctuations. These techniques are usually applied in two stages, discretisation and then selection, since many feature selection methods work only on discrete features. The most commonly used discretisation methods are univariate, in which each feature is discretised independently; therefore, the feature selection stage may not work efficiently, since information about feature interaction is not considered in the discretisation process. In this study, we propose a new method called PSO-DFS, using bare-bones particle swarm optimisation (BBPSO) for discretisation and feature selection in a single stage. The results on ten high-dimensional datasets show that PSO-DFS obtains a substantial dimensionality reduction on all datasets. The classification performance is significantly improved, or at least maintained, on nine of the ten datasets by using the transformed "small" data obtained from PSO-DFS. Compared to the two-stage approach, which uses PSO for feature selection on the discretised data, PSO-DFS achieves better performance on six datasets and similar performance on three datasets, with a much smaller number of features selected.
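The univariate baseline the abstract criticises is easy to make concrete: equal-width binning cuts each feature on its own range, so where the cut points fall never depends on any other feature. The function and bin count below are a minimal assumed example, not PSO-DFS itself.

```python
# Minimal example of univariate equal-width discretisation: each feature is
# binned independently, so feature interactions cannot influence cut points.
import numpy as np

def equal_width_discretise(x, n_bins=4):
    edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]   # interior cuts
    return np.digitize(x, edges)

x = np.array([0.0, 0.1, 0.4, 0.55, 0.9, 1.0])
print(equal_width_discretise(x))   # bin index per value
```

PSO-DFS instead searches for cut points and the feature subset jointly, so a cut that only matters in combination with another feature can still be found.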
Self-Adaptive Particle Swarm Optimization for Large-Scale Feature Selection in Classification
ACM Transactions on Knowledge Discovery from Data, 2019
Many evolutionary computation (EC) methods have been used to solve feature selection problems, and they perform well on most small-scale feature selection problems. However, as the dimensionality of feature selection problems increases, the solution space increases exponentially. Meanwhile, there are more irrelevant features than relevant features in datasets, which leads to many local optima in the huge solution space. Therefore, existing EC methods still suffer from stagnation in local optima on large-scale feature selection problems. Furthermore, large-scale feature selection problems on different datasets may have different properties, so an existing EC method with only one candidate solution generation strategy (CSGS) may perform poorly across different large-scale feature selection problems. In addition, it is time-consuming to find a suitable EC method, and correspondingly suitable parameter values, for a given large-scale feature selection problem if we want to solve it effectively and efficiently. In this article, we propose a self-adaptive particle swarm optimization (SaPSO) algorithm for feature selection, particularly for large-scale feature selection. First, an encoding scheme for the feature selection problem is employed in SaPSO. Second, three important issues related to self-adaptive algorithms are investigated. After that, the SaPSO algorithm with a typical self-adaptive mechanism is proposed. The experimental results on 12 datasets show that the solution size obtained by the SaPSO algorithm is smaller than that of its EC counterparts on all datasets. The SaPSO algorithm performs better than its non-EC and EC counterparts in terms of classification accuracy, not only on most training sets but also on most test sets. Furthermore, as the dimensionality of the feature selection problem increases, the advantages of SaPSO become more prominent. This highlights that the SaPSO algorithm is suitable for solving feature selection problems, particularly large-scale feature selection problems.
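A common self-adaptive ingredient of the kind this abstract alludes to is to keep several candidate solution generation strategies and pick among them with probability proportional to each strategy's recent success rate. The bookkeeping below is a generic hypothetical sketch of that mechanism, not SaPSO's actual update rules.

```python
# Hedged sketch of strategy self-adaptation: maintain per-strategy success
# counts and sample the next strategy in proportion to its success rate.
# The simulated success probabilities are invented for the demonstration.
import numpy as np

rng = np.random.default_rng(4)
N_STRATEGIES = 3
success = np.ones(N_STRATEGIES)     # smoothed success counts (start uniform)
attempts = np.ones(N_STRATEGIES)

def pick_strategy():
    rates = success / attempts
    probs = rates / rates.sum()
    return rng.choice(N_STRATEGIES, p=probs)

def record(strategy, improved):
    attempts[strategy] += 1
    if improved:
        success[strategy] += 1

# Simulate: strategy 2 improves solutions 60% of the time, others only 10%.
true_rate = [0.1, 0.1, 0.6]
for _ in range(500):
    s = pick_strategy()
    record(s, rng.random() < true_rate[s])

print("empirical selection rates:", np.round(attempts / attempts.sum(), 2))
```

Over time the sampler concentrates on whichever strategy is actually helping on the current problem, which is the behaviour that lets one algorithm adapt across datasets with different properties.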
Feature Selection Using Combined Particle Swarm Optimization and Artificial Neural Network Approach
Journal of Mechatronics, Automation and Identification Technology, 2019
This paper deals with the identification of the input attributes that most influence the accuracy of a prediction model. It is assumed that the prediction model may be represented by any machine learning-based model, including artificial neural networks, fuzzy models, etc. Selection of the influencing attributes is based on particle swarm optimization (PSO) combined with neural networks. The role of the neural networks is to estimate the fitness of each particle during the search procedure implemented by the PSO algorithm. The presented feature selection method represents the first step in the prediction model design. The method is applied to a dataset characterized by weak correlation between the model's inputs and outputs. Selection of appropriate inputs improves the prediction model's accuracy.