Optimization of K-Mode Algorithm for Data Mining Using Particle Swarm Optimization
Related papers
Data clustering is an approach for automatically finding classes, concepts, or groups of patterns. It also aims to represent large datasets by a small number of prototypes or clusters. It simplifies data modelling and plays an important role in knowledge discovery and data mining. Data mining tasks require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This imposes computational requirements on clustering techniques. Swarm Intelligence (SI) has emerged as an approach that meets these requirements and has been applied successfully to a number of real-world clustering problems. This paper looks into the use of Particle Swarm Optimization (PSO) for cluster analysis. The use of Fuzzy C-means clustering provides enhanced performance, maintains more diversity in the swarm, and allows the particles to track a changing environment robustly. Identifying structure in large-scale data has become very important in data mining. Cluster analysis, which identifies groups of similar data items in large datasets, is one of its recent beneficiaries. The increasing complexity and volume of data have made data clustering a popular focus for optimization-based techniques. Different optimization techniques have been applied to find optimal solutions to clustering problems. This paper also proposes two new approaches that use PSO to cluster data. It is shown how PSO can be used to find the centroids of a user-specified number of clusters.
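The idea of using PSO to find the centroids of a user-specified number of clusters can be sketched as follows. This is a minimal illustration, not the method of the paper above: the function names, the sum-of-squared-error fitness, and the parameter values (w = 0.72, c1 = c2 = 1.49) are assumptions chosen for the example.

```python
import random

def sse(centroids, data):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in data:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total

def pso_cluster(data, k, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO where each particle encodes k candidate centroids."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every centroid of every particle at a randomly chosen data point.
    pos = [[list(rng.choice(data)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [sse(p, data) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    history = [gbest_fit]  # best fitness after each iteration
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia term + pull towards personal best + pull towards global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(pos[i], data)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history
```

Calling `pso_cluster(points, k=2)` on a list of numeric tuples returns the best centroids found, their fitness, and the fitness history, which never increases because the global best is only replaced by a better candidate.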
Improving the Cluster Performance by Combining Pso and K-Means Algorithm
ICTACT Journal on Soft Computing, 2011
Clustering is a technique that divides data objects into groups based on information found in the data that describes the objects and their relationships. This paper describes how to improve clustering performance by combining Particle Swarm Optimization (PSO) and the K-means algorithm. PSO converges successfully during the initial stages of a global search, but the search becomes very slow near the global optimum. In contrast, the K-means algorithm converges to an optimum faster. Unlike K-means, the new algorithm does not require the number of clusters to be specified before clustering, and it is able to find a locally optimal number of clusters during the clustering process. In each iteration, the inertia weight is changed based on the current iteration and the best fitness. Experimental results on different data sets show that the new algorithm performs better.
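The two ingredients mentioned above, an iteration-dependent inertia weight and a K-means stage that refines centroids quickly once the search is near an optimum, might look like the sketch below. The abstract does not give the paper's actual inertia formula (which also depends on the best fitness), so `linear_inertia` is an assumed, commonly used linear schedule, and `kmeans_refine` is a hypothetical helper name for plain Lloyd iterations.

```python
def linear_inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Inertia weight decreasing linearly from w_max to w_min (assumed schedule)."""
    return w_max - (w_max - w_min) * t / t_max

def kmeans_refine(data, centroids, iters=10):
    """Lloyd iterations starting from given centroids (for example, a PSO result)."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in data:
            nearest = min(
                range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        new = []
        for c, g in zip(centroids, groups):
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(g) for col in zip(*g)] if g else c)
        if new == centroids:
            break  # converged
        centroids = new
    return centroids
```

In a hybrid of this kind, PSO would typically supply the starting centroids and `kmeans_refine` would finish the local search.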
Improve Hybrid Particle Swarm Optimization and K-Means for Clustering
Journal of Information Technology and Computer Science
This research was conducted in Batu city and classifies land by its suitability for potato crops. Batu city is a hilly area with steep slopes, so there is a high potential for land degradation. Potato crop production is influenced by climate, suitability of the planting land, and treatment before harvest. Based on these problems, land mapping is needed to make it easier for farmers to determine the optimal planting location for potato crops. The land mapping process is carried out using clustering techniques. Clustering uses 11 land suitability criteria for potato crops: average temperature, first-month rainfall, second- and third-month rainfall, fourth-month rainfall, drainage, soil texture, soil depth, pH (H2O), C-organic, CEC, and slope. The clustering results are 4 land suitability classes: very suitable (S1), suitable (S2), quite suitable (S3), and not suitable (N). The clustering process is carried out using 5 different architectures, namely K-Means, Particle Swarm Optimization (PSO), K-Means PSO, PSO K-Means, and a Particle Swarm Optimization and K-Means hybrid (KCPSO). The fitness value is calculated using the silhouette coefficient. The architectures are tested to find the one with the highest fitness value. In this study, a new approach based on random injection was used to improve the accuracy of clustering results in the KCPSO architecture. Based on the test results, the KCPSO architecture obtained the highest fitness value compared with the other four clustering architectures. The clustering results are validated by comparing the results of the KCPSO method with expert calculations.
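Since the fitness above is the silhouette coefficient, a small pure-Python version may help make the measure concrete. It is a sketch under assumptions: Euclidean distance, the usual convention that a singleton cluster contributes 0, and a hypothetical function name; the study's own implementation is not described in the abstract.

```python
import math

def silhouette_score(data, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(data, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)
```

Values close to 1 indicate compact, well-separated clusters, so a search that maximizes this score favours labelings in which points sit much nearer to their own cluster than to any other.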
A Novel Hybrid Clustering Analysis Based on Combination of K-Means and PSO Algorithm
Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, 2022
Clustering is an unsupervised data mining technique that groups data based on the similarities and dissimilarities among them. Hidden patterns provide a basis for decision making, and these patterns can be identified using clustering techniques. This paper proposes a hybrid clustering analysis that combines the K-means and PSO algorithms. K-means is widely used for clustering because it is simple and easy to implement. However, it does not always give good results, so it can be fused with another clustering method, Particle Swarm Optimization (PSO). The approach is evaluated on real-world datasets. The proposed method is compared with traditional clustering techniques using cluster validity measures, including the sum of squared error (SSE), quantization error, and the silhouette index. In the comparison, the proposed system gives better results than K-means and PSO.
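For reference, two of the measures named above are commonly written as follows in the PSO clustering literature. The notation is assumed here rather than taken from the paper: K is the number of clusters, C_j is the set of points assigned to cluster j, and m_j is its centroid. The paper's own definitions may differ in detail.

```latex
\mathrm{SSE} = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - m_j \rVert^{2},
\qquad
J_e = \frac{1}{K} \sum_{j=1}^{K} \left( \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} \lVert x - m_j \rVert \right)
```

SSE sums squared distances over all points, so large clusters dominate it, whereas the quantization error J_e averages within each cluster first and therefore weights every cluster equally.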
Comparative Study of Particle Swarm Optimization based Unsupervised Clustering Techniques
2009
To overcome the shortcomings of traditional clustering algorithms, such as local optima and sensitivity to initialization, this paper uses a newer optimization technique, Particle Swarm Optimization (PSO), in association with unsupervised clustering techniques. The new algorithm uses the global search capacity of PSO to solve the problems associated with traditional clustering techniques. This combination avoids the local optima problem and increases the convergence speed. The parameters time, distance, and mean are used to compare PSO-based Fuzzy C-Means, PSO-based Gustafson-Kessel, PSO-based Fuzzy K-Means with extragrades, and PSO-based K-Means, and the results are plotted. In this way, the performance of PSO-based clustering techniques is evaluated. The results of the PSO-based clustering algorithm are used for remote image classification. Finally, the accuracy of the classified image is computed along with its Kappa coefficient.
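The accuracy and Kappa coefficient mentioned at the end are usually derived from a confusion matrix of reference classes against predicted classes. The sketch below assumes Cohen's kappa and a square matrix with reference classes in rows; the function names are illustrative and the numbers in the usage note are made up for the example, not results from the paper.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total

def cohen_kappa(confusion):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_observed = sum(confusion[i][i] for i in range(n)) / total
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement expected from the row and column marginals.
    p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_observed - p_expected) / (1 - p_expected)
```

For the illustrative matrix `[[20, 5], [10, 15]]`, the overall accuracy is 0.7 while kappa is about 0.4, because half of the agreement would be expected by chance.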
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Clustering in data mining is a discovery process that groups a set of data so as to maximize the intra-cluster similarity and minimize the inter-cluster similarity. The K-Means algorithm is best suited for clustering large data sets that contain only numeric values. The K-Modes algorithm extends K-Means to categorical domains. In some applications, however, data objects are described by both numeric and categorical features. The K-Prototype algorithm is one of the most important algorithms for clustering this type of data. It produces a locally optimal solution that depends on the initial prototypes and the order of objects in the data. Particle Swarm Optimization is a simple optimization technique that can be implemented effectively to enhance clustering results. Discrete or binary Particle Swarm Optimization mechanisms are useful for handling mixed data sets. This leads to better cost evaluation in the description space and subsequently enhanced processing of mixed data by Particle Swarm Optimization. This paper proposes a new variant that combines binary Particle Swarm Optimization with the K-Prototype algorithm to reach a globally optimal solution for the clustering optimization problem. The proposed algorithm is implemented and evaluated on a standard benchmark dataset taken from the UCI Machine Learning Repository. The comparative analysis shows that the Particle Swarm based K-Prototype algorithm performs better than the traditional K-Modes and K-Prototype algorithms.
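The difference between K-Means, K-Modes, and K-Prototype is easiest to see in the dissimilarity each one minimizes. The sketch below follows the commonly cited formulation: simple matching for categorical attributes, squared Euclidean distance for numeric ones, and a weight `gamma` that balances the two. The function names and the default `gamma` are assumptions for illustration, not details from the paper.

```python
def matching_dissimilarity(cat_a, cat_b):
    """K-Modes style distance: number of categorical attributes that differ."""
    return sum(1 for a, b in zip(cat_a, cat_b) if a != b)

def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """K-Prototype style distance for objects with numeric and categorical parts.

    gamma weights the categorical mismatches against the numeric distance.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(num_a, num_b))
    return numeric + gamma * matching_dissimilarity(cat_a, cat_b)
```

For example, `mixed_dissimilarity((1.0, 2.0), ("a", "b"), (1.0, 4.0), ("a", "c"), gamma=0.5)` combines a numeric part of 4.0 with one categorical mismatch weighted by 0.5, giving 4.5.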
The main purpose of data mining is to extract hidden predictive knowledge, in the form of useful information and patterns, from large databases for use in decision support. The medical field has a large number of heterogeneous databases, from which extracting hidden useful knowledge for data classification is difficult. To cluster and classify medical databases, a clustering algorithm called MPSO-AFKM (Modified Particle Swarm Optimization based Adaptive Fuzzy K-Modes) is introduced. The proposed method works in two phases, clustering and classification, for effective classification of a medical database. The first step is clustering, which uses the MPSO-AFKM algorithm to obtain clustered data. In MPSO-AFKM, the categorical data is clustered with the Adaptive Fuzzy K-Modes (AFKM) algorithm, and the cluster centroids in AFKM are optimized using the Modified Particle Swarm Optimization (MPSO) algorithm to obtain accurate clustering results. The clustered data are then classified with a fuzzy logic system to obtain the required information. The proposed work is implemented in MATLAB on the Postoperative Patient dataset. Performance is evaluated with the metrics precision, sensitivity, specificity, and accuracy, which show that the proposed work performs well for medical data clustering. A comparison with existing works is also made to demonstrate the performance of the proposed work.
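The four evaluation metrics listed above have standard definitions in terms of true and false positives and negatives. The helper below is a minimal sketch with an assumed function name; the counts in the usage note are invented for illustration and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, sensitivity, specificity, and accuracy from binary counts."""
    return {
        "precision": tp / (tp + fp),        # correct among predicted positives
        "sensitivity": tp / (tp + fn),      # recall: correct among actual positives
        "specificity": tn / (tn + fp),      # correct among actual negatives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

With the made-up counts `tp=40, fp=10, tn=45, fn=5`, precision is 0.8 and accuracy is 0.85, while sensitivity (40/45) and specificity (45/55) separate the error rates on the positive and negative classes.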
An Improved Particle Swarm Optimization for Data Clustering
2012
Clustering remains a popular analysis tool for data statistics. Identifying data structure in large-scale data has become a very important issue in data mining. In this paper, an improved particle swarm optimization based on the Gauss chaotic map is proposed for clustering. The Gauss chaotic map adopts a random sequence with a random starting point as a parameter and relies on this parameter to update the positions and velocities of the particles. It provides a chaotic distribution that balances the exploration and exploitation capabilities of the search process. This simple and fast function generates a random seed process and further improves the performance of PSO owing to its unpredictability. In the experiments, eight different clustering algorithms were extensively compared on six test data sets. The results indicate that the performance of our proposed method is significantly better than the performance of other algorithms for data clusteri...
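The abstract does not spell out the map, so the sketch below assumes the definition of the Gauss map often used in chaotic PSO variants: x maps to 0 when x is 0, and otherwise to the fractional part of 1/x. How the resulting sequence enters the position and velocity updates is not stated in the abstract and is not shown here; both function names are illustrative.

```python
import math

def gauss_map(x):
    """One step of the Gauss map on [0, 1): fractional part of 1/x, with 0 -> 0."""
    if x == 0:
        return 0.0
    inv = 1.0 / x
    return inv - math.floor(inv)

def chaotic_sequence(x0, n):
    """Generate n successive Gauss-map values starting from x0 in (0, 1)."""
    seq, x = [], x0
    for _ in range(n):
        x = gauss_map(x)
        seq.append(x)
    return seq
```

A sequence such as `chaotic_sequence(x0, n)` could stand in for the uniform random numbers in a PSO velocity update. Starting points that are simple rationals reach 0 after a few steps, so a randomly drawn starting point is the safer choice.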
Scope of Research on Particle Swarm Optimization Based Data Clustering
ArXiv, 2019
Optimization is a mathematical technique that finds the maxima or minima of a function of interest in some realistic region. Different optimization techniques have been proposed, which compete for the best solution. Particle Swarm Optimization (PSO) is a new, advanced, and powerful optimization methodology that performs well empirically on several optimization problems. It is an extensively used Swarm Intelligence (SI) inspired optimization algorithm for finding the global optimal solution in a multifaceted search region. Data clustering is one of the challenging real-world applications that attracts prominent research work in a variety of fields. The applicability of different PSO variants to data clustering has been studied in the literature, and the analyzed research work shows that PSO variants give poor results for multidimensional data. This paper describes the different challenges associated with multidimensional data clustering and scope of research on optimizing t...