New Search Strategies and New Derived Inequality for Efficient K-Medoids-Based Algorithms
Related papers
Efficient search approaches for K-medoids-based algorithms
… '02. Proceedings. 2002 …, 2002
In this paper, the concept of a previous medoid index is introduced. The utilization of memory for efficient medoid search is also presented. We propose a hybrid search approach for the problem of nearest neighbor search. The hybrid search approach combines the previous medoid index, the utilization of memory, the criterion of triangular inequality elimination, and partial distance search. The proposed hybrid search approach is applied to k-medoids-based algorithms. Experimental results based on a Gauss-Markov source, a curve data set, and elliptic clusters demonstrate that the proposed algorithm applied to the CLARANS algorithm may reduce the number of distance calculations by 88.4% to 95.2% with the same average distance per object compared with CLARANS. The proposed hybrid search approach can also be applied to nearest neighbor searching and other clustering algorithms.
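To make the elimination criteria above concrete, the following Python sketch assigns a point to its nearest medoid using triangle-inequality rejection and partial distance search. It is a minimal illustration assuming Euclidean distance, not the authors' exact procedure; the function name assign_with_elimination and its arguments are hypothetical.

    import numpy as np

    def assign_with_elimination(points, medoids):
        """Assign each point to its nearest medoid using two classic
        acceleration criteria: triangle-inequality rejection and partial
        distance search.  Illustrative sketch, not the authors' procedure."""
        k = len(medoids)
        # Pre-compute pairwise medoid distances for the triangle-inequality test.
        med_dists = np.linalg.norm(medoids[:, None, :] - medoids[None, :, :], axis=2)
        labels = np.empty(len(points), dtype=int)
        for i, x in enumerate(points):
            best = 0
            best_d = np.linalg.norm(x - medoids[0])
            for j in range(1, k):
                # Triangle inequality: if d(m_best, m_j) >= 2*d(x, m_best),
                # medoid j cannot be closer than the current best.
                if med_dists[best, j] >= 2.0 * best_d:
                    continue
                # Partial distance search: accumulate squared coordinate
                # differences and stop as soon as the running sum exceeds
                # the best squared distance found so far.
                limit = best_d * best_d
                acc = 0.0
                for a, b in zip(x, medoids[j]):
                    acc += (a - b) ** 2
                    if acc >= limit:
                        break
                else:
                    best, best_d = j, acc ** 0.5
            labels[i] = best
        return labels

Both tests reject a candidate medoid without computing its full distance, which is where the reported savings in distance calculations come from.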
Improved search strategies and extensions to k-medoids-based clustering algorithms
International Journal of Business Intelligence and Data Mining, 2008
In this paper, two categories of improvements are suggested that can be applied to most k-medoids-based algorithms: conceptual/algorithmic improvements and implementational improvements. These include revisiting the accepted cases for swap comparison and applying partial distance searching and previous medoid indexing to clustering.
Clustering plays a vital role in research in the field of data mining. Clustering is a process of partitioning a set of data into meaningful subclasses called clusters. It helps users understand the natural grouping of a data set. It is an unsupervised classification method, which means it has no predefined classes. Applications of cluster analysis include economic science, document classification, pattern recognition, image processing, and text mining. Hence, in this study some algorithms are presented which can be used according to one's requirements. K-means is the most popular algorithm used for the purpose of data segmentation. However, k-means is not very effective in many cases, and it is not even applicable for data segmentation with some specific kinds of distance matrices such as absolute Pearson. K-medoids, in contrast, is considered more flexible than k-means and is compatible with almost every type of data matrix. The medoid computed by the k-medoids algorithm is roughly comparable to the median, and the literature on the median reports a number of advantages of the median over the arithmetic mean. In this paper, we use a modified version of the k-medoids algorithm for large data sets. The proposed k-medoids algorithm has been modified to perform faster than k-means, because speed is the major cause of the unpopularity of k-medoids compared with k-means. Our experimental results show that the improved k-medoids performs better than k-means and standard k-medoids in terms of cluster quality and elapsed time.
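For reference, a plain alternating (Voronoi-iteration) k-medoids baseline of the kind such comparisons start from can be sketched as follows. This is a generic baseline in Python, not the modified algorithm described in the abstract; the function name k_medoids and its parameters are hypothetical.

    import numpy as np

    def k_medoids(X, k, n_iter=100, seed=0):
        """Alternate between assigning points to their nearest medoid and
        replacing each medoid with the cluster member that minimises the
        total distance to the rest of its cluster.  Generic baseline sketch."""
        rng = np.random.default_rng(seed)
        medoid_idx = rng.choice(len(X), size=k, replace=False)
        for _ in range(n_iter):
            # Assignment step: label each point with its nearest medoid.
            d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Update step: pick the member with minimum summed intra-cluster distance.
            new_idx = medoid_idx.copy()
            for c in range(k):
                members = np.flatnonzero(labels == c)
                if members.size == 0:
                    continue
                intra = np.linalg.norm(X[members][:, None, :] - X[members][None, :, :], axis=2)
                new_idx[c] = members[intra.sum(axis=1).argmin()]
            if np.array_equal(new_idx, medoid_idx):
                break
            medoid_idx = new_idx
        return medoid_idx, labels

Because the medoid must be an actual data point, the update step scales with the square of the cluster size, which is the cost the modified algorithms above try to reduce.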
New Approach for K-mean and K-medoids Algorithm
International Journal of Computer Applications Technology and Research, 2012
K-means and k-medoids clustering algorithms are widely used for many practical applications. The original k-means and k-medoids algorithms select initial centroids and medoids randomly, which affects the quality of the resulting clusters and sometimes generates unstable and empty clusters that are meaningless. The original k-means and k-medoids algorithms are also computationally expensive, requiring time proportional to the product of the number of data items, the number of clusters, and the number of iterations. The new approach for the k-means algorithm eliminates this deficiency of existing k-means: it first calculates the initial k centroids as per the user's requirements and then produces better, more effective, and more stable clusters. It also takes less execution time because it eliminates unnecessary distance computations by reusing the previous iteration. The new approach for k-medoids selects the initial k medoids systematically based on the initial centroids. It generates stable clusters to improve accuracy.
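One plausible reading of "selecting initial medoids based on initial centroids" is to snap each centroid to the nearest actual data point. The sketch below illustrates that idea only; the helper medoids_from_centroids is a hypothetical name, not the paper's exact seeding rule.

    import numpy as np

    def medoids_from_centroids(X, centroids):
        """Return, for each initial centroid, the index of the closest data
        point, to be used as an initial medoid.  Illustrative only; the
        paper's exact seeding rule may differ.  Note that two centroids can
        map to the same point, in which case duplicates must be resolved."""
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
        return d.argmin(axis=0)  # index of the nearest data point per centroid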
International Journal of Computer Applications
Clustering techniques are applied tools to analyze stored data in various fields. Clustering is a process of partitioning meaningful data into useful clusters which can be understood easily and have analytical value. The K-Means and K-Medoid algorithms in their existing forms carry certain weaknesses. For example, in the case of the K-Means algorithm, "deformation" and "deviations" may arise due to misbehavior and disruption in the computing process. Similarly, in the case of the K-Medoid algorithm, many iterations are required, which consumes a huge amount of time and thereby reduces the efficiency of clustering. In the present paper, we propose a new modified K-Medoid algorithm for improving efficiency and scalability in the study of large datasets. The extended K-Medoid algorithm stands better in terms of execution time, quality of clusters, number of clusters, and number of records than the comparative results of the K-Means and K-Medoid algorithms. The extended K-Medoid algorithm is evaluated using sample real employee datasets, and the results are compared with K-Means and K-Medoids.
An Efficient Density based Improved K-Medoids Clustering algorithm
database, 2011
Clustering is the process of classifying objects into different groups by partitioning sets of data into a series of subsets called clusters. Clustering has taken its roots from algorithms like k-means and k-medoids. However, the conventional k-medoids clustering algorithm suffers from several limitations. Firstly, it needs prior knowledge of the number-of-clusters parameter k. Secondly, it initially needs to make a random selection of k representative objects, and if these initial k medoids are not selected properly then natural clusters may not be obtained. Thirdly, it is also sensitive to the order of the input dataset. Mining knowledge from large amounts of spatial data is known as spatial data mining. It has become a highly demanding field because huge amounts of spatial data have been collected in applications ranging from geo-spatial data to bio-medical knowledge. A database can be clustered in many ways depending on the clustering algorithm employed, the parameter settings used, and other factors. Multiple clusterings can be combined so that the final partitioning of the data provides better clustering. In this paper, an efficient density-based k-medoids clustering algorithm is proposed to overcome the drawbacks of the DBSCAN and k-medoids clustering algorithms. The result is an improved version of the k-medoids clustering algorithm. This algorithm performs better than DBSCAN when handling clusters of circularly distributed data points and slightly overlapping clusters.
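A hedged illustration of density-based seeding for k-medoids is given below: rank points by how many neighbours fall within a radius eps (as in DBSCAN) and greedily keep dense, well-separated points as initial medoids. The function density_based_seeds and the eps parameter are assumptions for illustration, not the paper's algorithm.

    import numpy as np

    def density_based_seeds(X, k, eps):
        """Choose up to k initial medoids from dense, well-separated regions:
        rank points by the number of neighbours within radius eps and greedily
        keep high-density points at least eps away from seeds already chosen.
        Illustrative heuristic; may return fewer than k seeds if eps is large."""
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        density = (d < eps).sum(axis=1)          # neighbour counts (including self)
        order = np.argsort(-density)             # densest points first
        seeds = []
        for i in order:
            if all(d[i, j] >= eps for j in seeds):  # keep seeds spread apart
                seeds.append(i)
            if len(seeds) == k:
                break
        return np.array(seeds)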
An improved K-medoids clustering approach based on the crow search algorithm
Journal of Computational Mathematics and Data Science
The k-medoids clustering algorithm is a simple yet effective algorithm that has been applied to solve many clustering problems. Instead of using the mean point as the centre of a cluster, k-medoids uses an actual point to represent it. The medoid is the most centrally located object of the cluster, with a minimum sum of distances to the other points. K-medoids can correctly represent the cluster centre because it is robust to outliers. However, the k-medoids algorithm is unsuitable for clustering arbitrarily shaped groups of objects and large-scale datasets, because it uses compactness as a clustering criterion instead of connectivity. An improved k-medoids algorithm based on the crow search algorithm is proposed to overcome these problems. This research uses the crow search algorithm to improve the balance between the exploration and exploitation processes of the k-medoids algorithm. Experimental comparison shows that the proposed improved algorithm performs better than the other competitors.
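For context, the standard crow search position update that such hybrids build on can be sketched as follows, in its usual continuous form. The mapping from crow positions to medoid choices is the paper's own contribution and is not reproduced here; the function crow_search_step and the parameters fl (flight length) and ap (awareness probability) are illustrative.

    import numpy as np

    def crow_search_step(positions, memory, fl=2.0, ap=0.1, bounds=(0.0, 1.0), rng=None):
        """One iteration of the standard crow search update: each crow follows
        a randomly chosen crow's memorised best position with flight length fl,
        or jumps to a random position with awareness probability ap.  Shown
        only to illustrate the exploration/exploitation balance."""
        rng = rng or np.random.default_rng()
        n, dim = positions.shape
        new_pos = positions.copy()
        for i in range(n):
            j = rng.integers(n)                   # crow i follows crow j
            if rng.random() >= ap:                # crow j is unaware: exploit its memory
                new_pos[i] = positions[i] + rng.random() * fl * (memory[j] - positions[i])
            else:                                 # crow j is aware: explore randomly
                new_pos[i] = rng.uniform(bounds[0], bounds[1], size=dim)
        return np.clip(new_pos, bounds[0], bounds[1])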
Approximate Shortest Distance Computing Using k-Medoids Clustering
Annals of Data Science, 2017
Shortest distance queries are widely used in large-scale networks. Numerous approaches are present in the literature to approximate the distance between two query nodes. The most popular distance approximation approach is the landmark embedding scheme. In this technique, the selection of optimal landmarks is an NP-hard problem. Various heuristics available to locate optimal landmarks include random selection, degree, closeness centrality, betweenness, and eccentricity. In this paper, we propose to employ a k-medoids clustering based approach to improve distance estimation accuracy over local landmark embedding techniques. In particular, it is observed that global selection of the seed landmarks causes large relative error, which is further reduced using local landmark embedding. The efficacy of the proposed approach is analyzed with respect to conventional graph embedding techniques on six large-scale networks. The results show that the proposed landmark selection scheme reduces the shortest distance estimation error considerably. The proposed technique is able to reduce the approximation error of the shortest distance by up to 29% with respect to the other graph embedding techniques.
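The landmark embedding idea referred to above estimates a shortest distance through a landmark as an upper bound, d(u, v) ≈ min over landmarks l of d(u, l) + d(l, v). A minimal sketch on an unweighted graph follows; the adjacency-dict interface and function names are assumptions, and landmark selection (random, degree-based, or k-medoids-based as in the paper) is left to the caller.

    from collections import deque

    def bfs_distances(adj, source):
        """Unweighted single-source shortest path lengths via BFS.
        adj maps each node to an iterable of its neighbours."""
        dist = {source: 0}
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

    def landmark_estimate(adj, landmarks, u, v):
        """Upper-bound estimate of d(u, v) via the triangle inequality:
        min over landmarks l of d(u, l) + d(l, v).  In practice the
        per-landmark distance tables would be precomputed offline."""
        tables = {l: bfs_distances(adj, l) for l in landmarks}
        return min(tables[l].get(u, float("inf")) + tables[l].get(v, float("inf"))
                   for l in landmarks)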
Comparative Analysis between K-Means and K-Medoids for Statistical Clustering
2015 3rd International Conference on Artificial Intelligence, Modelling and Simulation (AIMS), 2015
Clustering dynamic data is a challenge in identifying and forming groups. This kind of unsupervised learning usually leads to undirected knowledge discovery. A cluster detection algorithm searches for clusters of data which are similar to one another by using similarity measures. Determining a suitable algorithm that yields well-optimized clusters can be an issue. Depending on the parameters and attributes of the data, the results yielded by K-Means and K-Medoids can vary. This paper presents a comparative analysis of both algorithms on different data clusters to lay out the strengths and weaknesses of each. Thorough studies were conducted to determine the correlation of the data with the algorithms and to find the relationship among them.
Efficiency of k-Means and K-Medoids Algorithms for Clustering Arbitrary Data Points
A number of techniques have been proposed by several researchers to analyze the performance of clustering algorithms in data mining. Not all of these techniques give good results for the chosen data sets or for particular algorithms, and some clustering algorithms are suited only to certain kinds of input data. This research work uses arbitrarily distributed input data points to evaluate the clustering quality and performance of two partition-based clustering algorithms, namely k-Means and k-Medoids. To evaluate the clustering quality, the distance between data points is taken for analysis. The computational time is measured for each algorithm in order to assess its performance. The experimental results show that the k-Means algorithm yields better results than the k-Medoids algorithm.
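The two evaluation criteria used here, elapsed time and point-to-representative distance, can be measured with a small harness like the one below; the fit_predict callable interface, returning labels and cluster representatives, is a hypothetical convention for illustration.

    import time
    import numpy as np

    def evaluate_clustering(fit_predict, X):
        """Measure elapsed wall-clock time and the average distance from each
        point to its cluster representative.  fit_predict is any callable
        returning (labels, centers); this interface is assumed for the sketch."""
        start = time.perf_counter()
        labels, centers = fit_predict(X)
        elapsed = time.perf_counter() - start
        avg_dist = np.mean(np.linalg.norm(X - centers[labels], axis=1))
        return elapsed, avg_dist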