An efficient hybrid clustering algorithm for segmentation: Autocluster

2017, International Journal of Data Science

Abstract

This paper proposes a new automatic clustering algorithm, named Autocluster, which does not need clustering information such as the number of clusters or the density radius, and which generates robust results. Autocluster is well suited to customer segmentation, where such clustering information is typically not available in advance. Autocluster applies concepts from partitioning, hierarchical and density-based clustering algorithms; the result is a new, automatic and high-precision algorithm. Autocluster consists of four steps: developing the 'distance matrix', identifying the 'best point' (data record), developing the 'point matrix' and 'clustering'. These steps are explained comprehensively in this paper. Furthermore, the Iris dataset and a synthetic dataset have been analysed by Autocluster to verify its capabilities against the K-means algorithm. Finally, an Iranian insurance dataset has been clustered by Autocluster, with satisfactory results compared with those from K-means.
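As a rough illustration of the first two steps named above, the sketch below builds a pairwise distance matrix and then picks one record from it. The abstract does not say which distance measure is used or how the 'best point' is chosen, so both choices here are assumptions made only for illustration: Euclidean distance, and the record with the smallest total distance to all others. The function names `distance_matrix` and `most_central_point` are hypothetical and do not come from the paper.

```python
import math


def distance_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances.

    ASSUMPTION: Euclidean distance; the abstract does not name a metric.
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d  # fill both halves at once
    return dist


def most_central_point(dist):
    """Return the index of the row with the smallest sum of distances.

    ASSUMPTION: this criterion is a stand-in; the paper's actual
    'best point' rule is not given in the abstract.
    """
    totals = [sum(row) for row in dist]
    return min(range(len(totals)), key=totals.__getitem__)


# Four toy records in two dimensions (illustrative data only).
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (1.0, 0.0)]
dist = distance_matrix(points)
best = most_central_point(dist)
```

Building the full matrix costs quadratic time and memory in the number of records, which is worth keeping in mind before applying this kind of step to a large customer dataset.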

Reference to this paper should be made as follows: Khakbaz, S.B., Pourestarabadi, M. and Hajiheydari, N. (2017) 'An efficient hybrid clustering algorithm for segmentation: Autocluster', Int. J. Data Science, Vol. 2, No. 3, pp.205-220.