AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets

Abstract

The clustering algorithm GDILC performs grid-based, density-based clustering and is designed to discover clusters of arbitrary shape and to eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve this algorithm in five important directions. With these improvements, the resulting algorithm, AGRID, is highly scalable and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of input data and is nonparametric. Our experiments demonstrate the high speed and accuracy of AGRID.
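The abstract describes AGRID only at a high level. To illustrate the general grid-and-density idea it builds on, the sketch below makes several assumptions of our own. It bins points into equal-width grid cells and treats any cell holding at least `density_threshold` points as dense. It then joins face-adjacent dense cells into clusters and labels every other point as noise. This is a generic grid-based density clustering, not the AGRID or GDILC procedure from the paper. Unlike AGRID, which the abstract describes as nonparametric, it needs two user-supplied parameters. The function name `grid_density_cluster` and the parameters `intervals_per_dim` and `density_threshold` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_density_cluster(
    points: Sequence[Sequence[float]],
    intervals_per_dim: int,
    density_threshold: int,
) -> List[int]:
    """Return a cluster id per point, or -1 for noise.

    Hypothetical illustration of generic grid + density clustering;
    this is NOT the AGRID or GDILC algorithm.
    """
    if not points:
        return []
    dims = len(points[0])
    # Per-dimension bounds used to map coordinates onto equal-width intervals.
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p: Sequence[float]) -> Cell:
        idx = []
        for d in range(dims):
            span = highs[d] - lows[d]
            if span == 0:
                idx.append(0)
                continue
            i = int((p[d] - lows[d]) / span * intervals_per_dim)
            # The maximum value lands one past the last interval; clamp it.
            idx.append(min(i, intervals_per_dim - 1))
        return tuple(idx)

    # Only occupied cells are stored, so memory grows with the number of
    # points rather than with intervals_per_dim ** dims.
    cells = [cell_of(p) for p in points]
    counts: Dict[Cell, int] = defaultdict(int)
    for c in cells:
        counts[c] += 1
    dense = {c for c, n in counts.items() if n >= density_threshold}

    # Breadth-first search over dense cells; two cells are neighbours when
    # they differ by exactly 1 in exactly one dimension (2 * dims lookups).
    cluster_of: Dict[Cell, int] = {}
    next_id = 0
    for start in sorted(dense):  # sorted, so ids do not depend on input order
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for d in range(dims):
                for step in (-1, 1):
                    nb = cur[:d] + (cur[d] + step,) + cur[d + 1:]
                    if nb in dense and nb not in cluster_of:
                        cluster_of[nb] = next_id
                        queue.append(nb)
        next_id += 1

    # Points in sparse cells are treated as noise.
    return [cluster_of.get(c, -1) for c in cells]


# Usage: two tight groups and one isolated point in 2-D.
pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.1, 0.2),
       (5.0, 5.0), (5.1, 5.1), (5.2, 5.0), (5.0, 5.2),
       (10.0, 0.0)]
print(grid_density_cluster(pts, intervals_per_dim=10, density_threshold=3))
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The sketch keeps only occupied cells because the total number of cells grows exponentially with dimensionality. It also visits neighbours only along each axis, which keeps the per-cell cost linear in the number of dimensions. Assigning cluster ids while walking dense cells in sorted order is one simple way to make the result independent of input order. It is offered as an illustration of that property, not as the mechanism AGRID uses.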

Participation in the conference was supported by NOKIA.

References

  1. M. Ankerst, M. Breunig, et al., “OPTICS: Ordering points to identify the clustering structure”, In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’99), pp. 49–60, Philadelphia, PA, June 1999.
  2. R. Agrawal, J. Gehrke, et al., “Automatic subspace clustering of high dimensional data for data mining applications”, In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pp. 94–105, Seattle, WA, June 1998.
  3. K. Alsabti, S. Ranka, and V. Singh, “An Efficient K-Means Clustering Algorithm”, In Proc. First Workshop on High Performance Data Mining, Orlando, Florida, 1998.
  4. P.S. Bradley and O.L. Mangasarian, “K-Plane Clustering”, Journal of Global Optimization, Vol. 16, No. 1, pp. 23–32, 2000.
  5. M. Ester, H.-P. Kriegel, et al., “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”, In Proc. 1996 Int. Conf. on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, Portland, Oregon, Aug. 1996.
  6. S. Guha, R. Rastogi, and K. Shim, “Cure: An efficient clustering algorithm for large databases”, In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pp. 73–84, Seattle, WA, June 1998.
  7. S. Guha, R. Rastogi, and K. Shim, “Rock: A robust clustering algorithm for categorical attributes”, In Proc. 1999 Int. Conf. Data Engineering (ICDE’99), pp. 512–521, Sydney, Australia, Mar. 1999.
  8. A. Hinneburg and D.A. Keim, “An efficient approach to clustering in large multimedia databases with noise”, In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD’98), pp. 58–65, New York, Aug. 1998.
  9. J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, Higher Education Press, Morgan Kaufmann Publishers, 2001.
  10. Z. Huang, “Extensions to the k-means algorithm for clustering large data sets with categorical values”, Data Mining and Knowledge Discovery, Vol. 2, pp. 283–304, 1998.
  11. Z. Huang, “A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining”, In SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (SIGMOD-DMKD’97), Tucson, Arizona, May 1997.
  12. G. Karypis, E.-H. Han, and V. Kumar, “CHAMELEON: A hierarchical clustering algorithm using dynamic modeling”, IEEE Computer, Special Issue on Data Analysis and Mining, Vol. 32, No. 8, pp. 68–75, Aug. 1999.
  13. R. Ng and J. Han, “Efficient and effective clustering method for spatial data mining”, In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB’94), pp. 144–155, Santiago, Chile, Sept. 1994.
  14. G. Sheikholeslami, S. Chatterjee, and A. Zhang, “WaveCluster: A multi-resolution clustering approach for very large spatial databases”, In Proc. 1998 Int. Conf. Very Large Data Bases (VLDB’98), pp. 428–429, New York, Aug. 1998.
  15. J. Sander, M. Ester, H.-P. Kriegel, and X. Xu, “Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications”, Data Mining and Knowledge Discovery, Vol. 2, No. 2, June 1998.
  16. W. Wang, J. Yang, and R. Muntz, “STING: A statistical information grid approach to spatial data mining”, In Proc. 1997 Int. Conf. Very Large Data Bases (VLDB’97), pp. 186–195, Athens, Greece, Aug. 1997.
  17. T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases”, In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’96), pp. 103–114, Montreal, Canada, June 1996.
  18. Zhao Yanchang and Song Junde, “GDILC: A Grid-based Density Iso-line Clustering Algorithm”, In Proc. Int. Conf. Info-tech and Info-net (ICII 2001), Beijing, China, Oct. 2001.

Author information

Authors and Affiliations

  1. Electronic Engineering School, Beijing University of Posts and Telecommunications, Beijing, 100876, China
    Zhao Yanchang & Song Junde

Editor information

Editors and Affiliations

  1. Computer Science Department, Korea Advanced Institute of Science and Technology, 373-1 Koo-Sung Dong, Yoo-Sung Ku, Daejeon, 305-701, Korea
    Kyu-Young Whang
  2. Department of Statistics, Seoul National University, Sillimdong Kwanakgu, Seoul, 151-742, Korea
    Jongwoo Jeon
  3. School of Electrical Engineering and Computer Science, Seoul National University, Kwanak P.O. Box 34, Seoul, 151-742, Korea
    Kyuseok Shim
  4. Department of Computer Science and Engineering, University of Minnesota, 200 Union St SE, Minneapolis, MN, 55455, USA
    Jaideep Srivastava

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yanchang, Z., Junde, S. (2003). AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_27
