AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets
Abstract
The clustering algorithm GDILC combines density-based clustering with a grid and is designed to discover clusters of arbitrary shape and eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve the algorithm in five important directions. With these improvements, the resulting algorithm, AGRID, is highly scalable and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of input and is a nonparametric algorithm. Our experiments show the high speed and accuracy of the AGRID clustering algorithm.
Participation in the conference was supported by NOKIA.
References
- M. Ankerst, M. Breunig, et al, “OPTICS: Ordering points to identify the clustering structure”, In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’99), pp. 49–60, Philadelphia, PA, June 1999.
- R. Agrawal, J. Gehrke, et al, “Automatic subspace clustering of high dimensional data for data mining applications”, In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pp. 94–105, Seattle, WA, June 1998.
- K. Alsabti, S. Ranka, V. Singh, “An Efficient K-Means Clustering Algorithm,” Proc. the First Workshop on High Performance Data Mining, Orlando, Florida, 1998.
- P.S. Bradley, O.L. Mangasarian, “K-Plane Clustering,” Journal of Global Optimization 16, Number 1, 2000, pp. 23–32.
- M. Ester, H.-P. Kriegel et al, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” In Proc. 1996 Int. Conf. On Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, Portland, Oregon, Aug. 1996.
- S. Guha, R. Rastogi, and K. Shim, “Cure: An efficient clustering algorithm for large databases”, In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pp. 73–84, Seattle, WA, June 1998.
- S. Guha, R. Rastogi, and K. Shim, “Rock: A robust clustering algorithm for categorical attributes”, In Proc. 1999 Int. Conf. Data Engineering (ICDE’99), pp. 512–521, Sydney, Australia, Mar. 1999.
- A. Hinneburg and D.A. Keim, “An efficient approach to clustering in large multimedia databases with noise”, In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD’98), pp. 58–65, New York, Aug. 1998.
- Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques,” Higher Education Press, Morgan Kaufmann Publishers, 2001.
- Z. Huang, “Extensions to the k-means algorithm for clustering large data sets with categorical values”, Data Mining and Knowledge Discovery, 2:283–304, 1998.
- Zhexue Huang, “A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining,” In SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (SIGMOD-DMKD’97), Tucson, Arizona, May 1997.
- G. Karypis, E.-H. Han, and V. Kumar, “CHAMELEON: A hierarchical clustering algorithm using dynamic modeling”, IEEE Computer, Special Issue on Data Analysis and Mining, Vol. 32, No. 8, August 1999, pp. 68–75.
- R. Ng and J. Han, “Efficient and effective clustering method for spatial data mining”, In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB’94), pp. 144–155, Santiago, Chile, Sept. 1994.
- G. Sheikholeslami, S. Chatterjee, and A. Zhang, “WaveCluster: A multi-resolution clustering approach for very large spatial databases”, In Proc. 1998 Int. Conf. Very Large Data Bases (VLDB’98), pp. 428–429, New York, Aug. 1998.
- Jörg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu, “Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications”, Data Mining and Knowledge Discovery, Vol. 2, No 2, June 1998.
- W. Wang, J. Yang, and R. Muntz, “STING: A statistical information grid approach to spatial data mining”, In Proc. 1997 Int. Conf. Very Large Data Bases (VLDB’97), pp. 186–195, Athens, Greece, Aug. 1997.
- T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases”, In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’96), pp. 103–114, Montreal, Canada, June 1996.
- Zhao Yanchang, Song Junde, “GDILC: A Grid-based Density Iso-line Clustering Algorithm,” Proc. Int. Conf. Info-tech and Info-net (ICII 2001), Beijing, China, Oct. 2001.
Author information
Authors and Affiliations
- Electronic Engineering School, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Zhao Yanchang & Song Junde
Authors
- Zhao Yanchang
- Song Junde
Editor information
Editors and Affiliations
- Computer Science Department, Korea Advanced Institute of Science and Technology, 373-1 Koo-Sung Dong, Yoo-Sung Ku, Daejeon, 305-701, Korea
Kyu-Young Whang
- Department of Statistics, Seoul National University, Sillimdong Kwanakgu, Seoul, 151-742, Korea
Jongwoo Jeon
- School of Electrical Engineering and Computer Science, Seoul National University, Kwanak P.O. Box 34, Seoul, 151-742, Korea
Kyuseok Shim
- Department of Computer Science and Engineering, University of Minnesota, 200 Union St SE, Minneapolis, MN, 55455, USA
Jaideep Srivastava
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yanchang, Z., Junde, S. (2003). AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_27
- DOI: https://doi.org/10.1007/3-540-36175-8_27
- Published: 30 April 2003
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-540-04760-5
- Online ISBN: 978-3-540-36175-6
- eBook Packages: Springer Book Archive