AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets

Abstract

The clustering algorithm GDILC combines density-based clustering with a grid and is designed to discover clusters of arbitrary shape and to eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve the algorithm in five important directions. With these improvements, the resulting algorithm, AGRID, is highly scalable and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of input and is nonparametric. Our experiments demonstrate the high speed and accuracy of AGRID.
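
To make the general idea of grid-assisted density clustering concrete, the following is a minimal Python sketch. It is a generic illustration and not the GDILC or AGRID procedure: the function name `grid_density_clusters` and the parameters `cell_width`, `radius`, and `min_density` are assumptions introduced here, and the abstract does not describe the five improvements, so none of them is modeled.

```python
from collections import defaultdict
from itertools import product
import math


def grid_density_clusters(points, cell_width, radius, min_density):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    Hypothetical helper for illustration; all names are assumptions.
    """
    # With radius <= cell_width, every neighbour lies in an adjacent cell.
    assert radius <= cell_width
    dim = len(points[0])

    def cell_of(p):
        return tuple(math.floor(x / cell_width) for x in p)

    # 1. Hash every point into its grid cell.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[cell_of(p)].append(i)

    # All 3**dim adjacent-cell offsets (including the cell itself).
    offsets = list(product((-1, 0, 1), repeat=dim))

    def neighbours(i):
        key = cell_of(points[i])
        for off in offsets:
            adjacent = tuple(k + o for k, o in zip(key, off))
            for j in cells.get(adjacent, ()):
                if j != i and math.dist(points[i], points[j]) <= radius:
                    yield j

    # 2. Density of a point = itself plus neighbours within `radius`.
    near = [list(neighbours(i)) for i in range(len(points))]
    dense = [len(ns) + 1 >= min_density for ns in near]

    # 3. Merge dense points that are neighbours (union-find).
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, ns in enumerate(near):
        if dense[i]:
            for j in ns:
                if dense[j]:
                    parent[find(i)] = find(j)

    # 4. Number the clusters; non-dense points are reported as noise.
    ids = {}
    labels = []
    for i in range(len(points)):
        if dense[i]:
            labels.append(ids.setdefault(find(i), len(ids)))
        else:
            labels.append(-1)
    return labels


sample = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = grid_density_clusters(sample, cell_width=1.0, radius=0.5, min_density=3)
print(labels)
```

The grid restricts distance computations to points in adjacent cells, which avoids comparing every pair of points. Note that this naive version enumerates 3^d adjacent cells for d dimensions, so its cost grows exponentially with dimensionality; this illustrates why a straightforward grid approach struggles on high-dimensional data.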

Participation in the conference was supported by NOKIA.

Author information

Authors and Affiliations

Electronic Engineering School, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Zhao Yanchang & Song Junde

Editor information

Editors and Affiliations

Computer Science Department, Korea Advanced Institute of Science and Technology, 373-1 Koo-Sung Dong, Yoo-Sung Ku, Daejeon, 305-701, Korea
Kyu-Young Whang
Department of Statistics, Seoul National University, Sillimdong Kwanakgu, Seoul, 151-742, Korea
Jongwoo Jeon
School of Electrical Engineering and Computer Science, Seoul National University, Kwanak P.O. Box 34, Seoul, 151-742, Korea
Kyuseok Shim
Department of Computer Science and Engineering, University of Minnesota, 200 Union St SE, Minneapolis, MN, 55455, USA
Jaideep Srivastava

Cite this paper

Yanchang, Z., Junde, S. (2003). AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_27

DOI: https://doi.org/10.1007/3-540-36175-8_27
Published: 30 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04760-5
Online ISBN: 978-3-540-36175-6
eBook Packages: Springer Book Archive