Algorithms for Mining Distance-Based Outliers in Large Datasets

1998, Proc. of the 24th Int

Abstract

This paper deals with finding outliers (exceptions) in large, multidimensional datasets. ...

