An EM-Approach for Clustering Multi-Instance Objects

Abstract

In many data mining applications, data objects are modeled as sets of feature vectors, also called multi-instance objects. In this paper, we present an expectation maximization (EM) approach for clustering multi-instance objects. To this end, we introduce a statistical process that models multi-instance objects. Furthermore, we present E-steps and M-steps for EM clustering and a method for finding a good initial model. In our experimental evaluation, we demonstrate that the new EM algorithm improves cluster quality on three real-world data sets compared to _k_-medoid clustering.

Author information

Authors and Affiliations

  1. Institute for Informatics, University of Munich, D-80538, Munich, Germany
    Hans-Peter Kriegel, Alexey Pryakhin & Matthias Schubert

Editor information

Editors and Affiliations

  1. Nanyang Technological University, Singapore
    Wee-Keong Ng
  2. Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
    Masaru Kitsuregawa
  3. School of Computer Science and Technology, Heilongjiang University, China
    Jianzhong Li
  4. School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
    Kuiyu Chang

Rights and permissions

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kriegel, HP., Pryakhin, A., Schubert, M. (2006). An EM-Approach for Clustering Multi-Instance Objects. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_18
