An EM-Approach for Clustering Multi-Instance Objects
Abstract
In many data mining applications, data objects are modeled as sets of feature vectors, also called multi-instance objects. In this paper, we present an expectation maximization (EM) approach for clustering multi-instance objects. To this end, we introduce a statistical process that models multi-instance objects. Furthermore, we present the E-step and M-step for EM clustering and a method for finding a good initial model. In our experimental evaluation, we demonstrate that the new EM algorithm increases cluster quality on three real-world data sets compared to _k_-medoid clustering.
Author information
Authors and Affiliations
- Institute for Informatics, University of Munich, D-80538, Munich, Germany
Hans-Peter Kriegel, Alexey Pryakhin & Matthias Schubert
Editor information
Editors and Affiliations
- Nanyang Technological University, Singapore
Wee-Keong Ng
- Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
- School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
- School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kriegel, HP., Pryakhin, A., Schubert, M. (2006). An EM-Approach for Clustering Multi-Instance Objects. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139\_18
- DOI: https://doi.org/10.1007/11731139\_18
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-540-33206-0
- Online ISBN: 978-3-540-33207-7
- eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science