Transforming Strings to Vector Spaces Using Prototype Selection
Abstract
A common way of expressing string similarity in structural pattern recognition is the edit distance. It allows one to apply the _k_NN rule in order to classify a set of strings. However, compared to the wide range of elaborate classifiers known from statistical pattern recognition, this is only a very basic method. In the present paper, we propose a method for transforming strings into _n_-dimensional real vector spaces based on prototype selection, in which each string is represented by the vector of its edit distances to _n_ selected prototype strings. This allows us to subsequently classify the transformed strings with more sophisticated classifiers, such as support vector machines and other kernel-based methods. In a number of experiments, we show that the recognition rate can be significantly improved by means of this procedure.
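The pipeline described in the abstract (edit distance, prototype selection, embedding) can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The edit distance is taken to be the plain Levenshtein distance with unit costs, although the paper may use other cost functions. The abstract does not name its prototype selectors, so the selector here is a hypothetical greedy farthest-first ("spanning") rule that starts from the set median. The names `edit_distance`, `spanning_prototypes` and `embed` are illustrative. Each string s is mapped to the vector (d(s, p_1), ..., d(s, p_n)) of its distances to the n prototypes.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit insertion, deletion and substitution costs,
    computed by Wagner-Fischer dynamic programming while keeping only one row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


def spanning_prototypes(strings: Sequence[str], n: int) -> List[str]:
    """Hypothetical farthest-first ("spanning") selector (an assumption, not
    necessarily the paper's): start from the set median, then repeatedly add
    the string whose nearest already-chosen prototype is farthest away."""
    n = min(n, len(strings))
    if n <= 0:
        return []
    dist = [[edit_distance(s, t) for t in strings] for s in strings]
    # Set median: the string with the smallest sum of distances to all others.
    chosen = [min(range(len(strings)), key=lambda i: sum(dist[i]))]
    while len(chosen) < n:
        remaining = [i for i in range(len(strings)) if i not in chosen]
        chosen.append(max(remaining, key=lambda i: min(dist[i][j] for j in chosen)))
    return [strings[i] for i in chosen]


def embed(s: str, prototypes: Sequence[str]) -> List[float]:
    """Map a string to an n-dimensional real vector: its edit distances to the prototypes."""
    return [float(edit_distance(s, p)) for p in prototypes]


if __name__ == "__main__":
    train = ["abc", "abd", "xyz", "xyw", "abcd"]
    protos = spanning_prototypes(train, 2)
    print(protos)  # ['abc', 'xyz']
    for s in train:
        print(s, embed(s, protos))
```

The resulting vectors can be passed to any vector-space classifier, for example an SVM trained with LIBSVM. Selecting prototypes this way costs O(N²) edit-distance computations on N training strings. Embedding a new string then needs only n distance computations.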
Author information
Authors and Affiliations
- Barbara Spillmann, Michel Neuhaus & Horst Bunke: Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012 Bern, Switzerland
- Elżbieta Pękalska & Robert P. W. Duin: Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Editor information
Editors and Affiliations
- Dit-Yan Yeung: Hong Kong University of Science and Technology
- James T. Kwok: Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Ana Fred: Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal
- Fabio Roli: Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123 Cagliari, Italy
- Dick de Ridder: Faculty of Electrical Engineering, Mathematics and Computer Science, Information and Communication Theory Group, Delft University of Technology, Delft, The Netherlands
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Spillmann, B., Neuhaus, M., Bunke, H., Pękalska, E., Duin, R.P.W. (2006). Transforming Strings to Vector Spaces Using Prototype Selection. In: Yeung, DY., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2006. Lecture Notes in Computer Science, vol 4109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11815921_31
- DOI: https://doi.org/10.1007/11815921_31
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-540-37236-3
- Online ISBN: 978-3-540-37241-7
- eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science