Transforming Strings to Vector Spaces Using Prototype Selection

Abstract

A common way of expressing string similarity in structural pattern recognition is the edit distance. It allows one to apply the _k_NN rule to classify a set of strings. However, compared to the wide range of elaborate classifiers known from statistical pattern recognition, this is only a very basic method. In the present paper we propose a method for transforming strings into _n_-dimensional real vector spaces based on prototype selection. This allows us to subsequently classify the transformed strings with more sophisticated classifiers, such as support vector machines and other kernel-based methods. In a number of experiments, we show that the recognition rate can be significantly improved by means of this procedure.
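The transformation described in the abstract can be illustrated with a minimal sketch, under stated assumptions: each string is mapped to the vector of its edit distances to a fixed set of _n_ prototype strings. The unit-cost edit distance, the function names `edit_distance` and `embed`, and the hand-picked prototypes are hypothetical choices made for illustration; the abstract does not state which edit costs or which prototype selection strategy the authors use.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance, computed row by row
    with the classic dynamic program. Unit costs are an assumption."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def embed(s: str, prototypes: Sequence[str]) -> List[float]:
    """Map a string to an n-dimensional real vector whose i-th
    component is the edit distance to the i-th prototype."""
    return [float(edit_distance(s, p)) for p in prototypes]


# Hypothetical prototypes, picked by hand for this example only.
prototypes = ["kitten", "flaw"]
vectors = [embed(s, prototypes) for s in ["sitting", "lawn", "kitten"]]
```

The dimension _n_ of the resulting space equals the number of prototypes, and the vectors can then be handed to any classifier that operates on real vectors, such as a support vector machine.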

Author information

Authors and Affiliations

  1. Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012, Bern, Switzerland
    Barbara Spillmann, Michel Neuhaus & Horst Bunke
  2. Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
    Elżbieta Pękalska & Robert P. W. Duin

Editor information

Editors and Affiliations

  1. Hong Kong University of Science and Technology
    Dit-Yan Yeung
  2. Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
    James T. Kwok
  3. Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal
    Ana Fred
  4. Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
    Fabio Roli
  5. Faculty of Electrical Engineering, Mathematics and Computer Science, Information and Communication Theory Group, Delft University of Technology, Delft, The Netherlands
    Dick de Ridder

Rights and permissions

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Spillmann, B., Neuhaus, M., Bunke, H., Pękalska, E., Duin, R.P.W. (2006). Transforming Strings to Vector Spaces Using Prototype Selection. In: Yeung, DY., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2006. Lecture Notes in Computer Science, vol 4109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11815921_31
