Towards the Automatic Lemmatization of 16th Century Mexican Spanish: A Stemming Scheme for the CHEM

Abstract

Two problems arise when developing a stemming scheme for diachronic corpora: (1) the morphological systems of natural languages may change over time, and these changes are usually not sufficiently documented; and (2) such corpora exhibit very diverse orthographic characteristics. This short paper briefly describes a stemming strategy for a diachronic corpus of Mexican Spanish that partially addresses these problems. The success rates of the method are compared with those of a Porter stemmer.
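To make the idea of a stemming scheme concrete, here is a minimal, generic suffix-stripping sketch. It is neither the method described in this paper (the abstract does not detail it) nor Porter's algorithm: the suffix inventory `SUFFIXES`, the two orthographic rewrite rules in `NORMALIZATION` (`ç` to `z`, `ss` to `s`), and the minimum stem length `MIN_STEM` are all hypothetical choices made for illustration. The sketch does show why problem (2) matters: without a normalization step, variant spellings of the same word would produce different stems.

```python
# Hypothetical suffix inventory for illustration only; NOT the CHEM suffix list.
SUFFIXES = [
    "aciones", "ación", "mente", "ieron", "iendo",
    "ando", "aron", "ados", "idas", "ado", "ida",
    "ar", "er", "ir", "as", "es", "os", "a", "e", "o", "s",
]

# Hypothetical orthographic rewrite rules mapping older spellings
# to a single canonical form before stemming (illustrative assumption).
NORMALIZATION = [("ç", "z"), ("ss", "s")]

# Hypothetical minimum number of characters a stem must keep.
MIN_STEM = 3


def normalize(word: str) -> str:
    """Lowercase the word and apply the orthographic rewrite rules in order."""
    w = word.lower()
    for old, new in NORMALIZATION:
        w = w.replace(old, new)
    return w


def stem(word: str) -> str:
    """Strip the longest matching suffix that leaves at least MIN_STEM characters."""
    w = normalize(word)
    # Try longer suffixes first so that, e.g., "aron" wins over "o" or "s".
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suf) and len(w) - len(suf) >= MIN_STEM:
            return w[: -len(suf)]
    return w


if __name__ == "__main__":
    for token in ["passaron", "pasar", "coraçon", "corazones"]:
        print(token, "->", stem(token))
```

Under these assumptions, `stem("passaron")` and `stem("pasar")` both yield `"pas"`, and `stem("coraçon")` and `stem("corazones")` both yield `"corazon"`, so the older and modern spellings are conflated. A real scheme for a diachronic corpus would need a much richer, empirically derived affix inventory and normalization table.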

References

  1. Porter, M.F.: An Algorithm for Suffix Stripping. Program 14(3), 130–137 (1980)
  2. Medina-Urrea, A., Hlaváčová, J.: Automatic Recognition of Czech Derivational Prefixes. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 189–197. Springer, Heidelberg (2005)
  3. Medina-Urrea, A., Buenrostro Díaz, E.C.: Características cuantitativas de la flexión verbal del chuj. Estudios de Lingüística Aplicada 38, 15–31 (2003)
  4. Medina-Urrea, A., Alvarado García, M.: Análisis cuantitativo y cualitativo de la derivación léxica en ralámuli. Primer Coloquio Leonardo Manrique, Mexico, Conaculta-INAH (2004)
  5. Medina-Urrea, A.: Automatic Discovery of Affixes by Means of a Corpus: A Catalog of Spanish Affixes. Journal of Quantitative Linguistics 7(2), 97–114 (2000)
  6. Harris, J.: Historical Excursus: Reflexes of the Medieval Stridents. In: Spanish Phonology, pp. 189–206. MIT Press, Cambridge (1969)

Author information

Authors and Affiliations

  1. GIL, Instituto de Ingeniería, UNAM, Ciudad Universitaria, 04510, Coyoacán, DF, Mexico
    Alfonso Medina-Urrea

Editor information

Editors and Affiliations

  1. National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
    Alexander Gelbukh

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Medina-Urrea, A. (2006). Towards the Automatic Lemmatization of 16th Century Mexican Spanish: A Stemming Scheme for the CHEM. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_12
