Towards the Automatic Lemmatization of 16th Century Mexican Spanish: A Stemming Scheme for the CHEM
Abstract
Two of the problems that arise when developing a stemming scheme for diachronic corpora are: (1) the morphological systems of natural languages may change over time, and these changes are usually not sufficiently documented; and (2) such corpora exhibit highly diverse orthographic characteristics. This short paper briefly describes a stemming strategy for a diachronic corpus of Mexican Spanish that partially addresses these problems. The method's success rates are compared with those of a Porter stemmer.
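To make the two problems concrete, here is a minimal sketch of a generic "normalize, then strip suffixes" pipeline, in which orthographically different spellings of one word are mapped to a common form before suffix stripping, so they end up with the same stem. This is not the scheme described in the paper: the spelling-variant rules (`VARIANT_RULES`), the suffix list (`SUFFIXES`), and the minimum stem length (`MIN_STEM`) are hypothetical placeholders chosen only for illustration, and unlike the Porter stemmer the sketch applies no measure-based conditions.

```python
# Toy illustration only: the variant rules and suffix list below are
# hypothetical and are NOT the stemming scheme described in the paper.

# Hypothetical rewrites for a few older spellings, applied in order.
VARIANT_RULES = [
    ("ç", "z"),   # e.g. "coraçon" -> "corazon"
    ("ss", "s"),  # e.g. "passar" -> "pasar"
]

# Drop acute, grave and diaeresis accents but keep "ñ" distinct.
ACCENTS = str.maketrans("áéíóúàèìòùü", "aeiouaeiouu")

# Hypothetical inflectional endings; longer ones are tried first.
SUFFIXES = sorted(
    ["aciones", "acion", "amos", "aron", "ando", "iendo",
     "ados", "adas", "ado", "ada", "es", "as", "os", "ar", "er", "ir",
     "a", "o", "e", "s"],
    key=len,
    reverse=True,
)

MIN_STEM = 3  # never strip a suffix if fewer than 3 characters would remain


def normalize(word: str) -> str:
    """Lower-case a word and map a few old spellings to modern ones."""
    w = word.lower()
    for old, new in VARIANT_RULES:
        w = w.replace(old, new)
    return w.translate(ACCENTS)


def stem(word: str) -> str:
    """Normalize, then strip the longest listed suffix that leaves a long enough stem."""
    w = normalize(word)
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


print(stem("Passaron"), stem("pasaron"))    # pas pas
print(stem("coraçones"), stem("corazón"))   # corazon corazon
```

Normalizing before stripping means the suffix list only has to cover one spelling of each ending. The cost is that every variant rule must be safe to apply everywhere, which is why this toy list is kept deliberately small.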
References
- Porter, M.F.: An Algorithm for Suffix Stripping. Program 14(3), 130–137 (1980)
- Medina-Urrea, A., Hlaváčová, J.: Automatic Recognition of Czech Derivational Prefixes. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 189–197. Springer, Heidelberg (2005)
- Medina-Urrea, A., Buenrostro Díaz, E.C.: Características cuantitativas de la flexión verbal del chuj. Estudios de Lingüística Aplicada 38, 15–31 (2003)
- Medina-Urrea, A., Alvarado García, M.: Análisis cuantitativo y cualitativo de la derivación léxica en ralámuli. Primer Coloquio Leonardo Manrique, Mexico, Conaculta-INAH (2004)
- Medina-Urrea, A.: Automatic Discovery of Affixes by Means of a Corpus: A Catalog of Spanish Affixes. Journal of Quantitative Linguistics 7(2), 97–114 (2000)
- Harris, J.: Historical Excursus: Reflexes of the Medieval Stridents. In: Spanish Phonology, pp. 189–206. MIT Press, Cambridge (1969)
Author information
Authors and Affiliations
- Alfonso Medina-Urrea, GIL, Instituto de Ingeniería, UNAM, Ciudad Universitaria, 04510, Coyoacán, DF, Mexico
Editor information
Editors and Affiliations
- Alexander Gelbukh, National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Medina-Urrea, A. (2006). Towards the Automatic Lemmatization of 16th Century Mexican Spanish: A Stemming Scheme for the CHEM. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_12
- DOI: https://doi.org/10.1007/11671299_12
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-540-32205-4
- Online ISBN: 978-3-540-32206-1
- eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science