Jack Rueter - Academia.edu

Papers by Jack Rueter

Sentiment analysis data and word embeddings for Erzya, Komi-Zyrian, Moksha and Udmurt

Zenodo (CERN European Organization for Nuclear Research), 2023

Finite-State Morphology Code With Compiled Analyzers And Generators For Skolt Sami

Two-level morphophonology and lexc morphological descriptions of Skolt Sami as coded on the Giella Infrastructure at the Arctic University of Norway, in Tromsø. Funded by a scholarship from the Kone Foundation.

Neural models for morphological generation, analysis and lemmatization in 22 languages

Morphological models for generation, lemmatization and analysis in 22 languages. The models are trained in OpenNMT-py https://github.com/OpenNMT/OpenNMT-py. Feed one word at a time, split into characters (kissa -> k i s s a). Supported languages: German (deu), Kven (fkv), Komi-Zyrian (kpv), Moksha (mdf), Mansi (mns), Erzya (myv), Norwegian Bokmål (nob), Russian (rus), South Sami (sma), Lule Sami (smj), Skolt Sami (sms), Võro (vro), Finnish (fin), Komi-Permyak (koi), Latvian (lav), Eastern Mari (mhr), Western Mari (mrj), Namonuito (nmt), Olonets-Karelian (olo), Pite Sami (sje), Northern Sami (sme), Inari Sami (smn) and Udmurt (udm).
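
The input convention above (one word per line, split into characters) can be prepared with a few lines of Python. This is a minimal sketch, not part of the released models: the function names are hypothetical, and applying NFC Unicode normalization before splitting is an assumption about what the models expect.

```python
import unicodedata
from typing import Iterable


def to_char_sequence(word: str) -> str:
    """Turn one word into space-separated characters, e.g. 'kissa' -> 'k i s s a'."""
    # NFC keeps precomposed letters such as 'õ' (Võro) as one symbol;
    # that the models were trained on NFC text is an assumption.
    word = unicodedata.normalize("NFC", word.strip())
    if not word or any(ch.isspace() for ch in word):
        raise ValueError(f"expected exactly one word, got {word!r}")
    return " ".join(word)


def write_source_file(words: Iterable[str], path: str) -> None:
    """Write one character-split word per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(to_char_sequence(w) + "\n")


def join_chars(line: str) -> str:
    """Undo the splitting for plain character output, e.g. 'k i s s a' -> 'kissa'."""
    return "".join(line.split())


if __name__ == "__main__":
    print(to_char_sequence("kissa"))  # k i s s a
```

A file written this way could then be passed to OpenNMT-py's `onmt_translate` as its `-src` input together with the relevant model file. `join_chars` only suits output that is itself plain space-separated characters; how analysis tags are formatted in the model output is not described here.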

rueter/Mordvin-Varieties: First pre-release of Mordvin-Varieties

This initial pre-release contains three maps showing the locales where fieldwork was done to collect Erzya and Moksha language materials for and by Heikki Paasonen in 1891–1912.

Linguistic Distance between Erzya and Moksha. Dependent Morphology

The purpose of this article is to outline morphological facts about the two literary languages Erzya and Moksha that can be used to estimate the distinctive character of each language form. Whereas earlier morphological evaluations of the linguistic distance between Erzya and Moksha have placed them at around 90% cohesion, this one does not. This study evaluates the languages on the basis of non-ambiguity, parallel sets of ambiguity and divergent ambiguity. Non-ambiguity is found where a single combinatory function aligns with a morphological formant, e.g. молян go+V+Ind+Prs+ScSg1. Parallel sets of ambiguity are found where a set of combinatory functions aligns with a morphological formant and both languages share the same set of ambiguous readings, e.g. саизь vs сявозь take+V+Ind+ScPl3+OcSg3, ScPl3+OcPl3. Divergent ambiguity is found in forms with non-symmetric alignments of combinatory functions, e.g. саинек take+V+Ind+Prt1+ScPl1, +Prt1+ScPl1+OcSg3, +Prt1+ScPl1+OcPl3 vs сявоме take+V+Ind+Prt1+ScPl1, сявоськ take+V+Ind+Prt1+ScPl1+OcSg3, +Prt1+ScPl1+OcPl3. This morphological evaluation lays the groundwork for the syntactic disambiguation needed to facilitate Erzya↔Moksha machine translation, while machine translation in turn will enhance the use of shared language resources. Results show that, excluding loanwords from the 20th century, Erzya and Moksha share less than 50% of their vocabularies, 63% of their regular nominal declensions and 48% of their regular finite conjugations. Peer reviewed.
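
The three alignment types above can be illustrated by comparing sets of readings. The sketch below is a simplification of my own, not the article's procedure: each word form is represented by the set of its readings (tag strings), an Erzya form is compared with the Moksha forms of the same lemma that share at least one reading, and identical reading sets count as non-ambiguity (one reading) or parallel ambiguity (several), while any other overlap counts as divergent. The tag strings come from the abstract's examples, with the lemma dropped and the comma shorthand read as sharing the prefix of the first reading.

```python
from typing import Dict, FrozenSet, List, Tuple

Readings = FrozenSet[str]  # all readings (tag strings) of one word form


def classify(erzya_readings: Readings,
             moksha_paradigm: Dict[str, Readings]) -> Tuple[str, List[str]]:
    """Classify how one Erzya form aligns with Moksha forms of the same lemma."""
    # Moksha forms sharing at least one reading with the Erzya form
    partners = sorted(form for form, readings in moksha_paradigm.items()
                      if readings & erzya_readings)
    if not partners:
        return "unaligned", []
    if len(partners) == 1 and moksha_paradigm[partners[0]] == erzya_readings:
        # Identical reading sets: one reading is non-ambiguity,
        # several shared readings are a parallel set of ambiguity
        kind = "non-ambiguity" if len(erzya_readings) == 1 else "parallel ambiguity"
        return kind, partners
    # Readings overlap but are distributed over the forms differently
    return "divergent ambiguity", partners


# Moksha forms of 'take' from the abstract's examples
MOKSHA_TAKE = {
    "сявозь": frozenset({"V+Ind+ScPl3+OcSg3", "V+Ind+ScPl3+OcPl3"}),
    "сявоме": frozenset({"V+Ind+Prt1+ScPl1"}),
    "сявоськ": frozenset({"V+Ind+Prt1+ScPl1+OcSg3", "V+Ind+Prt1+ScPl1+OcPl3"}),
}

if __name__ == "__main__":
    saiz = frozenset({"V+Ind+ScPl3+OcSg3", "V+Ind+ScPl3+OcPl3"})  # Erzya саизь
    sainek = frozenset({"V+Ind+Prt1+ScPl1", "V+Ind+Prt1+ScPl1+OcSg3",
                        "V+Ind+Prt1+ScPl1+OcPl3"})  # Erzya саинек
    print(classify(saiz, MOKSHA_TAKE))    # ('parallel ambiguity', ['сявозь'])
    print(classify(sainek, MOKSHA_TAKE))  # ('divergent ambiguity', ['сявоме', 'сявоськ'])
```

Working with whole reading sets rather than single analyses keeps the comparison at the level of word forms, which is where the abstract locates ambiguity.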

UD_Komi_Zyrian-Lattice 2.7

Корпус национальных мордовских языков: принципы разработки и перспективы функционирования/действия [Corpus of the national Mordvin languages: principles of development and prospects for functioning/operation]

Мокшень и эрзянь кяльхнень фкакс- и аф фкаксшисна, синь валлувкссна / Эрзянь ды мокшонь кельтнень вейкекс- ды аволь вейкексчист, сынст валлувост [Similarities and differences of the Moksha and Erzya languages, their vocabularies]

Nettidigisanakirja koltansaame-suomi [Online digital dictionary: Skolt Sami to Finnish]

Moksha Mordvin

Routledge eBooks, Feb 20, 2023

Morphophonological Approach to Lushootseed Reduplication Research

Zenodo (CERN European Organization for Nuclear Research), Feb 22, 2023

Nettidigisanat: suomi-vuorimari [Online digital words: Finnish to Hill Mari]

Вай, Нишке пазось, Шки пазось [Oh, god Nishke, god Shki]

Mordvin_Erzya_Abramov_Erzjanj-cjora-2_1973.imdi

Mordvin_Erzya_Abramov_Erzjanj-cjora-1_1971.imdi

Келу, келу, акша келу! [Birch, birch, white birch!]

Mordvin_Erzya_Abramov_Isjak-jakinj-Najmanov_1987.imdi

UD_Erzya-JR 2.5

UD_Erzya-JR 2.8

UD_Komi_Zyrian-Lattice 2.8

Normalizing Early English Letters to Present-day English Spelling

Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2019

This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus of personal correspondence from the 15th to the 19th century. The methods include machine translation (neural and statistical), edit distance and rule-based finite-state transducers (FSTs). The different normalization methods are compared and evaluated. Each method has its own strengths in word normalization, which calls for ways of combining their results to leverage those individual strengths.
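
Of the methods listed, edit distance is the simplest to illustrate. The sketch below shows only the generic technique (map a historical spelling to the present-day lexicon word with the smallest Levenshtein distance) and is not the paper's implementation; the lexicon, the lowercasing step and the `max_distance` threshold are hypothetical choices.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance; insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalize(word: str, lexicon: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest lexicon entry, or the word unchanged if nothing is close."""
    query = word.lower()
    scored = [(levenshtein(query, cand), cand) for cand in lexicon]
    if not scored:
        return word
    dist, best = min(scored)  # ties are broken alphabetically
    return best if dist <= max_distance else word


if __name__ == "__main__":
    lexicon = ["with", "which", "would"]  # hypothetical present-day word list
    print(normalize("wyth", lexicon))  # with
```

The threshold leaves very distant spellings untouched rather than forcing a poor match, which is one reason such a method benefits from being combined with others, as the abstract suggests.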