Elisabeth Mager - Academia.edu

Papers by Elisabeth Mager

ALPHABETISCHES SEMASIOLOGISCHES WÖRTERBUCH DER GESAMTÜBERLIEFERUNG. Part 2 (Alphabetical Semasiological Dictionary of the Complete Tradition, Part 2)

De Gruyter eBooks, Dec 31, 1995

Relación entre derecho indígena y soberanía en las etnias de Norteamérica (The Relationship Between Indigenous Law and Sovereignty Among the Ethnic Groups of North America)

Punto CUNORTE

The ethnic groups of North America were expelled from their territory by the European invasion, and most of them were confined to reservations. Under these conditions, they suffered oppression by the dominant society owing to the asymmetry of power. This article aims to show how these peoples defended themselves through their Indigenous law in order to attain a certain status of sovereignty. To this end, it compares this struggle among the Cahuilla of California, the Kickapoo of Coahuila and Texas, and the Mohawk of Quebec. The work is accordingly divided into a theoretical reflection on the concept of sovereignty, state policy and Indigenous law, and the struggle for sovereignty. This last section analyzes the struggle of the Cahuilla through a legal process, that of the Kickapoo through the Kickapoo Trust Land Acquisition Committee, and that of the Mohawk in the Oka conflict. Finally, some conclusions are presented about the sovereignty of the ethnic groups of N...

Casinos y poder (Casinos and Power)

Universidad Nacional Autónoma de México, 2010

Marejadas Rurales y Luchas Por La Vida, Vol. III: Vaivenes Del Estado y La Sociedad Rural (Rural Swells and Struggles for Life, Vol. III: Fluctuations of the State and Rural Society)

ASOCIACION MEXICANA DE ESTUDIOS RURALES A.C., INSTITUTO DE CIENCIAS AGROPECUARIAS Y RURALES (ICAR), UNIVERSIDAD DE GUADALAJARA, EL COLEGIO DE MICHOACAN A.C., UNIVERSIDAD MICHOACANA DE SAN NICOLAS HIDALGO, CUCOSTA SUR GRANA, ECOSUR, FACULTAD DE ESTUDIOS SUPERIORES ACATLAN-UNAM

Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers

In recent years, machine translation has become very successful for high-resource language pairs. This has also sparked new interest in research on the automatic translation of low-resource languages, including Indigenous languages. However, the latter are deeply tied to the ethnic and cultural groups that speak (or used to speak) them. Collecting data for, modeling, and deploying machine translation systems for these languages therefore raises new ethical questions that must be addressed. Motivated by this, we first survey the existing literature on ethical considerations for the documentation, translation, and general natural language processing of Indigenous languages. We then conduct and analyze an interview study to shed light on the positions of community leaders, teachers, and language activists regarding ethical concerns for the automatic translation of their languages. Our results show that including native speakers and community members, to varying degrees, is vital to conducting better and more ethical research on Indigenous languages.

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Findings of the Association for Computational Linguistics: ACL 2022, 2022

Morphologically rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation. We investigate a wide variety of supervised and unsupervised morphological segmentation methods for four polysynthetic languages: Nahuatl, Raramuri, Shipibo-Konibo, and Wixarika. We then compare these morphologically inspired segmentation methods against Byte-Pair Encoding (BPE) as input for machine translation (MT) to and from Spanish. We show that for all language pairs except Nahuatl, an unsupervised morphological segmentation algorithm consistently outperforms BPE, and that, although supervised methods achieve better segmentation scores, they underperform on MT. Finally, we contribute two new morphological segmentation datasets for Raramuri and Shipibo-Konibo, and a parallel corpus for Raramuri-Spanish.

Marejadas Rurales y Luchas Por La Vida, Vol. II: Conflictos Socioterritoriales y Por Recursos Naturales (Rural Swells and Struggles for Life, Vol. II: Socio-territorial Conflicts and Conflicts over Natural Resources)

INSTITUTO DE CIENCIAS AGROPECUARIAS Y RURALES (ICAR), UNIVERSIDAD DE GUADALAJARA, EL COLEGIO DE MICHOACAN A.C., FACULTAD DE ESTUDIOS SUPERIORES ACATLAN-UNAM, ECOSUR, CUCOSTA SUR GRANA, ASOCIACION MEXICANA DE ESTUDIOS RURALES A.C.

Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers

arXiv (Cornell University), May 30, 2023

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

arXiv (Cornell University), Apr 18, 2021

Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear whether zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 Indigenous languages of the Americas. We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches. Additionally, we explore model adaptation via continued pretraining and provide an analysis of the dataset by considering hypothesis-only models. We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average accuracy of 38.48%. Continued pretraining offers improvements, raising average accuracy to 43.85%. Surprisingly, training on poorly translated data by far outperforms all other methods, with an accuracy of 49.12%.

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

arXiv (Cornell University), Mar 16, 2022

AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas

Frontiers in Artificial Intelligence, Dec 2, 2022

...languages in AmericasNLI, but model adaptation via continued pretraining results in improvements. All machine translation models are rather weak, but, surprisingly, translation-based approaches to natural language inference outperform all other models on that task.

Marejadas Rurales y Lucha Por La Vida, Vol. I: Construcción Sociocultural y Económica Del Campo (Rural Swells and Struggle for Life, Vol. I: Sociocultural and Economic Construction of the Countryside)

INSTITUTO DE CIENCIAS AGROPECUARIAS Y RURALES (ICAR), UNIVERSIDAD DE GUADALAJARA, EL COLEGIO DE MICHOACAN A.C., CUCOSTA SUR GRANA, FACULTAD DE ESTUDIOS SUPERIORES ACATLAN-UNAM, ECOSUR

Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas

Frontiers in Artificial Intelligence

Little attention has been paid to the development of human language technology for truly low-resource languages (i.e., languages with limited amounts of digitally available text data, such as Indigenous languages). However, it has been shown that pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting even for low-resource languages that are unseen during pretraining. Yet prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks, and it remains unclear whether zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas. We conduct experiments with pretrained models, exploring zero-shot learning in combination with model adaptation. Furthermore, as AmericasNLI is a multiway parallel dataset, we use it to benchmark the performance of different machine tr...

Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages

Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages, Aug 1, 2018

Machine translation from polysynthetic to fusional languages is a challenging task, further complicated by the limited amount of available parallel text. As a result, translation performance lags far behind the state of the art for high-resource and more intensively studied language pairs. To shed light on the phenomena that hamper automatic translation to and from polysynthetic languages, we study translations from three low-resource, polysynthetic languages (Nahuatl, Wixarika, and Yorem Nokki) into Spanish and vice versa. In doing so, we find that in a morpheme-to-morpheme alignment, a significant amount of the information contained in polysynthetic morphemes has no Spanish counterpart, and its translation is often omitted. We further conduct a qualitative analysis, identifying morpheme types that are commonly hard to align or that are ignored in the translation process.

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages

ArXiv, 2018

Ethnic Consciousness in Cultural Survival: The Morongo Band of Mission Indians and the Kickapoo Traditional Tribe of Texas

This article argues that ethnic consciousness has been important for the cultural survival of North American Indigenous tribes. Comparing the Morongo Band of Mission Indians and the Kickapoo Traditional Tribe of Texas, I posit that the politics of assimilation and integration into the capitalist system reduce ethnic consciousness, leading to greater cultural loss. By contrast, the renewal of ethnic consciousness can counteract cultural assimilation, strengthen the economy, and guarantee tribal survival. The article also discusses contextual issues of territory and reservation casinos.

Transpersonale Erfahrung in Novalis "Hymnen an die Nacht" (Transpersonal Experience in Novalis's "Hymns to the Night")