Richard Hatcher Jr - Academia.edu

Papers by Richard Hatcher Jr

Speech technology for supporting community-based endangered language documentation

We are grateful for the support and generosity of the elders of the Seneca Nation of Indians. Linnea (undergraduate linguistics RA): “I had a difficult time with the ASR, because I spent more time crosschecking the transcription than actually just transcribing.” Julia (undergraduate linguistics RA): “Using ASR, I was able to focus on comparing the audio to the transcription rather than trying to perceive what was being said.”

UniMorph 4.0: Universal Morphology

arXiv (Cornell University), May 7, 2022

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.

A Discourse Model for “Undirected Speculation”

Fresh Perspectives on Major Issues in Pragmatics, 2020

We identify Undirected Speculation as a category of available discourse moves in English dialogue. In an Undirected Speculation Utterance (USU), a speaker offers a question or inquisitive declarative – frequently of the form <I wonder Q> – without assuming that the hearer will acknowledge and potentially respond to this move. Unlike conventional interrogatives, USUs can be felicitously ignored by the hearer and all previously available conversational moves remain licit. We contrast USUs with discourse moves such as assertions and questions in terms of the effect of each discourse move on the dialogue and the felicitous response possibilities. We use Condoravdi and Lauer's contextual conditions framework (2009) to derive the discourse effects of Undirected Speculation from the truth-conditional semantics of the sentence <I wonder Q>. Our analysis advances the conversational scoreboard or dialogue gameboard tradition of dialogue modelling (Lewis 1979; Roberts 2012; Farkas & Bruce 2010; Malamud & Stephenson 2015). We argue that the inquisitive component of a USU may have essentially the same denotation as a question, but affects the gameboard in a distinct way: the hearer is not obligated to address the question and therefore it cannot be the case that a USU is placed directly on the Table. In our revised model, the speaker uttering a polar USU is proposing two possible future gameboard states ({<(p?¬p),…>,<…>}). In the first, the hearer responds to the question and so the inquisitive component of the USU is placed on the Table, and in the alternative, the hearer ignores the USU (e.g. by uttering something which is not a response to it) – thus the Table remains in its current (potentially empty) state. Upon the next discourse move, one of these projected states becomes the current state of the Table (according to the reaction of the hearer).
We believe that Undirected Speculation demonstrates the need to develop richer mechanisms for capturing tentativity in dialogue and provides evidence for the incorporation of possible future states of the Table into the dialogue gameboard model. Keywords: dialogue gameboard, discourse commitment, preference structure, question under discussion, tentativity
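
The projected-state idea described in this abstract can be illustrated with a small sketch. This is only a toy reading of the prose, not the authors' formalization: the `Gameboard` class, its `table` and `projected` fields, and the three functions are hypothetical names introduced here, and issues are represented as plain strings rather than semantic denotations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Gameboard:
    """Toy gameboard (hypothetical): models only the Table and projected Tables."""
    table: Tuple[str, ...] = ()  # issues currently on the Table, newest first
    projected: List[Tuple[str, ...]] = field(default_factory=list)


def utter_question(board: Gameboard, issue: str) -> Gameboard:
    """A conventional question is placed directly on the Table."""
    return Gameboard(table=(issue,) + board.table)


def utter_usu(board: Gameboard, issue: str) -> Gameboard:
    """A USU leaves the Table as it is and projects two possible future Tables."""
    addressed = (issue,) + board.table  # the hearer takes up the issue
    ignored = board.table               # the hearer ignores the USU
    return Gameboard(table=board.table, projected=[addressed, ignored])


def hearer_moves(board: Gameboard, responds: bool) -> Gameboard:
    """The hearer's next move selects one projected state as the current Table."""
    if not board.projected:
        return board  # nothing was projected, so the Table is unchanged
    addressed, ignored = board.projected
    return Gameboard(table=addressed if responds else ignored)
```

For example, `hearer_moves(utter_usu(Gameboard(), "p?¬p"), responds=False)` yields a gameboard whose Table is still empty, whereas `utter_question(Gameboard(), "p?¬p")` puts the issue on the Table immediately. Keeping the projected states separate from the current Table is the design choice that lets the hearer ignore a USU without any repair step.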

A Description of Korean Converbs and their Northeast Asian context

This paper is a study of Korean converbs in relation to the neighboring languages of Northeast Asia. It is based on the descriptive and theoretical advances made in modern linguistics and provides an extensive analysis of the key issues regarding converbs in Korean as well as ten other neighboring languages. The thesis consists of five sections including an introduction and conclusion. The introduction consists of a background of the converb and its use in typological settings. Chapter 1 is subdivided into three parts, the first of which describes exactly what forms can be considered converbs in Korean and relates them to their traditional description in Korean grammar. This is followed by a description of Korean converbs categorized into those containing or lacking temporal values. Chapter 1 concludes with a short section regarding the origins and evolutionary paths of converbs. Chapter 2 is a comparison of the Korean data above with converbs in neighboring languages. These languages can be roughly grouped into two sets, the Transeurasian languages and the Paleoasiatic languages, neither a language family in the traditional sense. Chapter 3 consists of a discussion of several trends within the converbal systems of Northeast Asian languages. The paper ends with a short conclusion. The Korean data is not atypical for the region. While it is true that Korean has a larger than average number of converbs, this is not unique to Korean, for Nivkh contains nearly the same number. Korean’s lack of different-subject converbs and its paucity of posterior converbs follow the overall trend for the region. It is also common for a language to have one highly contextual converb with a wide variety of possible interpretations. Keywords: Korean, Converbs, Northeast Asia, Transeurasian, Paleoasiatic, Linguistic Typology

On the non-universality of intonation: Evidence from Triqui

The Journal of the Acoustical Society of America, 2018

Languages with large lexical tone inventories typically involve less freedom for suprasegmental properties to be manipulated for pragmatic meaning or phrasal constituency (Connell 2017). However, such languages may still use F0 to a limited degree for marking information structure or utterance finality (DiCanio et al. 2018, Xu 1999). We present results from three field experiments with 11 speakers where we investigated information structure and prosody in Itunyoso Triqui, an Otomanguean language (Mexico). Itunyoso Triqui possesses nine lexical tones (/4, 3, 2, 1, 43, 32, 31, 13, 45/), fixed final stress, and contrastive phonation type. In experiment 1, we examined tone production in words in broad and narrow focus contexts. Words under narrow focus were lengthened slightly (13-14%) but no general effect of focus on F0 levels or contours was found. In experiment 2, we examined tones in utterance non-final and final contexts. Words were lengthened in utterance-final position relative to non-final position, but no F0 differences were found. In experiment 3, we investigated F0 declination in sentences consisting of only level tones and found no F0 change across utterances. The results from these experiments suggest that Itunyoso Triqui does not use F0 to encode information structure or prosodic boundaries.

Cayuga word melodies—A corpus phonetic study

The Journal of the Acoustical Society of America, 2020

SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2021

This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and crosslingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku

Not always about you: Prioritizing community needs when developing endangered language technology

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish on the one hand, with millions of speakers using it in every imaginable domain, and Seneca, with only a small handful of fluent speakers using the language primarily in a restricted domain. While issues stemming from the lack of resources necessary to train models unite this disparate group of languages, many other issues cut across the divide between widely-spoken low-resource languages and endangered languages. In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders.
