Heather Simpson | University of California, Santa Barbara

Papers by Heather Simpson

Memory Capacity Limits in Processing of Natural Connected Speech: The Psychological Reality of Intonation Units

Proceedings of the 37th Annual Conference of the Cognitive Science Society, Pasadena, CA, 2015

Many theories of memory propose some type of short-term store limited in capacity to a small number of information chunks. However, although short-term verbal memory is generally considered to be a crucial component of language processing, the relevant information chunk level that may define capacity limits in ecologically valid spoken language has never been investigated. The Intonation Unit (IU), an intermediate-level prosodic phrase, has been theorized to be a fundamental unit of spoken language, the focus of a speaker's mental processing. This suggests that IUs might play a role as the relevant unit representing "chunks" of spoken language. We report the results of an experiment investigating the role of IUs in short-term memory in a serial recall task. We found a significant non-linear effect of stimulus size in IUs, but not clauses. We conclude that Intonation Units are the primary linguistic unit used for chunking spoken language input in memory.

Structural, social, and cognitive factors driving adjective order in Korean: a multifactorial corpus analysis (under review)

Multifactorial statistical analysis of language corpora is an invaluable tool for revealing the complex interactions of the structural, socio-pragmatic, and cognitive processing factors driving linguistic form. This paper reports the results of a multifactorial corpus analysis of Korean attributive adjective ordering preferences in adjective-adjective-noun sequences. Attributive adjective sequences can be produced in one of two structures: a coordinated construction or a non-coordinated construction. Seven adjective properties adapted from the English adjective order study by Wulff (2003) were included in a regression model along with construction type. The results show that six of the seven properties investigated have significantly different effects on adjective order between the two constructions. The findings are explained with reference to the relationship of linguistic structure and the distinct motivations of iconicity and ease of processing.

Acoustic and Phonological Correlates of Korean Perception of Japanese Alveolar Fricatives

Korean perception of the Japanese voiceless alveolar fricative /s/

Proceedings of the Acoustical Society of America, 2011

Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population

... In addition to logging name variants to match in the source data, entity profiles served as a connecting thread to uniquely identify entities in initial candidate selection, expansion through Wikipedia exploration, and corpus exploration. ...

The DARPA Machine Reading Program - Encouraging Linguistic and Reasoning Research with a Series of Reading Tasks

The goal of DARPA's Machine Reading (MR) program is nothing less than making the world's natural language corpora available for formal processing. Most text processing research has focused on locating mission-relevant text (information retrieval) and on techniques for ...

Basic Language Resources for Diverse Asian Languages: A Streamlined Approach for Resource Creation

The REFLEX-LCTL (Research on English and Foreign Language Exploitation-Less Commonly Taught Languages) program, sponsored by the United States government, was an effort in simultaneous creation of basic language resources and technologies for under-resourced languages, with the aim of enriching sparse areas in language technology resources and encouraging new research. We were tasked with producing basic language resources for 8 Asian languages (Bengali, Pashto, Punjabi, Tamil, Tagalog, Thai, Urdu and Uzbek) and 5 languages from Europe and Africa, and with distributing them to research and development also funded by the program. This paper discusses the streamlined approach to language resource development we designed to support simultaneous creation of multiple resources for multiple languages.

An Evaluation of Technologies for Knowledge Base Population

Conference Presentations by Heather Simpson

Memory Capacity Limits in Processing of Natural Connected Speech: The Psychological Reality of Intonation Units

Objectives
• We investigate the hypothesis that prosodic grouping into Intonation Units (IUs) regulates the contents of short-term memory in naturally produced spoken language.
• This effect should be observable as a discontinuity in recall performance at some small number of IUs.

Remembering for Speaking: Short-term memory capacity in processing of connected speech

This talk provides a very basic overview of my dissertation research on the role of memory in processing of natural spoken language (e.g. conversational speech), as well as some initial findings. The results of a multifactorial analysis of serial word recall performance as predicted by length of a stimulus in clauses, words, and prosodic phrases (using the Intonation Unit formulation of prosodic phrase) are presented.

Green big plant or big green plant?: Adjective order preferences in Korean

This study examines the factors determining the relative ordering of Korean attributive adjectives, using 1810 adjective-adjective-noun sequences taken from the Korea Advanced Institute of Science & Technology (KAIST) corpus (Chae & Choi 2000). Most of the prior literature on adjective order has been based on English, or on very small datasets from other languages. This will be the first multi-factorial corpus analysis of ordering preferences of Korean adjectives.
The adjective order factors examined are based on Wulff (2003), the first multi-factorial corpus analysis study of adjective order preferences for English. These factors represent claims about adjective order in the prior literature that Wulff (2003) operationalized into variables that are assessable using corpus analysis techniques. These factors include the frequency of the adjective, the length of the adjective root, the subjectivity of the quality represented by the adjective (based on the categories defined in Hetzron 1978), and its collocation frequency with the noun.
Korean adjectives, unlike English adjectives, have many features in common with Korean verbs. They are directly inflected with tense and aspect, do not combine with a copula in predicative constructions, and can take arguments. When combined in a multi-adjective sequence, each adjective must be inflected with either a modifier suffix (-은 -ŭn) or a conjunctive suffix (e.g., -고 -ko 'and'). The suffix type was included as an additional variable in the model.
The results of a binary logistic regression showed that all the factors investigated had significant effects on adjective order. However, the effects were not always in the direction predicted by previous literature, mainly due to the effect of suffix type. Suffix type was found to strongly affect ordering preferences, and changed the direction of the effect for five of the seven significant factors. The effects which do not follow the predicted direction for coordinated sequences are all related to the strength of the association of the adjective to the noun. The effects which do not follow the predicted direction for non-coordinated sequences are both related to general cognitive processing: frequency and length. The explanation for this result may be found in the structure of the two sequence types, namely that for coordinated sequences, the two adjectives are treated as a single unit, but in non-coordinated sequences, the noun and its closest adjective are treated as a subunit in the sequence, with the first adjective treated as a modifier of that adjective-noun unit. Thus the association of the adjectives and the noun may be expected to be less important for adjective order in coordinated sequences than in non-coordinated sequences.

References:
Hetzron, R. (1978). On the Relative Order of Adjectives. In H. Seiler (Ed.), Language Universals (pp. 165-184). Tübingen: Narr.
Wulff, S. (2003). A multifactorial corpus analysis of adjective order in English. International Journal of Corpus Linguistics, 8, 245-282.

Perceptual mapping of voiceless alveolar fricatives from Japanese to Korean

Korean voiceless alveolar fricatives exhibit a relatively unique phonemic contrast based on laryngeal activity, commonly referred to as tense /s*/ and lax /s/ or fortis and non-fortis /s/. Patterns in loanword adaptation of English /s/ have been used to infer the phonological basis for the mapping of English /s/ to Korean; however, perception studies have shown that loanword data obscure patterns in the perception of English /s/ by Korean native speakers, such as the effect of vowel environment (Schmidt 1996). Mapping from Japanese /s/ to Korean has been much less studied, but according to a loanword study by Ito et al. (2006), Japanese /s/ in word-initial and medial position is consistently mapped to Korean lax /s/, and Japanese geminate /ss/ is consistently mapped to Korean tense /s*/, implying phonological mapping based on the source language categories along the lines of LaCharite and Paradis’ (2003, 2005) traditional phonological grammar approach. The current study investigates whether the findings of Ito et al. (2006) for Korean adaptation of Japanese voiceless alveolar fricatives are supported by the perception of Korean native speakers, and adds to previous analyses by investigating the dimension of Japanese lexical pitch accent. A perception study was conducted using 32 bi-syllabic test tokens containing either Japanese single /s/ or geminate /ss/, and 32 bi-syllabic filler tokens, spoken in a carrier phrase by a 20-year-old native speaker of Tokyo Japanese. All tokens were presented twice in random order so that consistency of judgments could be analyzed. Twenty-two Korean native speakers participated in the experiment.
Participants listened to stimuli over headphones and selected one of two orthographic representations of the word that they heard, which for test stimuli differed only in whether they contained the character representing lax /s/ (ㅅ) or tense /s*/ (ㅆ). A linear mixed-effects model was used to evaluate thirteen acoustic and six phonological factors as predictors of response choice. The results show that the duration of the Japanese /s/ appears to be the most highly significant factor. However, although the distributions of the durations of Japanese /s/ and /ss/ are clearly distinct, the Japanese phonological category was a less effective predictor than acoustic measurements of duration. Additionally, many stimuli were found to have fairly high percentages of inconsistent responses. The results can be taken to support a view that Korean native speaker mapping of Japanese fricatives is based on gradient perceptual similarity to native categories, which may be determined by multiple cues.

Korean perception of the Japanese voiceless alveolar fricative /s/

Talks by Heather Simpson

Intonation Units chunk spoken language: Evidence from association strength in recall of spoken English

Short-term memory capacity is mediated by the process of 'chunking', where lower-level units of information that are strongly associated seem to take up less memory 'space' than those which are more weakly associated. We can expect increased association strength in memory between words that are part of the same 'chunk' of information in the focus of attention. In a recall task, this increased association strength could result in an increased likelihood that if word A is remembered (or forgotten), word B will also be remembered (or forgotten). I discuss the results of a verbatim recall experiment on data from the Santa Barbara Corpus of Spoken American English, testing the effect of Intonation Unit and clause boundaries on the likelihood that a pair of consecutive words (i.e. a bigram) share the same recall status.
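
The bigram measure described in this abstract can be illustrated with a minimal sketch. It assumes each word carries a recalled/forgotten flag and that each gap between consecutive words is marked as a unit boundary or not; the function name `same_status_rates`, the input encoding, and the toy data are all hypothetical and are not taken from the talk or its analysis, which may use a different statistical model.

```python
from typing import List, Tuple


def same_status_rates(recalled: List[bool],
                      boundary_between: List[bool]) -> Tuple[float, float]:
    """Return (within-unit rate, across-boundary rate) of bigrams whose
    two words share the same recall status.

    recalled[i]         -- True if word i was recalled (hypothetical encoding)
    boundary_between[i] -- True if a unit boundary separates word i and word i+1
    """
    if len(boundary_between) != len(recalled) - 1:
        raise ValueError("need exactly one boundary flag per bigram")

    within_same = within_total = 0
    across_same = across_total = 0
    # Walk over consecutive word pairs (bigrams) and their boundary flags.
    for first, second, is_boundary in zip(recalled, recalled[1:], boundary_between):
        same = first == second
        if is_boundary:
            across_total += 1
            across_same += same
        else:
            within_total += 1
            within_same += same

    def rate(same_count: int, total: int) -> float:
        return same_count / total if total else float("nan")

    return rate(within_same, within_total), rate(across_same, across_total)


# Toy data (invented for illustration): six words, boundaries after words 3 and 5.
toy_recalled = [True, True, True, False, False, True]
toy_boundaries = [False, False, True, False, True]
within_rate, across_rate = same_status_rates(toy_recalled, toy_boundaries)
```

A higher within-unit rate than across-boundary rate would be the pattern expected if the unit in question acts as a memory chunk; the same function could be run once with Intonation Unit boundaries and once with clause boundaries to compare the two.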
