Brett Hashimoto - Academia.edu
Papers by Brett Hashimoto
Rethinking Vocabulary Size Test Design: Frequency Versus Item Difficulty. Brett James Hashimoto, Department of Linguistics and English Language, BYU, Master of Arts. For decades, vocabulary size tests have been built on the idea that if a test-taker knows enough words at a given frequency level, based on a list derived from a corpus, they will also know other words of approximately that frequency as well as all more frequent words. However, many vocabulary size tests are based on corpora that are as much as 70 years out of date and that may be ill-suited for these tests. Given these potential problems, the following research questions were asked. First, to what degree would a vocabulary size test based on a large, contemporary corpus be reliable and valid? Second, would it be more reliable and valid than previously designed vocabulary size tests? Third, do words within 1,000-word frequency bands vary in their item difficulty? In order to answer these research questions, 4...
Language Assessment Quarterly, 2021
ABSTRACT Modern vocabulary size tests are generally based on the notion that the more frequent a word is in a language, the more likely a learner is to know that word. However, this assumption has seldom been questioned in the literature on vocabulary size tests. Using the Vocabulary of American-English Size Test (VAST), based on the Corpus of Contemporary American English (COCA), 403 English language learners were tested on a 10% systematic random sample of the 5,000 most frequent words in that corpus. The Pearson correlation between Rasch item difficulty (the probability that test-takers will know a word) and frequency was only r = 0.50 (r² = 0.25). This moderate correlation indicates that the frequency of a word can predict which words are known only to a limited degree, and that other factors also affect the order in which vocabulary is acquired. Additionally, using vocabulary levels/bands of 1,000 words as part of the structure of vocabulary size tests is shown to be questionable as well. These findings call into question the construct validity of modern vocabulary size tests. However, future confirmatory research is necessary to comprehensively determine the degree to which word frequency and learners' vocabulary size are related.
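As a rough sketch of the kind of analysis the abstract describes, the snippet below computes a Pearson correlation (and its squared value) between word-frequency rank and item difficulty. The rank and difficulty values are invented for illustration only; they are not the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: frequency rank band vs. Rasch item difficulty (logits).
# Higher rank = rarer word; higher difficulty = fewer test-takers know it.
ranks = [1, 2, 3, 4, 5, 6]
difficulty = [-2.0, -1.1, -1.4, 0.2, -0.3, 1.0]

r = pearson_r(ranks, difficulty)
r_squared = r ** 2  # proportion of variance in difficulty explained by rank
```

An r of 0.50, as reported in the abstract, corresponds to r² = 0.25, i.e. frequency would account for only a quarter of the variance in item difficulty.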
This study tests three measures of lexical diversity (LD), each using five operationalizations of word types. The measures include MTLD (measure of textual lexical diversity), MTLD-W (moving average MTLD with wrap-around measurement), and MATTR (moving average type-token ratio). Each of these measures is tested with types operationalized as orthographic forms, lemmas using automated POS tags, lemmas using manually corrected POS tags, flemmas (list-based lemmas that do not distinguish between parts of speech), and word families. These measures are applied to 60 narrative texts written in English by adolescent native speakers of English (n = 13), Finnish (n = 31), and Swedish (n = 16). Each individual LD measure is evaluated in relation to how well it correlates with the mean LD ratings of 55 human raters whose inter-rater reliability was exceedingly high (Cronbach’s alpha = .980). The overall results show that the three measures are comparable but two of the operationalizations of ty...
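Of the measures named above, MATTR is the simplest to illustrate: slide a fixed-size window across the text, take the type-token ratio (unique words / window size) in each window, and average. A minimal sketch, with an invented example sentence and an arbitrarily small window for readability (published work typically uses windows of 50+ tokens):

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean TTR over all
    sliding windows of a fixed size."""
    if len(tokens) < window:
        # Fall back to plain TTR when the text is shorter than one window.
        return len(set(tokens)) / len(tokens)
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)

sample = "the cat sat on the mat and the dog sat on the rug".split()
score = mattr(sample, window=5)  # types counted as orthographic forms
```

Note that this treats each orthographic form as a distinct type; the study's other operationalizations (lemmas, flemmas, word families) would first map tokens onto their respective groupings before counting.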
Applied Pragmatics
Discourse Completion Tasks (DCTs) have been one of the most popular tools in pragmatics research. Yet, many have criticized DCTs for their lack of authenticity (e.g., Culpeper, Mackey, & Taguchi, 2018; Nguyen, 2019). We propose that corpora can serve as resources in designing and evaluating DCTs. We created a DCT using advice-seeking prompts from the Q+A corpus (Baker & Egbert, 2016). Then, we administered the DCT to 33 participants. We evaluated the DCT by (1) comparing the linguistic form and the semantic content of the participants’ DCT responses (i.e., advice-giving expressions) with authentic data from the corpus; and (2) interviewing the participants about the instrument quality. Chi-square tests between DCT data and corpus data revealed no significant differences in advice-giving expressions in terms of both the overall level of directness (χ2 [2, N = 660] = 6.94, p = .03, V = .10) and linguistic realization (χ2 [8, N = 660] = 17.75, p = .02, V = .16), and showed a significan...
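The comparison described above rests on a Pearson chi-square test over a contingency table, with Cramér's V as the effect size. A minimal sketch, using an invented 2×3 table (DCT vs. corpus responses by directness level; the counts are hypothetical, chosen only so the total matches the abstract's N = 660):

```python
import math

def chi_square(observed):
    """Pearson chi-square statistic and Cramér's V for an r x c table
    (a list of rows of counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    # Cramér's V normalizes chi-square by N and the smaller table dimension.
    v = math.sqrt(chi2 / (n * (min(len(row_totals), len(col_totals)) - 1)))
    return chi2, v

# Hypothetical counts: direct / conventionally indirect / hint.
table = [[120, 150, 60],   # DCT responses
         [110, 160, 60]]   # corpus examples
chi2, v = chi_square(table)
```

A small V (such as the .10 and .16 reported) indicates that even where the test reaches a conventional significance threshold, the practical difference between DCT and corpus distributions is modest.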