Henrik Gyllstad | Lund University
Papers by Henrik Gyllstad
Gyllstad, H., McLean, S., & Stewart, J. (2020). Using confidence intervals to determine adequate item sample sizes for vocabulary tests: An essential but overlooked practice. Language Testing, 0265532220979562, 2020
The last three decades have seen an increase of tests aimed at measuring an individual's vocabulary level or size. The target words used in these tests are typically sampled from word frequency lists, which are in turn based on language corpora. Conventionally, test developers sample items from frequency bands of 1000 words; different tests employ different sampling ratios. Some have as few as 5 or 10 items representing the underlying population of words, whereas other tests feature a larger number of items, such as 24, 30, or 40. However, very rarely are the sampling size choices supported by clear empirical evidence. Here, using a bootstrapping approach, we illustrate the effect that a sample-size increase has on confidence intervals of individual learner vocabulary knowledge estimates, and on the inferences that can safely be made from test scores. We draw on a unique dataset consisting of adult L1 Japanese test takers' performance on two English vocabulary test formats, each featuring 1000 words. Our analysis shows that there are few purposes and settings where as few as 5 to 10 sampled items from a 1000-word frequency band (1K) are sufficient. The use of 30 or more items per 1000-word frequency band and tests consisting of fewer bands is recommended.
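As a rough illustration of the idea in this abstract, the sketch below resamples items from one learner's responses to a full 1000-word band and reports how the percentile interval of the estimated proportion known narrows as the number of sampled items grows. It is not the authors' procedure: the function name `resampled_interval`, the simulated learner who knows 600 of 1000 words, and the settings (2000 resamples, 95% interval) are all assumptions made for the example.

```python
import random


def resampled_interval(responses, n_items, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for the proportion of known words when only
    n_items are drawn (with replacement) from the full set of responses.

    responses: list of 0/1 values, one per word in the frequency band
               (hypothetical data; 1 = word known, 0 = not known).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(responses, k=n_items)  # resample with replacement
        estimates.append(sum(sample) / n_items)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated learner (an assumption, not data from the study):
# knows 600 of the 1000 words in the band.
learner = [1] * 600 + [0] * 400

for n in (5, 10, 30, 100):
    low, high = resampled_interval(learner, n)
    print(f"{n:>3} items: interval {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Under these assumptions the interval for 5 items spans most of the 0 to 1 range, which matches the abstract's point that such small samples support few inferences about an individual learner.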
Studies in Second Language Acquisition, 2016
Formulaic language represents a challenge to even the most proficient of language learners. Evidence is mixed as to whether native and non-native speakers process it in a fundamentally different way, whether exposure can lead to more nativelike processing for non-natives, and how L1 knowledge is used to aid comprehension. In this study we investigate how advanced non-native speakers process idioms encountered in their L2. We use eye-tracking to see whether a highly proficient group of L1 Swedes show any evidence of formulaic processing for English idioms. We also compare translations of Swedish idioms and congruent idioms (items that exist in both languages) to see how L1 knowledge is utilised during online processing. Results support the view that L1 knowledge is automatically used from the earliest stages of processing, regardless of whether sequences are congruent, and that exposure and advanced proficiency can lead to nativelike formulaic processing in the L2.
The present study investigates whether two types of word combinations (free combinations, collocations) differ in terms of processing by testing Howarth’s Continuum Model (1996, 1998) based on word combination typologies from a phraseological tradition. A visual semantic judgement task was administered to advanced Swedish learners of English (n = 27) and native English-speaking controls (n = 38). Reaction times and error rates were recorded for free combinations, collocations, and baseline items. There was a processing cost for collocations compared to free combinations, for both groups of participants. This cost likely stems from the semantically semi-transparent nature of collocations as they are defined in the phraseological tradition. Furthermore, phrasal frequency based on corpus values also predicted reaction times. These results lend initial support to the Continuum Model from a processing perspective and suggest that degree of semantic transparency together with phrasal frequency plays an important role in collocational processing.
ITL International Journal of Applied Linguistics
In most tests of vocabulary size, knowledge is assessed through multiple-choice formats. Despite advantages such as ease of scoring, multiple-choice tests (MCT) are accompanied by problems.
The purpose of the study was to investigate the reliability and validity of the monolingual version of the VST using a Classical Test Theory approach, based on data from a sizeable group of high intermediate and advanced Swedish learners of English. An additional purpose was to also use the test in a longitudinal design in order to investigate learners’ vocabulary size development over time.
A total of 198 participants took part in the study. They were all university-level, full-time students of English at a university in southern Sweden. The great majority (n = 151) had just started their first term of study; students in their second term of study (n = 22) also sat the test, as did students in their third term of study (n = 25). The test was a paper-and-pencil version of the 140-item monolingual (English) Vocabulary Size Test. The test version used can be found in Nation & Beglar (2007) and Schmitt (2010).
The VST seems capable of yielding reliable scores, but some items are in need of revision; a difficulty continuum based on target word frequency is visible but with some anomalies; VST scores correlate highly with VLT scores (another widely used vocabulary size test); Classical Test Theory approaches to validation suffer from sample dependence; over a period of 4.5 months, Swedish university students of English increased their mean vocabulary size by 648 word families; there is a need for further validation studies that investigate whether learners truly know the proportion of words from the target domain to the extent suggested by VST scores.
This article assesses the influence of L1 intralexical knowledge on the formation of L2 intralexical collocations. Two tests, a primed lexical decision task (LDT) and a test of receptive collocational knowledge, were administered to a group of non-native speakers (NNSs) (L1 Swedish), with native speakers (NSs) of English serving as controls on the LDT. The tests assessed collocations in three critical conditions: (i) collocations with translation equivalents in Swedish and English (L1–L2), (ii) collocations that were acceptable in English but not in Swedish (L2-only), and (iii) unrelated items for baseline data. Our results showed that the L1 may have considerable influence on the development of L2 collocational knowledge. NNSs both processed [with faster reaction times (RTs) on the LDT] and recognized (with higher receptive scores) L1–L2 collocations more effectively than L2-only collocations. However, the results of the LDT also showed considerable variability for the L2-only condition, suggesting that the overall slower RTs in this condition might have been linked more to a lack of priming for individual items rather than slower RTs for this condition as a whole.
This study investigated the influence of frequency effects on the processing of congruent (i.e., having an equivalent first language [L1] construction) collocations and incongruent (i.e., not having an equivalent L1 construction) collocations in a second language (L2).
Eurosla Monograph Series, 2013
The heightened interest in L2 vocabulary over the last two or three decades has brought with it a number of suggestions of how vocabulary knowledge should be modelled. From a testing and assessment perspective, this paper takes a closer look at some of these suggestions and attempts to tease out how terms like model, dimension and construct are used to describe different aspects of vocabulary knowledge, and how the terms relate to each other. Next, the two widely assumed dimensions of vocabulary breadth and depth are investigated in terms of their viability for testing purposes. The paper identifies several challenges in this regard, among others the questionable assumption that multi-word units like collocations naturally belong in the depth dimension, and problems that follow from the complex and often ill-defined nature of the depth dimension. Suggestions for remedies are provided.
EUROSLA Yearbook 14, 2014
The present study investigates whether two types of word combinations (free combinations, collocations) differ in terms of processing by testing Howarth’s Continuum Model (1996, 1998) based on word combination typologies from a phraseological tradition. A visual semantic judgement task was administered to advanced Swedish learners of English (n = 27) and native English-speaking controls (n = 38). Reaction times and error rates were recorded for free combinations, collocations, and baseline items. There was a processing cost for collocations compared to free combinations, for both groups of participants. This cost likely stems from the semantically semi-transparent nature of collocations as they are defined in the phraseological tradition. Furthermore, phrasal frequency based on corpus values also predicted reaction times. These results lend initial support to the Continuum Model from a processing perspective and suggest that degree of semantic transparency together with phrasal frequency plays an important role in collocational processing.
The Encyclopedia of Applied Linguistics, Dec 3, 2014
The more widely held notion of grammatical collocation is that of a word combination in which at least one word is lexical (open word class) and at least one word, or structure (see below), is grammatical (closed word class), as opposed to lexical collocation where the co-occurrence of lexical items results from some sort of convention rather than syntactic dependencies.
Books by Henrik Gyllstad
International papers with Impact factor by Henrik Gyllstad
Language Testing, 2020
The last three decades have seen an increase of tests aimed at measuring an individual’s vocabulary level or size. The target words used in these tests are typically sampled from word frequency lists, which are in turn based on language corpora. Conventionally, test developers sample items from frequency bands of 1000 words; different tests employ different sampling ratios. Some have as few as 5 or 10 items representing the underlying population of words, whereas other tests feature a larger number of items, such as 24, 30, or 40. However, very rarely are the sampling size choices supported by clear empirical evidence. Here, using a bootstrapping approach, we illustrate the effect that a sample-size increase has on confidence intervals of individual learner vocabulary knowledge estimates, and on the inferences that can safely be made from test scores. We draw on a unique dataset consisting of adult L1 Japanese test takers’ performance on two English vocabulary test formats, each featuring 1000 words. Our analysis shows that there are few purposes and settings where as few as 5 to 10 sampled items from a 1000-word frequency band (1K) are sufficient. The use of 30 or more items per 1000-word frequency band and tests consisting of fewer bands is recommended.