Veronica Benigno - Academia.edu

Papers by Veronica Benigno

In this study, "fundamental collocations" (collocations fondamentales) are understood as meaningful multiword units (bound by collocational links), whether frequent (in usage) or infrequent (when they are relevant for communication), which for native speakers represent the most essential contexts of a given word. We extracted about 20,000 associations from the frWaC corpus (Baroni et al. 2010), starting from ten pivot words taken from Gougenheim's Dictionnaire Fondamental (1971). Then, using frequency and association measures, we selected a sample of about 400 associations that were candidates for the status of fundamental collocation and asked 90 native speakers of French to select those that seemed to them essential for communication. The aim was: a) to develop a method for the automatic identification of fundamental collocations; and b) to evaluate, with the help of native speakers' intuition, the validi...
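
The abstract above mentions ranking candidate associations for a pivot word by frequency and association measures, without naming the measures used. As a minimal sketch only, the following scores the neighbours of a pivot word with pointwise mutual information (PMI), one common association measure. The function name `pmi_scores`, the one-word window, and the `min_count` threshold are illustrative assumptions, not the authors' procedure.

```python
import math
from collections import Counter


def pmi_scores(tokens, pivot, window=1, min_count=2):
    """Score words co-occurring with `pivot` by pointwise mutual information.

    PMI is used here as a stand-in association measure; the window size and
    the minimum co-occurrence count are hypothetical settings.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != pivot:
            continue
        # Count every token within `window` positions of this pivot occurrence.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                pair_freq[tokens[j]] += 1
    scores = {}
    for word, joint in pair_freq.items():
        if joint < min_count:
            continue  # frequency filter: drop rare co-occurrences
        # PMI = log2( P(pivot, word) / (P(pivot) * P(word)) )
        scores[word] = math.log2((joint * n) / (word_freq[pivot] * word_freq[word]))
    return scores


sample = "prendre rendez-vous avec prendre rendez-vous demain prendre le train".split()
scores = pmi_scores(sample, "prendre")
```

On a real corpus the candidates would normally be lemmatised and filtered by part of speech before scoring; that step is omitted here.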

The object of study of this research is the notion of "fundamental collocation". We model this term on "fundamental vocabulary" (vocabulaire fondamental), which derives from the name Français fondamental used by Gougenheim et al. (1964) to designate the basic vocabulary of the French language. In the thesis, fundamental collocations are understood as meaningful multiword units (bound by collocational links), whether frequent (in usage) or infrequent (when they are relevant for communication), which for native speakers represent the most essential and most typical co-occurrence contexts of a given pivot word. Descriptive and acquisitional studies of collocations are fairly recent and, to our knowledge, have not yet addressed this notion, which should nonetheless constitute a crucial preliminary step in any teaching approach aimed at foreign language learners...

Following the pioneering work of Gougenheim and his team in the 1950s, pedagogical frequency lists have received much attention in France and elsewhere. However, research has mainly focused on single lexical items, whereas the role played by high-frequency phraseological units, i.e. units functioning as independent lexico-grammatical chunks, has been neglected. In this paper we describe the relationship between frequency and native speakers’ judgements in order to determine the basic character of phraseological units; additionally, we show that individual judgements seem to be affected by the degree of fixedness between the components of such units. In the last section we discuss some pedagogical implications derived from a corpus-based study in the domain of ‘social events’.

Revue française de linguistique appliquée, 2015

International Review of Applied Linguistics in Language Teaching, 2016

This paper discusses the results of an experiment designed to analyze lexical richness, operationalized in terms of lexical frequency, in the written production of a group of L2 learners and native speakers of Italian. The data are derived from the CALC study (Communicative Adequacy and Linguistic Complexity in L2 writing; cf. Kuiken / Vedder / Gilabert 2010; Vedder 2012), set up to investigate the relationship between communicative adequacy and linguistic complexity in L2 and L1 writing (Italian, Dutch, Spanish), in relation to the Common European Framework of Reference (CEFR; Council of Europe, 2001). The present study focuses on the data of the L2 learners and native speakers of Italian. The first research question discussed in the study concerns the relationship between lexical richness and the general proficiency level of L2 learners. The second question addresses lexical richness in the written texts of Dutch low-intermediate and intermediate learners of Italian, compared to...

This document reports on an ongoing project to develop the Global Scale of English (GSE) Vocabulary for Young Learners, a graded lexical framework for EFL learners aged 6 to 11. The framework is aligned to the GSE and the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001). This project follows up on a previous study carried out to identify vocabulary targets of adult learners of English (Benigno & De Jong, 2017, available at https://prodengcom.s3.amazonaws.com/GSE-Vocab.pdf) and, together with the GSE Learning Objectives for Young Learners, aims to help teachers identify pupils’ learning targets at increasing levels of proficiency.

Frequency lists (basic lists, elementary lists, etc.) have enjoyed some pedagogical success, notably in France following the pioneering work of Gougenheim. However, research had remained centred on the single word as the unit, without taking into account the most frequent lexical associations in the language, associations that function as genuine lexico-grammatical units. Our contribution aims to clarify the relationship between frequency and judgement in identifying the “fundamental” character of multiword units, and puts forward the hypothesis that the fixedness of these units plays a role in the choices made by native speakers. We then propose some pedagogical directions based on a corpus study carried out in the domain of “social events”.

This paper presents the concepts of “core vocabulary” and “core collocations” and discusses implications for the treatment of collocations in monolingual learner phraseological dictionaries. In the first section, we give an account of what the above concepts refer to by drawing on previous research. In the second part, we present the findings from a study (Benigno et al., 2015; Benigno et al., forthcoming) using L1 speaker judgements to validate a method to automatically extract core collocations from frWaC (Baroni et al., 2010), a very large web corpus. The study aims to identify what features can be used to define and filter “core collocations” from a set of potential candidates. These candidates were retrieved from the corpus by means of frequency, dispersion, and associative measures and then evaluated by a group of native speakers, who were asked to judge the importance of each collocation for communicating in everyday situations. Findings from the study showed that frequency is an appropriate but not sufficient measure to identify such central and nuclear units in language. In fact, native speakers seem to attach importance (understood as usefulness in language use) to highly restricted and fixed units regardless of their frequency of occurrence, which provides evidence that what is core is not systematically a matter of frequency. Based on these findings, the third part of the paper deals with phraseology from the lexicographical perspective and argues that in learner dictionaries both frequency and usefulness should serve as main organizing principles. Our discussion will be accompanied by practical examples extracted from the “Longman Collocations Dictionary and Thesaurus”, a learner dictionary informed by corpus data as well as by pedagogical judgements of expert lexicographers.

Words are never used in isolation but in combination and not with any word but only with certain specific words. To use a language properly the appropriate combinations must be used. In Italian a piece of bread is a tozzo di pane, but is that the case for meat? Is a tozzo di carne an appropriate combination? If you want to make an appointment with somebody you should not say (as in English) fare un appuntamento but fissare un appuntamento. An Italian affronta una discussione (enters or tackles a discussion), but is it possible for him to say affrontare un’obiezione (to enter or tackle an objection)? Yes it is, as this dictionary shows. So every language has its own preferences in word combinations, misleading a non-native learner into making mistakes influenced by his own language.

This dictionary reconstructs the frame to which 3,000 Italian entries belong and aims to help non-Italian speakers with an advanced linguistic competence to find the appropriate word combinations for communicating in Italian. Moreover, this dictionary can also be useful for native speakers who want to improve their lexical choices in writing and speaking Italian. The dictionary, contrary to ordinary monolingual and bilingual dictionaries, systematically lists word combinations (almost 90,000), explaining and/or exemplifying them.

Also available: Dizionario Combinatorio Italiano, a more extensive, 2-volume hardcover edition with 6,500 entries listing over 200,000 word combinations.

Use of vocabulary by language learners has been extensively investigated within the broader area of lexical acquisition, second language teaching, and language assessment. In recent decades there has been a shift from a view of vocabulary knowledge as the ability to produce single words to a view of vocabulary knowledge as the ability to use words appropriately in combination with other words and according to their pragmatic value. The awareness of the crucial role played by formulaic language in contributing to the fluency of language learners (Nattinger & DeCarrico, 1992; Ellis, 2002; Wray, 2002), supported by findings in psycholinguistics and neurolinguistics (Jackendoff, 2002), has led to the proliferation of studies investigating both the theoretical nature of phraseological units and the impact of their use on language performance. The fortunate simultaneous development of corpus linguistics has created very fertile ground for studies on different aspects of phraseology. Due to their idiosyncratic nature, collocations in particular have captured the attention of many scholars. Many have pointed out that vocabulary knowledge is not only related to the size of vocabulary known by speakers but also to its depth. A range of measures are commonly employed to measure lexical richness, an umbrella term including lexical variation, sophistication, density and accuracy (Read, 2000). For example, lexical diversity, i.e. the number of different types used in a text, is traditionally measured by the type-token ratio (TTR), and more recently, by more sophisticated indexes (e.g. the HD-D measure, McCarthy and Jarvis, 2007) aimed at minimizing the effect of text length on the score.
However, the drawback of these measures is their strong quantitative component, while the qualitative rating of test takers’ use of vocabulary is generally left to the judgement of expert raters (deciding, for example, whether test takers choose the right lexical item or whether their lexical choice is appropriate to the context). This paper investigates the use of academic collocations in 50,000 test takers’ essays in order to establish whether there is a relationship between their use of collocations and their proficiency. Our aim is to identify both quantitative and qualitative features discriminating between proficiency levels (B1 to C2), using the ACL (Academic Collocations List) developed by Ackermann and Chen (2013) as a reference list. This list includes 2,500 collocations extracted using PICAE, the Pearson International Corpus of Academic English. The list was compiled using a mixed approach which combines statistical analysis with experts’ judgement to identify a set of collocations for pedagogical use. In conclusion, our study represents an attempt to bring a more qualitative insight into assessment of vocabulary. We argue vocabulary is a complex sub-construct involving multiple dimensions, which therefore needs an integrated assessment model using multiple measures, especially qualitative measures of vocabulary use. This investigation of the use of academic collocations in a test of academic English seeks to contribute to these measures. Keywords: academic English, assessment, collocation, corpus, essay, level, lexis, proficiency, test, vocabulary.
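
The type-token ratio mentioned in the abstract above, and its sensitivity to text length, can be illustrated with a short sketch. Whitespace tokenisation and lowercasing are simplifying assumptions, and the function name `type_token_ratio` is illustrative. The HD-D index is not implemented here.

```python
def type_token_ratio(tokens):
    """Return the number of distinct word types divided by the number of tokens."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty text")
    normalised = [t.lower() for t in tokens]
    return len(set(normalised)) / len(normalised)


short_text = "the cat sat on the mat".split()
# Repeating the text doubles the tokens but adds no new types,
# so the ratio drops even though the vocabulary is unchanged.
long_text = short_text * 2

ttr_short = type_token_ratio(short_text)
ttr_long = type_token_ratio(long_text)
```

The drop from `ttr_short` to `ttr_long` is the length effect that length-corrected indexes are designed to reduce.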

IRAL - International Review of Applied Linguistics in Language Teaching

In this article we report on an experiment set up to investigate lexical richness and collocational competence in the written production of 39 low-intermediate and intermediate learners of Italian L2. Lexical richness was assessed by means of a lexical profiling method inspired by Laufer and Nation (1995) and developed by Bardel, Lindqvist and Gudmundson (Bardel and Lindqvist 2011; Bardel et al. 2012; Lindqvist et al. 2011; Lindqvist et al. 2013). The lexical profiler was used to compare the lexical richness of the L2 texts with that of 18 native speakers of Italian. The study focuses on the relationship between lexical richness, operationalized as lexical frequency, and the overall proficiency level in Italian of the L2 learners, measured by a C-test. In order to get a deeper insight into the development of lexical skills in L2, next to the lexical profiling method, an additional analysis of the use of collocations in L2 and L1 was carried out. The results show that although a relationship in L2 between lexical richness, collocational competence and general language proficiency could not be demonstrated, there appeared to be a number of traits which differentiate L2 and L1 writers.

Title: A meaning-based, CEFR-linked framework for assessing vocabulary knowledge

Keywords: vocabulary selection, communicative efficiency, topical knowledge, real-life requirements, language functions

An overview of studies on vocabulary acquisition, teaching and assessment (Bogaards and Laufer, 2004; Meara, 2009; Nation, 2001; Read, 2000; Schmitt, 2000) shows that this field of investigation has evolved quite rapidly over the last few decades, yet there is still little agreement on which and how many words are needed to communicate efficiently at increasing proficiency levels. Two crucial questions with regard to assessment of vocabulary remain partially unanswered: how to assess vocabulary holistically and how to relate vocabulary knowledge to proficiency.
Research has shown that vocabulary knowledge is related to both the size of vocabulary and its depth. Assessment of vocabulary mostly relies on quantitative measures (such as frequency and diversity) whereas scarce attention has been paid to functional aspects such as communicative usefulness. Moreover, some concerns have been raised about using lemmas or word-families as the unit of counting in measuring individuals’ vocabulary size, as well as about focusing on single words, with no attention paid to lexico-grammar (Purpura, 1997; Wray, 2002) and pragmatics. Qualitative aspects of vocabulary knowledge are generally assessed using human rater judgements but no framework seems to have been put into place for a more systematic investigation of features such as vocabulary range or appropriateness to the context. Attempts to relate vocabulary knowledge to proficiency levels have so far been inconclusive and have mostly relied on learner data, therefore showing what learners do know instead of what learners should know to communicate efficiently in an L1 context.
In this paper we report on a corpus-based study to develop a CEFR-aligned graded lexical inventory of words and phrases in general English, with the purpose of providing the lexical exponents for English and complementing the functional guidance found in the CEFR descriptors. We followed four methodological steps: corpus analysis, semantic annotation, human ratings, and vocabulary scaling. The first step required corpus analysis and computational processing of the data. We compiled an L1 corpus of British and American English of about 2.5 billion words to inventory real-life usage of English in written and spoken text. In a second phase, words and phrases extracted from the corpus using measures such as frequency and dispersion were semantically annotated using the Council of Europe Vantage Specifications’ distinction in topics, general notions, and functions. This step identified the vocabulary needed to express concrete and abstract concepts and to fulfil specific communicative purposes, e.g., to apologize, to make a request. A third step was to ask 15 teachers to rate vocabulary on a pre-defined 5-point scale built on the principle of communicative usefulness. In the final step, frequency data and teacher judgements were combined in a weighted algorithm to grade vocabulary on the CEFR, taking into account research evidence on vocabulary size requirements (e.g., Hazenberg & Hulstijn, 1996).
Our study proposes an integrated model of vocabulary assessment which combines quantitative observation of L1 usage of vocabulary in English with qualitative information about usefulness of vocabulary to talk about a range of different topics and carry out different communicative purposes.

Although research on vocabulary acquisition has evolved substantially over the last decades, there is still little agreement on which and how many words are needed to communicate efficiently at increasing proficiency levels. Until the advent of more communication-oriented approaches to language teaching, selection of vocabulary in learner books was partially arbitrary and learning activities lacked a focus on real-life tasks and usage-based target language. Nowadays, general vocabulary word lists for pedagogical use are available that make use of frequency as their primary selection criterion. Our study combines statistical analysis and pedagogical considerations to inform the creation of a graded vocabulary resource for English learning and teaching. A collection of written and spoken L1 data was compiled by combining three corpora: LCN (330 million words), UKWAC (approximately 2 billion words), and COCA (90 million words). The top 10,000 lemmas were semantically annotated, i.e. categorized by topics and subtopics. A total of 15 teachers were then asked to rate words’ communicative usefulness on a scale from 1 to 5. Frequency data and pedagogical ratings provided by teachers were combined in a weighted model to produce a graded vocabulary. This study offers a new validity framework for grading vocabulary since it complements the guidance of the CEFR’s functional approach by providing users with a rationale for selecting the lexical exponents needed to perform the communicative acts described by the Can-do statements.
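
The abstract above states that frequency data and teacher ratings on a 1 to 5 scale were combined in a weighted model, but it does not give the weights or the scaling. The sketch below shows one way such a combination could look. The equal weighting, the log-frequency normalisation, its cap, and the name `usefulness_score` are all hypothetical, and no mapping to CEFR levels is attempted.

```python
import math


def usefulness_score(freq_per_million, teacher_ratings,
                     weight_freq=0.5, max_log_freq=4.0):
    """Combine corpus frequency and mean teacher rating into one 0..1 score.

    All parameters are illustrative assumptions, not the published model:
    - frequency is log-scaled and capped at 10**max_log_freq per million;
    - ratings are on the 1..5 scale described in the abstract;
    - weight_freq sets the share of the score that comes from frequency.
    """
    if not teacher_ratings:
        raise ValueError("at least one rating is required")
    freq_part = min(math.log10(freq_per_million + 1) / max_log_freq, 1.0)
    mean_rating = sum(teacher_ratings) / len(teacher_ratings)
    rating_part = (mean_rating - 1) / 4  # rescale 1..5 to 0..1
    return weight_freq * freq_part + (1 - weight_freq) * rating_part


# A frequent word rated highly useful, and a rare word rated less useful.
common_word = usefulness_score(999, [5, 5, 5])
rare_word = usefulness_score(9, [2, 3, 2])
```

Sorting lemmas by such a score gives a ranking; turning it into graded levels would need cut-offs that the abstract does not specify.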

Available descriptors in the CEFR are limited in number and unevenly distributed over the levels. In addition, the width of the CEFR levels is impractical in many educational contexts. This paper presents a longitudinal research project to complement the CEFR. In a first experiment 89 new descriptors were pooled with 19 original CEFR descriptors with known logit values from North (2000) as anchors. In an online survey the descriptors were rated on the CEFR levels by 316 teachers from 91 countries claiming to have detailed knowledge of the CEFR. A second rating was obtained from 89 professional courseware developers and editors from 50 countries who provided ratings on a numerical scale ranging from 10 to 90. Within each group, raters with significant deviance from all other raters and descriptors with large errors were removed. Teacher ratings were located within the CEFR levels based on the probability of their distance from any two adjacent level cut-offs. The ratings obtained from the teachers and those obtained from courseware developers and editors correlated at 0.961, indicating that the two sets had 92% of common variance. By removing misfitting descriptors this correlation increased to 0.981. In addition, the anchors correlated at 0.93 with their original IRT-based estimates, thereby corroborating their validity outside of the context in which they were first calibrated. This study represents an original contribution and a novel approach to the research on CEFR linking procedures and presents the opportunity to create more granular measurement of language proficiency than offered in the original CEFR.
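
The step from a correlation of 0.961 to "92% of common variance" in the abstract above follows from squaring the correlation coefficient. A minimal check, with `shared_variance` as an illustrative helper name:

```python
def shared_variance(r):
    """Return the proportion of variance two variables share, given Pearson's r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


before_cleaning = shared_variance(0.961)  # about 0.92, as reported in the abstract
after_cleaning = shared_variance(0.981)   # about 0.96 once misfitting descriptors are removed
```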

ABSTRACT: Models for learning, teaching and assessing general language proficiency require a framework for development to answer the question “What is the most efficient route from low to high proficiency?” Developing general language proficiency will require the acquisition of linguistic resources. Chapter 6 of the Common European Framework of Reference for Languages discusses the acquisition of linguistic resources, but offers little more than food for thought in the form of Socratic questioning. With respect to vocabulary development the CEFR mentions options for lexical selection, concluding with its famous “Users of the Framework may wish to consider and where appropriate state: according to which principle(s) lexical selection has been made” (p. 151). This paper presents a framework to help coursebook and test developers to identify what vocabulary (single words, multiword lexical units, and phrases) learners need to produce to enable them to participate in general conversation in English at growing levels of proficiency. The inventory includes about 25,000 words and is searchable by a number of criteria through an online interface. The methodology is based on combining frequency data and usefulness judgements. The frequency data were based on three L1 corpora of written and spoken British and American English (more than 2.5 billion words). Lexical items, annotated by meaning, were presented to a pool of teachers who evaluated their usefulness. Quantitative and qualitative data were modelled to produce a graded lexical inventory aligned to the CEFR with the aim of providing a reference guideline to inform vocabulary selection for learning, teaching, and assessment purposes.

The CEFR (Council of Europe, 2001) has been widely used in Europe and beyond for the purpose of standard-setting in high-stakes examinations and curriculum development. This paper reports on a large-scale project to extend the framework in the educational domain, with a focus on academic study at the tertiary/post-secondary level context. Using the same rigorous procedures applied by North (2000) to develop the original framework, we created 337 descriptors describing what users of Academic English can do with the language at increasing levels of proficiency. First, the original CEFR Can Do statements were analyzed and a number of limitations identified: the original descriptors are limited in number, unevenly distributed over the levels, and strongly biased towards the speaking skill. In the next stage, new learning objectives were identified with reference to learning, teaching and assessment materials of academic English based on educational resources and guidelines. In the final stage, the descriptors were benchmarked to the CEFR levels by a group of over 6,000 teachers worldwide and to the GSE scale by ELT experts worldwide in a rating exercise. The ratings were then scaled through IRT analysis. Linking to the CEFR was accomplished through inclusion of anchor descriptors from North (2000).
In creating domain-specific descriptors, we address the particular language needs that arise in the higher educational domain, helping to accurately define the construct of academic English and offering an insight into how the CEFR can be extended to a context other than the one it was originally developed for.