The Development of CBM Vocabulary Measures: Grade 4. Technical Report #1211
Related papers
The Development of CBM Vocabulary Measures: Grade 6. Technical Report #1213
Behavioral Research and Teaching, 2012
In this technical report, we describe the development and piloting of a series of vocabulary assessments intended for use with students in grades two through eight. These measures, available as part of easyCBM™, an online progress monitoring and benchmark/screening assessment system, were developed in 2010 and administered to approximately 1,200 students per grade from schools across the United States in the spring of 2011, using a common item design to allow all items to be estimated on the same scale within each grade level. We analyzed the results of the piloting using a one-parameter logistic (1PL) Rasch analysis. Because the results of these analyses are quite lengthy, we present the results for each grade's analysis in its own technical report, all sharing a common abstract and introduction but unique methods, results, and discussion sections.
The Development of CBM Vocabulary Measures: Grade 5. Technical Report #1212
Behavioral Research and Teaching, 2012
In this technical report, we describe the development and piloting of a series of vocabulary assessments intended for use with students in grades two through eight. These measures, available as part of easyCBM™, an online progress monitoring and benchmark/screening assessment system, were developed in 2010 and administered to approximately 1,200 students per grade from schools across the United States in the spring of 2011, using a common item design to allow all items to be estimated on the same scale within each grade level. We analyzed the results of the piloting using a one-parameter logistic (1PL) Rasch analysis. Because the results of these analyses are quite lengthy, we present the results for each grade's analysis in its own technical report, all sharing a common abstract and introduction but unique methods, results, and discussion sections.
The Development of CBM Vocabulary Measures: Grade 3. Technical Report #1210
Behavioral Research and Teaching, 2012
In this technical report, we describe the development and piloting of a series of vocabulary assessments intended for use with students in grades two through eight. These measures, available as part of easyCBM™, an online progress monitoring and benchmark/screening assessment system, were developed in 2010 and administered to approximately 1,200 students per grade from schools across the United States in the spring of 2011, using a common item design to allow all items to be estimated on the same scale within each grade level. We analyzed the results of the piloting using a one-parameter logistic (1PL) Rasch analysis. Because the results of these analyses are quite lengthy, we present the results for each grade's analysis in its own technical report, all sharing a common abstract and introduction but unique methods, results, and discussion sections.

The Development of CBM Vocabulary Measures: Grade 3 (Technical Report 1210)

CBM assessments are a key component of many school improvement efforts, including the Response to Intervention (RTI) approach to meeting students' academic needs. In an RTI approach, teachers first administer a screening or benchmarking assessment to identify students who need supplemental interventions to meet grade-level expectations, then use a series of progress monitoring measures to evaluate the effectiveness of the interventions they provide. When students fail to show expected levels of progress (as indicated by 'flat line' scores or little improvement on repeated measures over time), teachers use this information to make instructional modifications, with the goal of finding an intervention or combination of instructional approaches that will enable each student to make adequate progress toward grade-level proficiency and mastery of content standards (McMaster, Fuchs, Fuchs, & Compton). In such a system, it is critical to have reliable measures that assess the target construct and are sensitive enough to detect improvement in skill over short periods of time. Because the term is relevant to our item writing efforts, we first provide a brief synthesis of the literature on 'universal design for assessment' before describing the actual methods used in item creation, piloting, and evaluation.

Universal Design for Assessment

Universal Design for Assessment (UDA) is an approach to creating assessments in which test developers try to make their measures accessible to the widest possible population of students by incorporating design features that reduce barriers to students interacting successfully with the test items. In creating our vocabulary items, we referred to both the National Center on Educational Outcomes' A State Guide to the Development of Universally Designed Assessment (Johnstone, Altman, & Thurlow, 2006) and the Test Accessibility and Modification Inventory by Beddow, Kettler, and Elliott (2008). Assessments that are universally designed encourage testing conditions that are accessible and fair to students with special needs as well as to those in the general education population.

Universally designed assessments should: (a) measure true constructs while eliminating irrelevant ones, (b) recognize the diversity of the test-taker population, (c) be both concise and clear in their language, (d) have clear format and visual information, and (e) allow changes in formatting without compromising the meaning or difficulty of the assessment. Universally designed assessments aim to support valid interpretation of all test-takers' abilities and skills, including those of students with disabilities (Johnstone, Altman, & Thurlow, 2006).

The principles of universal design for assessment guided our item creation efforts. In addition, we sought to reduce the cognitive complexity of our items by reducing the language and working memory load required to answer the test questions, and by consciously attempting to reduce the chance that extraneous information in the question stem or answer choices would confuse students. Our goal was to create vocabulary items that would be appropriate for students with a wide range of ability in the targeted construct as well as for English language learners.

Germane to our work here, it is important to emphasize that in an RTI model students are expected to be assessed on grade-level content standards, but the achievement standards set for students receiving intensive intervention assistance may not be as high as those set for students from the general education population. Thus, in developing our vocabulary item bank, we sought to create items that would appropriately target the grade-level content standards, yet would do so in a way that would render them accessible to a wider range of student ability than might be typically expected of assessment items. Our focus on reducing the cognitive and linguistic complexity of items, and on designing the computer interface and features of the items themselves to reduce the impact of construct-irrelevant barriers to student understanding, was intended to provide a bank of items from which we could draw vocabulary problems representing a wide range of difficulty yet all aligned to grade-level content standards.

Methods

In this technical report, we explain the development of vocabulary CBMs designed for use with students in grades 2-8. This development included three key steps: (a) creating an item bank, (b) piloting all items in the item bank to determine their difficulty, reliability, and appropriateness for use with the intended grade level, and (c) organizing the items into a series of benchmark and progress monitoring assessments. We begin by describing the process of item
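For readers less familiar with the piloting analysis named above, the following is a minimal sketch of the one-parameter logistic (1PL) Rasch model in its standard form; the symbols θ_p (person ability) and b_i (item difficulty) are our own notation for illustration, not notation taken from the report.

```latex
% 1PL (Rasch) model: probability that person p answers item i correctly.
% Person ability \theta_p and item difficulty b_i sit on the same logit scale,
% which is what allows common (anchor) items to link all pilot forms within a grade.
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
```

Under this model, the items shared across pilot forms act as anchors: because their difficulties are held in common, every item administered within a grade can be expressed on a single difficulty scale, which is the role the common item design plays in the piloting described above.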
Vocabulary Assessment: Making Do With What We Have While We Create the Tools We Need
Vocabulary Instruction: Research to Practice (2nd ed), 2012
In this chapter, we argue that vocabulary assessment is grossly underdeveloped, both in its theoretical and practical aspects. We examine the literature on vocabulary assessment (research, common practices, and theoretical analyses) to answer three questions: (a) What do vocabulary assessments, both past and current, measure? (b) What could vocabulary assessments measure, as illustrated by conceptual frameworks, and what development and validation efforts are needed to make such assessments a reality? (c) What vocabulary assessments should teachers use, create, or modify while we wait for the research that is needed for wide-scale change?
The updated Vocabulary Levels Test
ITL - International Journal of Applied Linguistics, 2017
The Vocabulary Levels Test (Nation, 1983; Schmitt, Schmitt, & Clapham, 2001) indicates the word frequency level that should be used to select words for learning. The present study involves the development and validation of two new forms of the test. The new forms consist of five levels measuring knowledge of vocabulary at the 1000, 2000, 3000, 4000, and 5000 levels. Items for the tests were sourced from Nation’s (2012) BNC/COCA word lists. The research involved first identifying quality items using the data from 1,463 test takers to create two equivalent forms, and then evaluating the forms with the data from a further 250 test takers. This study also makes an initial attempt to validate the new forms using Messick’s (1989, 1995) validity framework.
McLean, S., & Kramer, B. (2015). The Creation of a New Vocabulary Levels Test. Shiken, 19(2), 1-11.
This paper describes a new vocabulary levels test (NVLT) and the process by which it was written, piloted, and edited. The most commonly used Vocabulary Levels Test (VLT) (Nation, 1983, 1990; Schmitt, Schmitt, & Clapham, 2001) is limited by a few important factors: (a) it does not contain a section that tests the first 1,000-word frequency level; (b) it was created from dated frequency lists that are not as representative as newer and larger corpora; and (c) its item format is problematic in that it does not support item independence (Culligan, 2015; Kamimoto, 2014) and requires time for some students to understand the directions. To address these issues, the NVLT was created; it can be used by teachers and researchers alike for both pedagogical and research-related purposes.
Vocabulary Assessment: What We Know and What We Need to Learn
Reading Research Quarterly, 2007
The authors assert that, in order to teach vocabulary more effectively and better understand its relation to comprehension, we need first to address how vocabulary knowledge and growth are assessed. They argue that "vocabulary assessment is grossly undernourished, both in its theoretical and practical aspects—that it has been driven by tradition, convenience, psychometric standards, and a quest for economy of effort rather than a clear conceptualization of its nature and relation to other aspects of reading expertise, most notably comprehension."
Evaluation of an Achievement English Vocabulary Test Using Rasch Analysis
The Rasch model has recently been used in educational measurement as an evaluative tool. Rasch analyses have been shown to map onto the six aspects of Messick's (1989) construct validity and, compared to classical test theory and deterministic analysis measures, to make stronger arguments in providing validity evidence for tests. The Rasch model estimates the probability of a specific response according to person ability and item difficulty parameters, placing both on an interval scale. In the current study, an 83-item multiple-choice English vocabulary achievement test was administered to second-year non-English majors at a Japanese university. The test was developed from a 250-word study list. The results were analysed using a combination of Rasch measures and deterministic statistics, including logistic regression. The analyses highlighted several test items that exhibited unusual response patterning and suggested that the test was not an effective tool for measuring how well the students acquired the 250 words on their study list. Deterministic parametric and Rasch analyses were both effective as evaluative tools, although Rasch produced more precise information that can subsequently be used by test developers or educators to revisit potentially problematic test items, ultimately improving the validity of the test.
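As a rough illustration of what a 1PL Rasch calibration of dichotomous (0/1) item responses involves, the sketch below fits the model by joint maximum likelihood using only numpy and scipy. Everything here is hypothetical: the function name fit_rasch_jmle and the simulated data are invented for the example, and the study summarized above would have used dedicated Rasch software rather than code like this.

```python
# Minimal, illustrative 1PL (Rasch) calibration by joint maximum likelihood.
# Not the analysis pipeline used in the study above; for real work, dedicated
# IRT/Rasch software is preferable (JMLE also gives extreme estimates for
# perfect or zero scores, which the bounds below merely clip).
import numpy as np
from scipy.optimize import minimize

def fit_rasch_jmle(responses: np.ndarray):
    """responses: persons x items matrix of 0/1 scores with no missing data."""
    n_persons, n_items = responses.shape

    def objective(params):
        theta = params[:n_persons]                 # person abilities
        b = params[n_persons:]                     # item difficulties
        logits = theta[:, None] - b[None, :]
        prob = 1.0 / (1.0 + np.exp(-logits))       # P(correct response)
        log_lik = responses * logits - np.logaddexp(0.0, logits)
        resid = responses - prob                   # observed minus expected
        grad = np.concatenate([-resid.sum(axis=1), resid.sum(axis=0)])
        return -log_lik.sum(), grad

    start = np.zeros(n_persons + n_items)
    bounds = [(-6.0, 6.0)] * (n_persons + n_items)
    result = minimize(objective, start, jac=True, method="L-BFGS-B", bounds=bounds)
    theta_hat, b_hat = result.x[:n_persons], result.x[n_persons:]
    shift = b_hat.mean()                           # fix the scale origin at mean item difficulty
    return theta_hat - shift, b_hat - shift

# Example with simulated data: 200 examinees, 20 items of increasing difficulty.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=200)
true_b = np.linspace(-2.0, 2.0, 20)
p_correct = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_b[None, :])))
data = (rng.random(p_correct.shape) < p_correct).astype(float)
ability, difficulty = fit_rasch_jmle(data)
```

Item-level residuals from such a fit (observed minus model-expected scores) are the raw material behind the 'unusual response patterning' mentioned in the abstract: items with large residuals across many examinees are candidates for revision.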