The Development of CBM Vocabulary Measures: Grade 3. Technical Report #1210
2012, Behavioral Research and Teaching
In this technical report, we describe the development and piloting of a series of vocabulary assessments intended for use with students in grades two through eight. These measures, available as part of easyCBM™, an online progress monitoring and benchmark/screening assessment system, were developed in 2010 and administered to approximately 1,200 students per grade from schools across the United States in the spring of 2011, using a common item design so that all items within a grade level could be estimated on the same scale. We analyzed the piloting results using a one-parameter logistic (1PL) Rasch analysis. Because the results of these analyses are quite lengthy, we present each grade's analysis in its own technical report; the reports share a common abstract and introduction but have unique methods, results, and discussion sections.
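For orientation, the one-parameter logistic (Rasch) model referenced above expresses the probability that student $j$ answers item $i$ correctly as a function of a single student ability parameter $\theta_j$ and a single item difficulty parameter $b_i$. The form below is the standard statement of the model, offered as general background rather than as a detail of this report's particular estimation procedure:

$$P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}$$

Under this model, a common item design works by placing a subset of shared items on multiple pilot forms; because those shared items receive a single difficulty estimate across forms, the remaining items on each form can, in general, be expressed on the same scale.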
The Development of CBM Vocabulary Measures: Grade 3 (Technical Report 1210)

CBM assessments are a key component of many school improvement efforts, including the Response to Intervention (RTI) approach to meeting students' academic needs. In an RTI approach, teachers first administer a screening or benchmarking assessment to identify students who need supplemental interventions to meet grade-level expectations, then use a series of progress monitoring measures to evaluate the effectiveness of the interventions they provide. When students fail to show expected levels of progress (as indicated by 'flat line' scores, or little improvement on repeated measures over time), teachers use this information to make instructional modifications, with the goal of finding an intervention or combination of instructional approaches that will enable each student to make adequate progress toward achieving grade-level proficiency and mastering content standards (McMaster, Fuchs, D., Fuchs, L. S., & Compton). In such a system, it is critical to have reliable measures that assess the target construct and are sensitive enough to detect improvement in skill over short periods of time.

Because the concept is relevant to our item writing efforts, we first provide a brief synthesis of the literature on 'universal design for assessment' before describing the methods used in item creation, piloting, and evaluation.

Universal Design for Assessment

Universal Design for Assessment (UDA) is an approach to creating assessments in which test developers try to make their measures accessible to the widest possible population of students by incorporating design features that reduce barriers to students' successful interaction with the test items. In creating our vocabulary items, we referred to both the National Center on Educational Outcomes' A State Guide to the Development of Universally Designed Assessment (Johnstone, Altman, & Thurlow, 2006) and the Test Accessibility and Modification Inventory by Beddow, Kettler, and Elliott (2008). Assessments that are universally designed encourage testing conditions that are accessible and fair both to students with special needs and to those in the general education population.

Universally designed assessments should: (a) measure the intended constructs while eliminating irrelevant ones, (b) recognize the diversity of the test-taker population, (c) be both concise and clear in their language, (d) have clear formatting and visual information, and (e) allow changes to formatting without compromising the meaning or difficulty of the assessment. Universally designed assessments aim to support valid interpretations of all test-takers' abilities and skills, including those of students with disabilities (Johnstone, Altman, & Thurlow, 2006).

The principles of universal design for assessment guided our item creation efforts. In addition, we sought to reduce the cognitive complexity of our items by reducing the language and working memory load required to answer the test questions, and by consciously attempting to reduce the chance that extraneous information in the question stem or answer choices would confuse students. Our goal was to create vocabulary items appropriate for use with students with a wide range of ability in the targeted construct as well as with English language learners.

Germane to our work here, it is important to emphasize that in an RTI model students are expected to be assessed on grade-level content standards, but the achievement standards set for students receiving intensive intervention assistance may not be as high as those set for students from the general education population. Thus, in developing our vocabulary item bank, we sought to create items that would appropriately target the grade-level content standards yet would do so in a way that rendered them accessible to a wider range of student ability than might be typically expected of assessment items. Our focus on reducing the cognitive and linguistic complexity of the items, and on designing the computer interface and item features to reduce the impact of construct-irrelevant barriers to student understanding, was intended to provide a bank of items from which we could draw vocabulary problems representing a wide range of difficulty yet all aligned to grade-level content standards.

Methods

In this technical report, we explain the development of vocabulary CBMs designed for use with students in grades 2-8. This development included three key steps: (a) creating an item bank, (b) piloting all items in the item bank to determine their difficulty, reliability, and appropriateness for use with the intended grade level, and (c) organizing the items into a series of benchmark and progress monitoring assessments. We begin by describing the process of item