Carla M. Evans | University of New Hampshire

Publications by Carla M. Evans

Practical Assessment, Research, and Evaluation, 2023

Large-scale performance assessment programs are a longstanding reform tool. However, standard setting can be a challenge for assessment programs that rely primarily on non-standardized assessments. The first purpose of this paper is to extend this field of research by explaining the standard setting methodology applied in one recent state performance assessment program. The second purpose is to discuss the data quality control and quality assurance challenges encountered after five years of applying that standard setting method. Given the renewed interest in large-scale performance assessment programs, the intended contribution of this paper is to inform future decisions about selecting appropriate standard setting methods, and about handling unanticipated implementation challenges, based on the lessons learned from one program. Other large-scale performance assessment programs are likely to face similar operational challenges, especially those that do not rely on standardized tests or standardized administration procedures to produce annual determinations of student proficiency or other scores used for accountability purposes. Assessment system designers can use the insights in this paper to weigh standard setting methods and to consider how those methods may need to be adapted to promote technical quality.

Applied Measurement in Education, 2023

Previous writings focus on why centering assessment design around students' cultural, social, and/or linguistic diversity is important and how performance-based assessment can support such aims. This article extends that work by describing how a culturally responsive classroom assessment framework was created from a culturally responsive education (CRE) pedagogical framework. The goal of the framework was to guide the design and evaluation of curriculum-embedded classroom performance assessments. Components discussed include: modification of evidence-centered design processes, teacher and/or student adaptation of construct-irrelevant aspects of task prompts, addition of cultural meaningfulness questions to think-alouds, and revision of task quality review protocols to promote CRE design features. Future research is needed to explore the limitations of the framework and the extent to which students perceive that the resulting classroom summative assessments do allow them to better show all they know and can do in ways related to their cultural, social, and/or linguistic identities.

Practical Assessment, Research, & Evaluation, 2022

There is renewed interest in including performance assessments in state and local assessment systems to spur positive changes in classroom instruction and student learning. Previous research has identified the external conditions that mediate the role of assessment in changing instructional practices. We extend that work by focusing on the internal classroom conditions that support improvements in student learning. We identified six key instructional practices from three teacher quality frameworks that may result from policy changes that include complex, performance-based assessments. For each practice, we explored the bidirectional relationships among the instructional core of students, teachers, and content. We argue that altering these relationships requires teachers and students to have both the disposition and the capacity to change, and we identify the assumptions that need to hold for those changes to occur in response to the inclusion of performance assessments in state and/or local assessment systems.

Journal of Competency-Based Education, 2021

Background
Competency‐based education (CBE) is a systems‐change approach intended to re‐shape traditional understandings of what, when, where, and how students learn and demonstrate academic knowledge and skills. Research on the factors that affect K‐12 CBE implementation and the efficacy of different approaches has not yet been meticulously reviewed.

Aims
The purpose of this literature review was to examine the research on K‐12 CBE for factors that affect implementation, student outcomes, and the relationship between implementation and student outcomes.

Methods
A systematic literature review was conducted that included 25 peer‐reviewed studies and unpublished reports from 2000 to 2019 related to K‐12 students.

Results
Facilitators and barriers that affect K‐12 CBE implementation were fairly consistent across studies. Factors perceived as barriers in some contexts were viewed as facilitators in others, depending on the stage of implementation. Findings about the outcomes of CBE for K‐12 students were mixed with respect to claims that CBE implementation supports (a) academic achievement and progress; (b) intrinsic motivation and engagement; and (c) other important academic outcomes.

Discussion
Undergirding all findings in this review is the difficulty of isolating the research on implementation and outcomes of K‐12 CBE approaches in some “pure form.” It may make more sense for the field to coalesce around a common continuum of practices in relation to the key elements of CBE from more traditional models to more competency‐based models. Also, assessment as a key feature of CB implementation was absent from most of the studies reviewed which is notable given that determining competence is fundamentally an assessment decision. Directions for future research are discussed.

Conclusion
For many, the promise of CBE and related practices is that student achievement will improve and minimize equity gaps. This systematic review serves to amplify what is known about CBE approaches and what still needs investigation.

There is a growing interest in conceptualizing, defining, and assessing what are often called 21st century skills or deeper learning competencies. The purpose of this literature review is to explore the conceptualizations, definitions, and understandings in the research literature related to collaboration. Key initial questions include: What is collaboration? How is collaboration related to other success skill concepts? And to what extent does collaboration develop over time? This foundational information will then be used to examine (a) instructional approaches to promote collaboration, (b) benefits of collaboration for valued student outcomes such as student learning, and (c) ways teachers can collect evidence of students' collaborative outcomes using student artifacts and other appropriate measures.

There is a growing interest in conceptualizing, defining, and assessing what are often called 21st century skills or deeper learning competencies. The purpose of this literature review is to explore the conceptualizations, definitions, and understandings in the research literature related to critical thinking. Key initial questions include: What is critical thinking? How is critical thinking related to other success skill concepts? And to what extent does critical thinking develop over time? This foundational information will then be used to examine (a) instructional approaches to promote critical thinking, (b) benefits of critical thinking for valued student outcomes such as student learning, and (c) ways teachers can collect evidence of students' critical thinking outcomes using student artifacts and other appropriate measures.

Centerline Blog, 2019

https://www.nciea.org/blog/what-do-i-need-know-about-competency-based-grading

Centerline Blog, 2019

https://www.nciea.org/blog/education-policy/what-do-i-need-know-about-competency-based-grading

This paper presents a vision and discusses requirements for balanced systems of assessments that can support competency-based education models with the ultimate goal of advancing educational equity for all students. It is written for state leaders interested in creating better systems of assessments that are aligned with competency-based education and that support equity goals. Our equity analysis is rooted in the National Equity Project’s definition of educational equity and in the Aurora Institute’s Designing for Equity framework. In this paper, we address the following questions: 1) How might a balanced assessment system support competency-based education, and what are the requirements for such an assessment system? 2) As district and state leaders transform educational models to support competency-based learning, what role could assessment play at each of these levels to advance important equity goals? 3) What are the barriers and levers in districts and in states to build and sustain systems of assessments that support competency-based education over the short- and long-term?

NASSP Bulletin, 2019

This exploratory study uses data from 413 principals to examine whether and how competency-based education has been implemented in the Northeast states and the extent to which implementation varies between states with different policies. Results suggest that competency-based practices most similar to current practices are reported more often, and practices that diverge from current practices are reported less often. There were statistically significant differences on three measures between states with "advanced" competency-based education policies and states with no such policies. Secondary principals could use this study to understand key features of the reform and the likely barriers and challenges to implementation regardless of their state policy context.

Teachers College Record, 2019

Background/Context: Educational researchers frequently study the impact of treatments or interventions on educational outcomes. A critical aspect of such investigations involves determining whether treatment effects vary by student subgroups, such as race/ethnicity, sex/gender, SES, and disability status. However, estimates of intervention effects for subgroups of students defined by disability status can be misleading when researchers control for prior achievement or other measures of academic ability. Estimating intervention effects for students with disabilities is further complicated by the fact that disability status is often defined and measured by whether a student has an Individualized Education Program (IEP), masking important variation in abilities related to academic achievement and services received.

Purpose/Objective/Research Question/Focus of Study: This paper describes methodological challenges in estimating effects of educational interventions for students with disabilities, provides an applied example using data from an innovative state-level educational intervention, and concludes with implications for policy and practice.

Research Design: The analyses presented here come from a larger secondary analysis evaluating the impact of an innovative state assessment and accountability program, New Hampshire’s Performance Assessment of Competency Education (PACE) pilot program (2014-2016), on eighth-grade student academic achievement.

Findings/Conclusions/Recommendations: The estimated effects of the PACE pilot program on eighth-grade student achievement for students with and without disabilities differ depending on whether prior academic achievement is included as a control variable. Controlling for prior academic achievement, we found that the PACE program narrowed or even reversed the achievement gap between students with and without disabilities. When prior achievement was not included, the achievement gap was attenuated but not reversed. Further investigation revealed limited overlap in the distributions of prior achievement for students with and without disabilities, impacting estimates of program effects when prior achievement is controlled. Consistent with other studies, this study employed a dichotomous measure of disability (IEP vs. no IEP). However, the dichotomization of students by disability status defined by whether they have an IEP conceals important variability in cognitive skills related to achievement and thus in understanding the impact of educational interventions. We recommend that researchers investigating the impact of large educational interventions place more emphasis on understanding the impact of these programs for students with disabilities. Importantly, our work underscores problematic analytic and interpretive issues that can ensue when students from all disability groups are grouped together.

Education Policy Analysis Archives, 2019

New Hampshire’s Performance Assessment of Competency Education (PACE) pilot received a waiver from federal statutory requirements related to state annual achievement testing starting in the 2014-15 school year. PACE is considered an “innovative” assessment and accountability system because performance assessments, rather than the state achievement test, are used to help determine student proficiency in most federally required grades and subjects. One key criterion for success in the early years of the PACE innovative assessment system is “no harm” on the statewide accountability test. This descriptive study examines the effect of PACE on Grades 8 and 11 mathematics and English language arts student achievement during the first three years of implementation (2014-15, 2015-16, and 2016-17 school years) and the extent to which those effects vary for certain student subgroups, using results from the state’s accountability tests (Smarter Balanced and the SAT). Findings suggest that students in PACE schools tend to exhibit small positive effects on the Grades 8 and 11 state achievement tests in both subjects in comparison to students attending non-PACE comparison schools. Lower achieving students tended to exhibit small positive differential effects, whereas male students tended to exhibit small negative differential effects. Implications for research, policy, and practice are discussed.

The purpose of this study was to test methods to strengthen the comparability claims about annual determinations of student proficiency in English language arts, math, and science (grades 3-12) in the New Hampshire Performance Assessment of Competency Education (NH PACE) pilot project. First, we examined the literature to define comparability outside the bounds of strict score interchangeability and explored methods for estimating comparability that support a balanced assessment system for state accountability such as the NH PACE pilot. Second, we applied two strategies, consensus scoring and a rank-ordering method, to estimate comparability in Year 1 of the NH PACE pilot based on the expert judgments of 85 teachers using 396 student work samples. We found the methods were effective both for providing evidence of comparability and for detecting when threats to comparability were present. The evidence did not indicate meaningful differences in average scoring stringency or leniency across districts and therefore did not support adjusting district-level cut scores for “annual determinations.” The paper concludes with a discussion of the technical challenges and opportunities associated with innovative, balanced assessment systems in an accountability context.

This study investigates the predictive validity and policy impact of Council for Accreditation of Educator Preparation minimum admission requirements in Standard 3.2 on teacher preparation programs (TPPs), their applicants, and the broader field of educator preparation. Undergraduate GPA and GRE scores from 533 program graduates in one master’s-level TPP were examined for their ability to predict graduate GPA and the effect minimum admissions criteria had on enrollment. Findings indicate that only undergraduate GPA is moderately related to a program graduate’s success, controlling for student background characteristics. The study also finds that implementing GRE scores as a criterion in admissions decisions significantly reduces the number of admitted candidates so that the program may no longer be financially sustainable. These findings suggest many negative consequences may result from minimum admission requirements and more research is needed to evaluate the potential impact on other TPPs, teacher labor markets, and student learning outcomes.

The purpose of this study is to critically evaluate value-added accountability measures currently enacted in the United States at the federal and state levels to assess teacher preparation programme (TPP) effectiveness. We draw on Newton and Shaw’s framework for the evaluation of testing policy to evaluate the technical quality and social acceptability of using K-12 student test scores to assess TPP effectiveness. Through six guiding questions, we examine the assumptions and arguments that support value-added assessment, culminating in overall judgments about the acceptability of implementing (or continuing to implement) the testing policy. Findings suggest policy-makers may have more pragmatic concerns about the efficiency of value-added assessment, while TPPs may have more theoretical concerns about the validity of value-added assessment. The relevance of this evaluation approach to improving policy-related decision-making is also discussed.

Summative performance assessments in teacher education, such as the Performance Assessment for California Teachers (PACT) and the edTPA, have been heralded through policies intended to enhance the quality of the teaching profession and raise its stature among other professions. However, the development and implementation of the PACT, and subsequently the edTPA, have not been without controversy and debate. The purpose of this article is to assess the implementation, impact, and evolution of the PACT and edTPA. To do so, we review the growing body of literature on the impact and implementation of the PACT and critically analyze the state policies surrounding the edTPA. We raise questions about the policy and practical implications of the evolution of the PACT and edTPA.

This study examines the reliability of generalizing from a collection of classroom assessments intended to measure student achievement to the universe of all possible assessments. It also determines the efficient number of classroom assessments needed to ensure highly reliable estimates of student achievement in a school accountability context.
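Framing reliability as generalization from a sample of assessments to the universe of possible assessments is the language of generalizability theory. As a minimal sketch, assuming a simple crossed persons × assessments (p × i) design (an assumption; the study's actual design is not stated here), the relative G coefficient for n assessments is σ²_p / (σ²_p + σ²_pi,e / n), and a decision study searches for the smallest n that reaches a target. The function names, the variance components, and the 0.80 target below are hypothetical placeholders, not results from the study.

```python
from typing import Optional


def g_coefficient(var_person: float, var_residual: float, n_assessments: int) -> float:
    """Relative G coefficient for a crossed persons x assessments (p x i) design.

    var_person:   variance component for students (universe-score variance)
    var_residual: persons-by-assessments interaction confounded with error
    """
    # Averaging over n assessments shrinks the relative error variance by 1/n.
    return var_person / (var_person + var_residual / n_assessments)


def min_assessments(var_person: float, var_residual: float,
                    target: float = 0.80, max_n: int = 50) -> Optional[int]:
    """Smallest number of assessments whose G coefficient reaches `target`."""
    for n in range(1, max_n + 1):
        if g_coefficient(var_person, var_residual, n) >= target:
            return n
    return None  # target not reachable within max_n assessments


# Hypothetical variance components, NOT estimates from the study.
print(min_assessments(var_person=0.30, var_residual=0.70))  # 10
```

With these placeholder values, nine assessments give a coefficient of about 0.79 and ten give about 0.81, which illustrates the diminishing returns a decision study weighs against the cost of additional assessments.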

The Every Student Succeeds Act of 2015 authorizes a pilot program that allows up to seven states to develop innovative assessment and accountability systems. Prior to the official pilot program launch, one pilot program had been approved by the U.S. Department of Education: New Hampshire’s Performance Assessment of Competency Education (PACE). The PACE pilot was granted a 2-year waiver (2014-2016) from federal statutory requirements related to state annual achievement testing and has been granted an additional 1-year waiver. The purpose of this study is to investigate the average treatment effect of the PACE pilot on 8th grade student achievement outcomes in math. This study also examines the extent to which those average treatment effects vary according to student characteristics and between PACE schools. PACE students are compared to demographically similar non-PACE comparison students using propensity score methods. Multi-level modeling was then used to estimate the average treatment effect for students receiving either one or two years of treatment. Findings from this study suggest that the PACE pilot is having a positive effect on 8th grade student achievement outcomes in mathematics starting in Year 2. Findings also suggest that students with disabilities who attend PACE schools tend to exhibit higher achievement than students with disabilities in the comparison group. Policy implications are discussed.

Multiple needs and responses have driven the move to personalized and competency-based learning systems, including the desire to enhance the learning outcomes for all students by creating contexts for students to engage in and take control of their own learning. Some might immediately think that such personalization would be a barrier to assessment design. For an extreme example, imagine every student pursuing a different learning path. While it might not be very efficient to create a slightly or radically different assessment for each student, as long as the learning targets are explicit, measurement specialists should be able to design appropriate assessments to document what students have learned relative to what they were expected to learn. We can draw on well-established procedures for assessment design that would not necessarily change for the more personalized case. The real challenge arises, however, when there is a desire or need to include the results from such assessments in school or educator accountability systems.

Ensuring that the rules used to produce the accountability results are fair to the various types of individuals or entities subject to the accountability system is a key tenet of accountability system design. Fairness is manifest, among other ways, by holding people (or schools) to comparable achievement standards using the same or similar types of data, transformations of the data into indicators, and criteria to judge the value of the indicators. More succinctly, “comparability” is viewed by many as an important aspect of designing fair accountability systems. On first glance, personalization and comparability appear to be at odds and, if each is taken strictly, they are. However, in most cases, personalization, at least in K-12 public schools, does not mean complete freedom to choose any possible learning path. In almost all cases, the pace of learning is usually personalized with somewhat less freedom to choose the content to be studied. Nevertheless, even this more limited level of comparability may pose significant accountability challenges.
This paper addresses the range of personalized learning that we might expect in current K-12 systems, but focuses primarily on understanding the ways in which comparability may be considered to help bridge the apparent divide between fair accountability systems and personalized learning. For example, if comparability is defined as strict psychometric interchangeability, it is doubtful that such personalized systems could meet such a threshold. On the other hand, if comparability is evaluated by the ways in which the assessment results predict or support similarly rigorous outcomes, the door may be open to incorporating the results of personalized and competency-based assessments into accountability systems. This paper presents a conceptualization of comparability that is less stringent than interchangeability of student scores. We then present the results of applying such a perspective to a competency-based pilot project where the school-based results must still be used in school accountability determinations. The paper concludes with a discussion of the challenges and opportunities associated with trying to use the results of personalized and competency-based learning systems in large-scale accountability systems.

This presentation begins by setting the context of the NH PACE project as a response to the test-based accountability strategies that have operated since the passage of No Child Left Behind in 2002. Many argue that new accountability systems need to be designed that support meaningful learning and continuous improvement models (Dance, 2015; Darling-Hammond, Wilhoit, & Pittenger, 2014; Marion & Leather, 2015; Turnipseed & Darling-Hammond, 2015). However, what commitment, collaboration, and capacity are required of district leadership and personnel to implement an alternative accountability model?

Using a semi-structured interview protocol, we interviewed administrators from each PACE district to better understand their experience in implementing the PACE project during the 2014-2015 school year. Interviews were conducted in groups and lasted approximately one hour. Questions during the interview centered around five main topics: overall reflections; board, district and school-level commitment; cross-district collaboration; development, administration and scoring of performance assessments; communication with local stakeholders; and district capacities necessary to successfully implement PACE. This presentation—organized around these five themes—summarizes the findings and highlights differences across districts (where applicable) in order to provide insight into the experiences of implementing districts.

The feedback provided by the four first-year pilot districts on their experiences implementing the PACE pilot project can serve multiple purposes. The NHDOE and its partners can use the feedback for formative purposes and program improvement. Additionally, the four districts implementing the pilot in the 2015-2016 school year may be able to glean useful information to help them more successfully navigate the demands of the project. The district feedback provided in this policy brief may also serve as useful information for other states, districts, and/or schools interested in designing an alternative accountability model.

Increased accountability currently permeates every level of educational discourse, including discourse about teacher education. As a result, significant changes are taking place in assessing the competence of preservice teachers for initial licensure/certification. The purpose of this analysis is to examine contemporary teacher education policies and reform agendas in teacher education around preservice performance assessments. Using two components of Cochran-Smith, Piazza & Power’s (2013) “politics of policy” framework, the discourses and arguments surrounding performance assessments are analyzed over a broad period of time. Findings suggest a significant pairing of discourses such as audit culture, neoliberalism, human capital theory, standards/accountability, and outcomes with the historical development of performance assessments in teacher education. Findings also suggest three arguments used by competing agendas in teacher education to argue for performance assessments: to assure the competence of preservice teachers, to assess the quality of teacher education programs, and to demonstrate a positive impact on student learning. The analysis has important implications for states and/or teacher education programs considering mandating performance assessments for initial licensure or program approval purposes, as well as for developers of the ‘next generation’ of performance assessments.

Competency-based education (CBE) reform has become a priority in many local and state education agencies in the United States. An oft-cited goal of CBE is to reduce inequities in student achievement outcomes and achievement gaps while improving the overall quality of education. The purpose of this study was to construct a reliable instrument to measure K-12 CBE implementation at the school level. This article describes our instrument development process including construct validation and reliability testing with 413 public school principals. This study employed confirmatory factor analysis and Cronbach's alpha internal consistency estimates to examine the construct validity and reliability of the pilot administration of the CBE Implementation Survey for Principals. Results suggest that the survey instrument accurately and reliably measures the essential elements of CBE, providing initial support for use in evaluating K-12 CBE implementation. Implications for research, policy, and practice are discussed.
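The abstract names Cronbach's alpha as its internal consistency estimate. For readers unfamiliar with it, here is a minimal sketch of the standard formula α = k/(k − 1) · (1 − Σσ²_i / σ²_X), where k is the number of items, σ²_i is the variance of item i, and σ²_X is the variance of respondents' total scores. The function name and the toy response matrix are hypothetical and invented for illustration; they are not the survey's items or data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    responses[r][i] is respondent r's score on item i.
    """
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of respondents' total scores.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: 4 hypothetical respondents answering 2 items (invented, not survey data).
toy = [[2, 3], [4, 5], [3, 3], [5, 6]]
print(round(cronbach_alpha(toy), 3))  # 0.967
```

Sample variances are used throughout; population variances would give the same alpha, since the n − 1 versus n divisor cancels in the ratio as long as it is applied consistently.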