What does your test measure
Related papers
Measurement theory in language testing: Past traditions and current trends
2009
A good test has at least three qualities: reliability, or the consistency and precision of a test's measurement; validity, i.e., whether the test really measures what it is supposed to measure; and practicality, or whether the test, however sound theoretically, can actually be administered in practice. These are the sine qua non for any test, including tests of language proficiency. Over the past fifty years, language testing has witnessed three major measurement trends: Classical Test Theory (CTT), Generalizability Theory (G-Theory), and Item Response Theory (IRT). This paper will provide a brief overview of these trends. It will then move on to a short consideration of the most recent notion of Differential Item Functioning (DIF). It will finally conclude that the material discussed here is applicable not only to language tests but also to tests in other fields of science.
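As a minimal illustration of two of the traditions this abstract names (the example is mine, not the paper's, and all data and names are hypothetical), the Python sketch below computes Cronbach's alpha, a standard CTT reliability estimate, and the response probability under the one-parameter (Rasch) IRT model.

    import numpy as np

    def cronbach_alpha(scores):
        # CTT reliability estimate:
        # alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]  # number of items
        item_var = scores.var(axis=0, ddof=1).sum()
        total_var = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_var / total_var)

    def rasch_probability(ability, difficulty):
        # One-parameter IRT (Rasch) model: probability of a correct
        # response, with ability and item difficulty in logits.
        return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

    # Hypothetical data: 5 test-takers, 4 dichotomously scored items.
    responses = np.array([[1, 1, 1, 0],
                          [1, 0, 1, 0],
                          [1, 1, 0, 1],
                          [0, 0, 1, 0],
                          [1, 1, 1, 1]])
    print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
    print(f"P(correct | ability 1.0, difficulty 0.5): {rasch_probability(1.0, 0.5):.2f}")

G-Theory and DIF extend these ideas, decomposing score variance across measurement facets and comparing item behaviour across groups respectively, but both need more machinery than fits a short sketch.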
Voices From Test-Takers: Further Evidence for Language Assessment Validation and Use
Educational Assessment, 2011
Test-takers' interpretations of validity as related to test constructs and test use have been widely debated in large-scale language assessment. This study contributes further evidence to this debate by examining the written perspectives of 59 test-takers on large-scale English language tests. Participants wrote about their test-taking experiences in 300 to 500 words, focusing on their perceptions of test validity and test use. A standard thematic coding process and logical cross-analysis were used to analyze test-takers' experiences. Codes were deductively generated and related to both experiential (i.e., testing conditions and consequences) and psychometric (i.e., test construction, format, and administration) aspects of testing. The findings bring test-takers' voices to bear on fundamental aspects of language assessment, with implications for test developers, test administrators, and test users. The study also demonstrates the need to obtain additional evidence from test-takers when validating large-scale language tests.
Validity and Classroom Language Testing: A Practical Approach
Colombian Applied Linguistics Journal, 2020
Validity and validation are common in large-scale language testing. These topics are fundamental because they help stakeholders in testing systems make accurate interpretations of individuals' language ability and sound decisions based on those interpretations. However, there is limited information on validity and validation for classroom language testing, where interpretations and decisions based on curriculum objectives are equally paramount. In this reflection article, I provide a critical account of these two issues as they are applied in large-scale testing. Next, I use this background to discuss and provide possible applications for classroom language education through a proposed approach for validating classroom language tests. The approach comprises analysis of curriculum objectives, design of test specifications, analysis of test items, professional design of instruments, statistical calculations, cognitive validation, and consequential analyses. I close the article with implications and recommendations for such endeavours and highlight why they are fundamental for high-quality language testing systems in classroom contexts.
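The "statistical calculations" step of the proposed approach is not spelled out in the abstract; one common classroom-level instance is classical item analysis, sketched below under my own assumptions, with item difficulty taken as the proportion correct and discrimination as the corrected item-total correlation. The data are hypothetical.

    import numpy as np

    def item_analysis(responses):
        # Classical item analysis for dichotomous (0/1) classroom test data.
        # Difficulty: proportion of students answering the item correctly.
        # Discrimination: correlation of the item with the total score on
        # the remaining items (corrected item-total correlation).
        responses = np.asarray(responses, dtype=float)
        difficulty = responses.mean(axis=0)
        discrimination = []
        for i in range(responses.shape[1]):
            rest = np.delete(responses, i, axis=1).sum(axis=1)  # score without item i
            discrimination.append(np.corrcoef(responses[:, i], rest)[0, 1])
        return difficulty, np.array(discrimination)

    # Hypothetical scores: 6 students, 4 items.
    data = [[1, 1, 0, 1],
            [1, 0, 0, 0],
            [1, 1, 1, 1],
            [0, 0, 0, 1],
            [1, 1, 1, 0],
            [1, 0, 1, 1]]
    difficulty, discrimination = item_analysis(data)
    for i, (p, r) in enumerate(zip(difficulty, discrimination), start=1):
        print(f"item {i}: difficulty = {p:.2f}, discrimination = {r:.2f}")

In practice, an item with near-zero or negative discrimination would be flagged for review against the curriculum objectives the test claims to cover.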
Construct in assessments of spoken language
Handbook of language assessment across modalities, 2022
Language assessment constructs conjoin two complex, dynamic phenomena: (1) collective patterns of language use and (2) individual language ability. Assessment constructs emerge from "spheres of activity" across multiple, overlapping dimensions, denoted in this chapter as theoretical, operationalized, stated, and perceived constructs. While theoretical constructs are assumptions about what causes differences in scores, the operationalized construct is what actually emerges in the interaction between the assessee and the assessment infrastructure. Stated constructs are descriptions of what the assessment claims to assess, and perceived constructs are the ways these statements are interpreted. Interrogating the congruence of these dimensions has the potential to provide a holistic view of the development, experience, use, and impact of assessment constructs across diverse stakeholder worlds.
Two Interviews on Language Testing: An Introduction
Issues in Applied Linguistics, 2001
The following section contains interviews with noted language testing experts Charles Alderson and Dorry Kenyon, conducted during the Fourth Annual Southern California Association for Language Assessment Research (SCALAR) Conference held in Los Angeles on May 11-12, 2001. The theme of the conference was "Foreign Language Assessment at School and College Levels," an area in which both Alderson and Kenyon have much experience and insight. They provide complementary perspectives on issues in language testing because of their varied backgrounds, research interests, and the different test development and research projects with which they have been involved. Alderson, an applied linguist by training, is a professor of applied linguistics at Lancaster University. He has done a great deal of work on both theoretical and practical aspects of language testing.
Degrees of adequacy: the disclosure of levels of validity in language assessment
Koers, 2019
The conceptualization of validity remains contested in educational assessment in general, and in language assessment in particular. Validation and validity are the subjective and objective sides of the process of building a systematic argument for the adequacy of tests. Currently, validation is conceptualized as being dependent on the validity of the interpretation of the results of the instrument. Yet when a test yields a score, that score is a first indication of the test's adequacy or validity. As the history of validity theory shows, adequacy is further disclosed with reference to the theoretical defensibility ("construct validity") of a language test. That analogical analytical disclosure of validity is taken further in the lingually analogical question of whether the test scores are interpretable and meaningful. This paper will illustrate these various degrees of adequacy with reference mainly to empirical analyses of a number of tests of academic literacy, from preschool-level tests of emergent literacy to measurements of postgraduate students' ability to cope with the language demands of their study. Further disclosures of language test design will be dealt with more comprehensively in a follow-up paper. Both papers present an analysis of how such disclosures relate to a theoretical framework for responsible test design.
Language Testing in Asia, 2019
Purpose and background: The purpose of this paper is to critically review the traditional and contemporary validation frameworks (content, criterion, and construct validation; evidence-gathering; the socio-cognitive model; test usefulness; and an argument-based approach), as well as empirical studies using an argument-based approach to validation in high-stakes contexts, in order to discuss the applicability of an argument-based approach to validation. Chapelle and Voss (2014) reported that, despite the usefulness and advantages of an argument-based approach for test validation, only five validation studies using this approach were found in a search of two major journals, Language Testing and Language Assessment Quarterly. We reviewed the validation approaches in language testing and extended the search for empirical studies using an argument-based approach to five language testing journals and the ProQuest Dissertations and Theses database. By doing so, this paper aims to provide validation researchers with each approach's conceptual limitations and with future directions for validation research. For validity arguments to be defensible, this paper suggests that multiple types of validity evidence be gathered, involving multiple test stakeholders. Implications: By comparing variations of an argument-based approach and reviewing eight representative studies out of 33 empirical validation studies using an argument-based approach, this paper presents the following implications for future researchers to consider: (a) defining test constructs and relevant test tasks through domain analysis; (b) inviting multiple test stakeholders into test validation; (c) investigating the intended and actual interpretations, decisions, and consequences; (d) considering the social, cultural, and political values embedded in testing; and (e) employing multiple methods beyond statistical analyses of test scores.