Language Testing Research Papers - Academia.edu

This study investigates Japanese high school students’ attitudes toward English proficiency tests, specifically the Test of English for Academic Purposes (TEAP) and university entrance examinations. Three rounds of interviews with five highly motivated learners at a prestigious high school were held over a period of 1.5 years. The interviews focused on their beliefs about English and their study methods; their impressions of and study methods for TEAP and other entrance exams; and their post-graduation plans. The interview data reveals students felt studying for TEAP provided an opportunity for authentic language study, which was in line with their high school study and would be useful for their futures. In contrast, they felt other entrance exams often focused on different skills and knowledge, making preparation for them both challenging and frustrating. These academically minded students found studying English merely for university entrance purposes to be demotivating, but a nece...

This article presents a comprehensive overview of three interrelated concepts in language testing (washback, impact, and validity) and of the many studies, conducted in different settings, that investigate the influence of testing on teachers and teaching, textbooks, learners and learning, attitudes toward testing, and test preparation behaviors. Some of these studies report on the influence of a high-stakes national English examination on local English language teaching and learning in countries such as Brazil, China, Hong Kong, Iran, Israel, Japan, Romania, Sri Lanka, and Taiwan. Others cover international English tests such as the IELTS, TOEFL, and MECC. The article also reports in full on several major washback and impact projects commissioned by testing agencies such as Cambridge ESOL and Educational Testing Service (ETS). It then reviews the relevant literature on test validation, a key concept in language testing because it concerns test interpretation and use, and one that is characterized and enriched by studies of washback and impact.

The present study uses the Rasch model to investigate the presence of differential item functioning (DIF) between male and female examinees taking the University of Tehran English Proficiency Test (UTEPT). The results indicated that 19 items function differentially for the two groups. Only 3 items, however, displayed DIF of practical significance. Close inspection of these items indicated that the DIF may be interpreted as impact rather than bias. It is therefore concluded that the presence of differentially functioning items may not render the test unfair. On the other hand, it is argued that the fairness of the test may be open to question because of other factors.
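
For readers unfamiliar with the approach, the standard dichotomous Rasch model and a common way of expressing a between-group DIF contrast can be written as below. This is a generic sketch, not the study's own specification: the symbols (ability \theta_n, item difficulty b_i, group labels F and M) are assumptions introduced here, and the abstract does not report which software, significance test, or practical-significance criterion was used.

```latex
% Rasch model: probability that examinee n answers item i correctly,
% given ability \theta_n and item difficulty b_i (both in logits).
\[
P(X_{ni} = 1 \mid \theta_n, b_i)
  = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
\]

% DIF contrast for item i: the difference between the item difficulties
% estimated separately for the female (F) and male (M) groups, and the
% corresponding t-type statistic built from the two standard errors.
\[
\Delta_i = b_i^{(F)} - b_i^{(M)},
\qquad
t_i = \frac{\Delta_i}
           {\sqrt{\mathrm{SE}\!\left(b_i^{(F)}\right)^{2}
                + \mathrm{SE}\!\left(b_i^{(M)}\right)^{2}}}
\]
```

Under this reading, an item can show a statistically detectable contrast (a large |t_i|) while the contrast itself is too small in logits to matter in practice, which is consistent with the gap the abstract reports between flagged items and practically significant ones.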

Arsaythamby Veloo, School of Education and Modern Languages, Universiti Utara Malaysia, Malaysia. Abstract: This study investigates whether MUET results can be used to predict the overall academic performance of accounting and science stream students in the matriculation program. The prediction will further indicate whether MUET results are valid as a measure for placement in public universities. Data were analysed in three stages using SPSS version 22.0. First, MUET test scores were described in terms of their minimum and maximum. Next, overall MUET scores and the scores of its individual components were correlated with students’ Cumulative Grade Point Average (CGPA). Finally, multiple regression analysis was carried out to investigate whether overall MUET scores and component scores can predict students’ academic achievement. The results of the multiple regression analysis show that overall MUET score and reading compone...

The article identifies ways of improving foreign language teaching through the use of distance learning technologies. It discusses the main areas in which distance learning is implemented: adapting existing teaching methods to the features of distance education, using computer testing to assess foreign language proficiency, and introducing adaptive testing to improve the efficiency of evaluation and, in particular, to make it multi-criteria.
The methodology for distance learning is based on the principle of modularity, which involves dividing the educational material into separate modules and using separate program modules to build a distance learning course. This allows the learning process to be organized either as a sequential study of the entire course or as selective completion of the individual modules needed to fill gaps in the learner's knowledge.
The following algorithm for the implementation of the methodology for learning a foreign language distantly was determined:
1. Choose the way of working through the modules of the foreign language course (sequential or selective).
2. Choose the next module (once all modules have been passed, go to step 7).
3. Present the educational material.
4. Do exercises to learn the material.
5. Take an adaptive test.
6. If the test result is negative, go to step 3; if it is positive, go to step 2.
7. Take the final test of language proficiency.
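
The control flow of the seven steps above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' software: the function name `run_course`, the callback parameters, and the `max_attempts` cap are assumptions introduced here (the listed algorithm repeats a failed module without any stated limit), and the adaptive test is reduced to a pass/fail callback.

```python
from typing import Callable, List, Sequence, Tuple

AttemptLog = List[Tuple[str, int, bool]]


def run_course(
    modules: Sequence[str],                 # step 1: modules already chosen, in study order
    present: Callable[[str], None],         # step 3: present the material
    practise: Callable[[str], None],        # step 4: exercises on the material
    adaptive_test: Callable[[str], bool],   # step 5: True if the module test is passed
    final_test: Callable[[], float],        # step 7: final proficiency score
    max_attempts: int = 5,                  # assumption: safety cap, not in the source
) -> Tuple[float, AttemptLog]:
    """Work through each module until its test is passed, then run the final test."""
    log: AttemptLog = []
    for module in modules:                  # step 2: choose the next module
        for attempt in range(1, max_attempts + 1):
            present(module)
            practise(module)
            passed = adaptive_test(module)
            log.append((module, attempt, passed))
            if passed:                      # step 6, positive result: back to step 2
                break
            # step 6, negative result: loop back to step 3 for the same module
        else:
            raise RuntimeError(f"module {module!r} not passed in {max_attempts} attempts")
    return final_test(), log                # step 7, reached once every module is passed


def demo() -> Tuple[float, AttemptLog]:
    """Hypothetical run: the 'grammar' module is failed once, then passed."""
    failures_left = {"grammar": 1}

    def flaky_test(module: str) -> bool:
        if failures_left.get(module, 0) > 0:
            failures_left[module] -= 1
            return False
        return True

    return run_course(
        ["vocabulary", "grammar"],
        present=lambda m: None,
        practise=lambda m: None,
        adaptive_test=flaky_test,
        final_test=lambda: 0.8,
    )
```

For selective study, the same function would simply be called with the subset of modules the learner needs, since step 1 only affects which modules are passed in.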
A preliminary assessment of the gradient approach to adaptive testing, applied to assessing the quality of training in distance learning courses, suggests that the test can be shortened (on average twofold) while still meeting the requirements for adequacy of assessment. This makes it possible either to use learning resources more rationally during distance learning or to increase the efficiency of assessment by increasing the volume of the test.
Key words: foreign language learning, distance learning, language testing, adaptive testing, information technology

Abstract: This paper presents a comprehensive framework for researching classroom-based assessment (CBA) processes, and is based on a detailed empirical study of two Australian school classrooms where students aged 11 to 13 were studying Indonesian as a foreign ...

The introduction proposes several principles of assessment and testing. The analysis then presents the main features of IELTS, along with its strengths and weaknesses against several criteria. Finally, the conclusion offers several suggestions for improvement.

Many student performances are assessed in Intensive English Programs (IEPs) worldwide in each academic year. These performances are mostly graded by human raters, with a certain degree of error. The accuracy of these performance assessments is of the utmost importance, however, because they feed into high-stakes decisions about students and account for a large share of students' scores. IEPs should therefore give priority to their accuracy. Yet the current systems that could help IEP administrators monitor rater performance are impractical because they require complex mathematical models and specialized software. This paper proposes a practical and easy-to-maintain rater performance categorization system, accompanied by a sample study. Its benefits to IEP administrators and their raters are discussed, along with practical considerations.

This article presents a history of Shiken since it was first published in 1997 until 2019, followed by suggestions for areas of future research in assessment to which the publication may be well suited to contribute. In the historical overview, data is presented about the following: the origins, titles, editors, and distribution; the article types; the contents of research articles and the design and methodologies they have employed. Regarding research article content, four prominent themes were identified: mass market tests, entrance exams, statistics, and validity/reliability. Regarding design and methods, research articles have tended to focus on English language tests with university students in Japan, while utilizing test and/or instrument data and quantitative methods of analysis. Recommendations for future research areas include investigations into the validity of test interpretations and uses of four-skills, vocabulary and other tests used in Japan, and language assessment literacy. Recommendations for future research design and methods include focusing more on a range of test stakeholders; various contexts, such as pre-tertiary education; and the use of qualitative and mixed methods.

In the present study we investigated what role manipulated (i.e., experimentally induced) and perceived (i.e., self-reported) self-control depletion plays in students’ (N = 176 seventh graders) achievement-related experiences and behaviour during a test of English as a foreign language, while controlling for trait self-control. Although our experimental manipulation of self-control depletion was successful, it had no effect on any of the students’ outcome variables. However, students who reported high self-control depletion immediately after the experimental manipulation were less motivated to work on the subsequent test, reported more distracting thoughts, showed lower performance, and felt more depleted at the end of the test session. Trait self-control turned out to be a protective and supportive factor for most of our outcome variables. Our results provide evidence that the perceived and not the manipulated level of self-control depletion is a predictor of achievement-related behaviour in tests of English as a foreign language.

Studies of reading that involve eye-tracking often utilise stimulated-recall interviews with participants. A new approach to stimulated-recall interviews designed to enrich participant verbalisations allows researchers to directly link participants’ statements to specific moments of their test – accomplished by video recording their actions during test taking. The session will discuss how this approach to stimulated-recall interviews can work alongside eye-tracking to produce rich qualitative data related to individuals’ cognitive processing.

Drawing on a mixed-method research design, this study aims to shed light on the use of the European guidelines in language testing and assessment practices in non-formal educational settings. Three non-formal English language schools in Turkey that are renowned for their quality were analysed in depth, in order to offer a general picture, based on a sample of leading professionals, of how the European benchmarks are used in language testing and assessment practices. The results indicate that (1) a more practical curriculum, backed by a real auditing system, is needed to enhance current language testing and assessment practices; (2) a validation process is called for in language certificate examinations administered in non-formal educational settings; and (3) cooperation among the parties involved is needed to standardize language testing and assessment practices. The results are laced with some reco...

This paper describes a new methodology for testing intelligibility across closely related languages and dialects in a traditional oral society in Vanuatu. There are many reasons why it could be useful to establish how well speakers of related varieties can understand one another: such knowledge is relevant to language planning and policy making, and it can shed light on the dynamics of language contact. However, conventional approaches to intelligibility testing, such as ‘recorded text testing’ (Hickerson, Turner & Hickerson 1952; Pierce 1952; Voegelin & Harris 1951), are time-consuming to score and difficult to implement consistently. In Europe, fast and efficient intelligibility testing has been successfully carried out across closely related varieties (cf. Vanhove 2014, Author1 in preparation, Schüppert & Author1 2011 a, b, inter alia). However, these methods assume that test subjects are literate and computer-savvy. The methodology discussed in the present paper adapts European methods to conventional ‘fieldwork’ conditions. In Vanuatu we trialled a picture task and a translation task. Although some words had to be removed from the final analysis, the experiment was successful overall and we anticipate that this method can be fruitfully applied in other oral language communities.

In this article we present an analysis of the items in a B1-level multiple-choice reading test from the CILS certification (Università per Stranieri di Siena). The research starts with a preliminary review of the text on which the test is based, including a study of the modifications made to it by the item writer, and then examines each individual item using data from the administration of the test to 161 learners of Italian at the corresponding level around the world. Our research shows that there is one ambiguous item (#1), owing to the presence of two keys, and one item that is difficult to solve (#4), owing to the lack of information needed to infer the meaning of the word it refers to. Keywords: teaching Italian as a foreign language; item analysis; testing; reading; multiple-choice tests.

The Spanish University Entrance Exam, the Selectividad, was introduced 25 years ago. Since then, only a few changes have been made to the English component. The test was designed as a school-leaving examination, but it also serves as a selection criterion for entry to Spanish universities. It must be redesigned: new specifications should be developed, including a clear definition of contents and aims, and a new test construction protocol should be set to guarantee the quality of the test.
For the last two years I have been working on a project whose objective is to develop a new university entrance test based on the theories proposed by experts. The project seeks to improve the current testing system by proposing a new model and a new methodology for designing a new Selectividad.
After collecting empirical evidence of the vagueness of Selectividad results, I wrote the specifications for the new test, which I have recently piloted with a group of students who have just entered the university system (students who took the Selectividad in either June or September). A second pilot took place with a group of high school students who are currently preparing for the test.
This presentation will focus on the first results of the item analysis and internal reliability, and I will also provide details on how the test is being validated.

This study investigates Japanese high school students’ attitudes toward English proficiency tests, specifically the Test of English for Academic Purposes (TEAP) and university entrance examinations. Three rounds of interviews with five highly motivated learners at a prestigious high school were held over a period of 1.5 years. The interviews focused on their beliefs about English and their study methods; their impressions of and study methods for TEAP and other entrance exams; and their post-graduation plans. The interview data reveals students felt studying for TEAP provided an opportunity for authentic language study, which was in line with their high school study and would be useful for their futures. In contrast, they felt other entrance exams often focused on different skills and knowledge, making preparation for them both challenging and frustrating. These academically minded students found studying English merely for university entrance purposes to be demotivating, but a necessary evil to achieving their immediate goals.

This study investigates the interaction between lexical knowledge and listening comprehension in a second language. Fifty-nine Japanese university students of low-intermediate to advanced English ability were tested on four texts containing increasing amounts of low-frequency vocabulary, using first-language recall protocols as measures of comprehension and dictation as a measure of lexical familiarity. Comprehension correlated with text-lexis familiarity at .45; acceptable comprehension levels were significantly associated with higher text-lexis familiarity; good comprehension seldom occurred with text-lexis familiarity levels lower than 75 percent, but occurred frequently at 90+ percent levels. This pattern was observed equally for learners of high, middle, or low second-language listening proficiency. It is concluded that efficient listening strategies may make comprehending lexically complex texts possible, but most learners seem to need very high lexical familiarity for good comprehension.

Testing writing in achievement tests follows certain criteria and principles established in educational assessment and language testing. This paper investigates whether the composition writing tasks in achievement tests for grades 1 and 2 in private secondary schools at locality level fulfill these criteria and principles, such as how much help should be provided to students, task contextualization, and the general characteristics that ensure the usefulness of the test. Two midterm English examination papers designed and administered in 2018 by the Non-governmental Education Administration in Karari Locality were selected for a case study analysis. It appears that the guidance provided lacks balance, being too generous for grade 1 and insufficient for grade 2. In addition, the tasks in both tests need further contextualization to maintain meaningfulness, as well as greater authenticity and interactiveness, which have been reduced significantly by excessive emphasis on test validity and reliability.

Abstract: The argument I will develop in this essay is that foreign students are a latent human resource who can help overcome English monolingualism in the Australian population. Foreign students, properly rewarded, can be a major source of skills transfer. Every one of those students is a walking compendium of language and cultural skills that Australians need to know.

Book: ISTQB-CTFL Syllabus 2018 V3

This paper describes a study examining the effects of the computer-adaptive Duolingo English Test (DET) among a small group of first- and second-year students in the Global Communication Department at Hiroshima Bunkyo Women's University. The test provides a specific DET score that correlates with scores on two widely used English language proficiency tests, the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS). In addition, the DET score is aligned with the Common European Framework of Reference (CEFR). By taking the DET, an examinee can therefore gauge her English language level against a variety of scales. In this project, qualitative data in the form of interviews and anonymous surveys, together with the quantitative data of the participants' DET scores, are examined in order to ascertain the participants' English proficiency levels, as well as their motivation and confidence levels for studying English. The results indicate that using the DET can positively affect participants' motivation for studying English and can be a beneficial tool for tracking English language progress.

The university entrance examination in Spain is currently undergoing changes that will be implemented in the near future. Despite the many past studies aimed at improving the English component, few changes have been made to the exam in recent years. A recent study of 3747 students who sat the EvAU in the Comunidad de Madrid in the academic year 2020/2021 shows that some of the items in the current exam have acceptable item discrimination indices and good reliability coefficients. However, other components need to be revised or removed from the test. The results of the analysis may be helpful in the design and development of the new entrance examination.
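
To make the two statistics named in this abstract concrete, here is a minimal Python sketch of one common item discrimination index (the corrected item-total correlation) and one common reliability coefficient (Cronbach's alpha). The abstract does not say which indices or software the study used, so these choices are assumptions, and the small `responses` matrix is invented purely for illustration; it is not data from the study.

```python
from statistics import mean, variance
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = examinees, columns = items


def cronbach_alpha(scores: Matrix) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_discrimination(scores: Matrix, item: int) -> float:
    """Corrected item-total correlation: item score vs. total score on the other items."""
    item_scores = [row[item] for row in scores]
    rest_scores = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_scores)


# Invented example: 4 examinees x 3 dichotomously scored items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = cronbach_alpha(responses)
discrimination_item_2 = item_discrimination(responses, 1)  # second item (index 1)
```

The corrected form excludes the item itself from the total so that an item is not correlated with its own score, which would otherwise inflate the index on short tests.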

This is a test of vocabulary size and strength in four modalities (productive recall, receptive recall, productive recognition, receptive recognition). It tests samples of 14,000 words at 14 frequency levels. At the end of the test you will receive a table with your results.

The optimal language for literature and educational materials is not the same for all Zay areas. The data gathered during the current study point to Zay as optimal for the islands on Lake Ziway and Oromo as optimal for the lakeshores. However, the Zay people living on the islands would probably be well served by Amharic literature and educational materials until most of them migrate to the shore or the Oromo educational system causes a shift in preference to Oromo. Zay’s case is one of an endangered language that could prove to be a development success story, but only if the level of motivation for a language development project is high enough to initiate and sustain the effort.

In this introduction, we outline the most relevant concepts for this special issue on integration and the politics of difference. This introduction characterizes "integration" as a dominant policy orientation and discursive regime concerned primarily with understandings of language, communication, and skill which constitute a (trans)national politics of difference. In various sites and national contexts of the global north, migrant "integration" policies render difference and mobility the site of both discursive elaboration and management. This introduction highlights the salience of critical ethnographic analyses for understanding "integration" beyond policy realms, arguing for attention to situated practices, emergent social categories and types, political-economic stakes, logics of linguistic (dis)engagement, and the reproduction of mono- and multilingual social orders. In particular, we propose to untangle this complex by describing three central processes that run through all of the contributions and which, we suggest, are indispensable for the analysis of current and emergent regimes of integration: processes of categorization, of selection, and of activation.

Applying the concept of English for Specific Purposes (ESP) in engineering studies in the 21st century still provides an essential platform on which authentic target workplace language use and language tasks can be simulated and practised by engineering undergraduates. This, however, does not come without challenges. This study highlights the practices of English language lecturers in developing language tests for ESP courses offered in engineering programmes, and the challenges they face in the process. The findings, elicited through a qualitative approach, reveal the complex realities of lecturers' actual practices in developing ESP tests. Working in far from ideal conditions, they have to grapple with challenges that stem from issues at the systemic or macro level in the engineering programmes. The lecturers' practices show a host of attempts to address the layers of challenges surrounding the task of preparing good ESP tests for engineering undergraduates. They are guided by their own longstanding views on language testing, which show varying degrees of conformity, ingenuity, and divergence compared with best practices in ESP test development.

The single-answer multiple-choice (SAMC) test technique is one of the most commonly used objective test techniques. Most teachers and testing organizations prefer SAMC questions because these tests provide "high score reliability, ease of administration and scoring, usefulness in testing varied content, and objective scoring" (Kurz, 1999:3). However, such questions have disadvantages such as "decreased validity due to guessing and failure to credit partial knowledge" (Kurz, 1999:2). Many scholars have suggested ways to overcome these disadvantages, and different scoring methods for SAMC questions have been devised. This paper claims that using multiple-answer multiple-choice (MAMC) questions will also eliminate some of the disadvantages of SAMC questions, and it suggests a practical way of scoring MAMC questions. The study is significant because there is very little literature on the use of MAMC questions in the classroom. The practical scoring method proposed here is an initial step that should pave the way for new developments in the use of MAMC questions in language tests. This alternative way of testing and scoring is open to discussion: no method is best, but a better one can support further developments in assessing examinees' success.

This paper reports an attempt to develop and validate a bilingual Persian version of the Vocabulary Size Test (VST). Due to the particular educational system in Iran, there is a dire need for a test that can effectively estimate English learners’ vocabulary sizes. Previous research (Nguyen & Nation, 2011) has indicated that bilingual versions of the VST can be more efficient than the monolingual one. A calibration of the Persian version of the test with 190 English learners indicated that the test has a high level of validity and reliability. The results of a factor analysis revealed that a single construct, presumably word knowledge, underlies the test. A one-way between-subjects ANOVA also indicated that the test can effectively distinguish between different proficiency levels. The hypothesized difficulty order was also realized in the data, though it was found that clusters of 1000-word levels provide more meaningful difficulty levels, as they are less susceptible to the idiosyncrasies at each 1000 level. The results also ran counter to the common assumption in the literature that not all test takers need to sit the entire test: administering the whole test leads to a more valid estimate of examinees’ vocabulary sizes.