English Language Testing Research Papers
2025, TESOL Quarterly
As countries around the world experience a surge of individuals who come from culturally and linguistically diverse (CLD) backgrounds, teacher education programs are reimagining ways to train content teachers with the necessary linguistic awareness, pedagogical tools, and skills to support CLD students while also incorporating teachers' language ideologies. As part of a larger narrative case study, this study investigates the ways language ideologies, discourses about teaching and learning, and training coalesce to shape pre-service teachers' emerging teacher identity and their perceived role in helping multilingual learners. The findings illustrate that participants' narrated understanding of their role as teachers within Evergreen State University is entangled with their local language ideologies that shape their learning and application of theoretical and pedagogical skills to support multilingual learners. These findings present a need to shift the view of language ideologies from a priori to one that coalesces between pre-service teachers and the local environment to shape the possibilities of becoming a teacher.
2025, Nat. Volatiles & Essent. Oils
IELTS as a testing system has been the platform that promotes immigration of people from non-English-speaking countries to English-speaking countries such as the United States of America, the United Kingdom, Canada, Australia and New Zealand. Among the other testing systems currently in existence, such as the TOEFL iBT (Test of English as a Foreign Language internet-based test), OET (Occupational English Test), PTE-A (Pearson Test of English Academic), CAE (Cambridge English: Advanced) and Canada-specific testing systems, IELTS has a worldwide reputation because it began providing its testing service decades before these relatively new testing systems emerged. This article explores the history of IELTS and its relevance in the Indian market, along with an introduction to the other testing systems in existence.
2025, rEFLections
CMRU-TEP is a required English proficiency test for all CMRU students before graduation. Despite its meticulous design, there is an opportunity for students to improve their scores through focused efforts and targeted support. This study employs an explanatory sequential mixed-methods design, utilizing surveys and interviews, to explore student perceptions of CMRU-TEP and to propose improvement strategies to enhance test performance. Guided by Bachman and Palmer's (1996) model, the study involves 1,037 fourth-year students, including 155 English majors and 882 non-English majors. Both groups consider CMRU-TEP "useful", addressing six qualities of test usefulness. The perceptions of majors and non-majors are similar. To enhance CMRU-TEP, the following recommendations are proposed: 1) Develop a test administration handbook. 2) Integrate a writing and speaking portfolio as part of the proficiency assessment. 3) Ensure the test aligns with the focus on communicative skills. 4) Shorten test time and tasks. 5) Design tasks to stimulate real-world language use. 6) Explore the potential of computer-based testing as an alternative. The paper concludes by proposing tailored support for English and non-English majors based on identified needs.
2025, Grantee Submission
This study describes the development and initial psychometric evaluation of a Recognizing Effective Special Education Teachers (RESET) teacher observation instrument. Specifically, the study uses generalizability theory to compare two versions of a rubric, one with general descriptors of performance levels and one with item-specific descriptors of performance levels, to evaluate special education teacher implementation of explicit instruction. Eight raters participated in viewing and scoring videos of special education instruction. Data collected from raters were analyzed in a three facet, crossed, mixed-model design to estimate the variance components and reliability indices. Results show lower unwanted sources of variance and higher indices of reliability with the rubric with item-specific descriptors of performance levels. Contributions to the field of teacher evaluation are discussed.
2025, Psychology Research on Education and Social Sciences
Evaluating teachers' phonological awareness strategies for enhancing communication skills in senior high school students.
2025, System
The recent surge in the morphological awareness literature targeting ESL university students has two limitations: most studies come from Asia, and measures used lack validation within ESL university contexts. The present study expands ESL research towards Africa. We investigated whether linguistic measures of morphological knowledge, normed on monolingual American students, can produce reliable and valid scores when used among English second-language (ESL) students in Ghana. A total of 454 ESL university students completed the Nonword Sentence Completion task (NWSC) and the Derivational Suffix Task (DST). Results show that the NWSC task, although relatively easy, proved a reliable and valid test for Ghanaian ESL students. The DST was reliable, but not sufficiently valid. Additionally, morphosyntactic features of the test items were analysed in relation to the results. Our expectation that items with simple morphosyntactic characteristics are easier than those with complex morphosyntactic characteristics was not confirmed, suggesting that the effect of morphosyntactic variables on linguistic complexity in derived words (lexical density, word frequency, sentence length, levels of shift) may be due to the measurement tasks used in this study. The study proposes a framework as to how existing language tests (in any language), developed for L1 populations, can be normed in L2 regions.
2025, Critical Inquiry in Language Studies
This editorial explores the intersection of equity and validity in language assessment, focusing on language-minoritized test-takers. Recognizing that tests often perpetuate inequitable power dynamics, our special issue, 'Equity Orientations for Validity Frameworks: Exploring the Intersections of Language and Assessment for Language-Minoritized Learners,' gathers research conducted in China, Iran, and the United States. The three empirical papers examine the socio-historical contexts of minoritization, challenging traditional validity frameworks in language assessment by centering on the diverse experiences of minoritized communities. The editorial argues for an equity-focused approach, emphasizing the importance of bi/multilingualism and intersectional perspectives in validity research. We propose a transdisciplinary 'toolbox' for researchers, advocating for methodologies that capture the complex realities of test-takers' lives. This collection aims to inspire future studies to challenge and expand traditional validity frameworks, promoting social justice and culturally responsive assessment practices that empower all learners.
2025, International Journal of Social Sciences
ABSTRACT
This research study examines the oral expressions of students from Tamil medium schools before and after a targeted educational
intervention using the Digital Language Laboratory (DLL). The focus was on enhancing key areas of oral skills such as pronunciation
and accent, fluency, vocabulary, syntax and grammar, and the confidence level of students. This study involved 64 students (45
girls and 19 boys) from Anbil Dharmalingam Agricultural College and Research Institute (ADAC&RI), and Horticultural College
and Research Institute for Women (HC&RI W), in Tiruchirappalli, India. After a diagnostic test, the selected students received
systematic training in DLL, established under the Indian Council for Agricultural Research – National Agricultural Higher
Education Plan – Institutional Development Plan (ICAR – NAHEP- IDP) at ADAC&RI in Tiruchirappalli. The oral performance of
the students was categorised into four levels: Needs Improvement, Fair, Good, and Excellent. The calculated t-value of oral skills
is very large, indicating a substantial difference between the pre-evaluation and post-evaluation scores. The p-value is extremely
small (<0.001), far less than the typical significance level of 0.05. This indicates that the observed difference in means is highly
statistically significant.
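The pre/post comparison described above can be illustrated with a paired-samples t statistic. This is only a minimal sketch, assuming a paired design (the same students scored before and after training), which the abstract does not state explicitly. The function name `paired_t` and the scores are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired-samples t-test on matched scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Work on the per-student gain rather than on the two raw score lists.
    diffs = [after - before for before, after in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation of the gains
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical scores for five students (illustration only).
pre_scores = [10, 12, 9, 11, 13]
post_scores = [14, 15, 13, 16, 15]
t_value, df = paired_t(pre_scores, post_scores)
print(f"t = {t_value:.2f}, df = {df}")
```

A large t relative to its degrees of freedom corresponds to a small p-value; looking up the p-value itself would need a t-distribution table or a statistics package.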
2024, Perspectives of Science & Education
Aim. The objectives of this research were to characterize and contrast the features of English language
proficiency tests conducted before and during the COVID-19 pandemic.
Methodology and research methods. The participants were 287 students before the coronavirus pandemic and
288 students during the pandemic, together with an English teacher and a forum for English teachers.
Through documentation and interviews, the information was gathered from eighth-graders at SMP Negeri
2 Semarang in Central Java, Indonesia.
Results. The English accomplishment test made before COVID-19 has the following features. First, the
percentages of items in the Easy, Moderate, and Difficult categories are 55%, 37.5%, and 7.5%, respectively.
Second, the item discrimination percentages for the Poor, Fair, Good, and Very Good categories are 10%, 30%,
25%, and 35%, respectively. Third, the ratio of effective to ineffective distractors is 53.30% to 46.70%.
Finally, the test reliability value is 0.990. The English proficiency test created during COVID-19
was characterized on the same features. First, the percentages of items in the Easy, Moderate, and Difficult
categories are 10%, 84%, and 6%, respectively. Second, the item discrimination percentages for the Poor, Fair,
Good, and Very Good categories are 2%, 4%, 14%, and 80%, respectively. Third, the ratio of effective to
ineffective distractors is 99.30% to 0.70%. Finally, the test reliability value is 0.960. The classical
test theory (CTT) analysis was based on distractor effectiveness, item difficulty, and item discrimination.
Based on item difficulty, the exams administered during the coronavirus pandemic were more normally
distributed than the tests administered before the pandemic. According to item discrimination, the tests
given during the pandemic fell more into the Very Good category than the tests given before the pandemic.
Based on distractor effectiveness, more of the tests given during the pandemic were classified as effective
than of the tests given before it. The two tests were compared on the data for these
features. The English achievement exam created during the pandemic was determined to be superior to the
test created before the pandemic based on CTT. However, the English performance exam created before
the pandemic is superior to that created during the pandemic according to Item Response Theory (IRT).
The IRT analysis was based on item fit and dependability. Dependability was better before COVID-19
than during the pandemic, and item fit was also more favorable before COVID-19 than during the pandemic.
Conclusions. The English proficiency test that was created during the epidemic is superior to the test
that was created prior to the pandemic based on CTT. But according to IRT, the English proficiency exam
created before the pandemic is superior to that created during the pandemic.
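The CTT quantities named above, item difficulty and item discrimination, can be illustrated as follows. This is a minimal sketch using common textbook definitions (difficulty as the proportion of correct answers, discrimination as the difference between upper and lower scoring groups); the abstract does not say which formulas, group sizes, or category cut-offs the authors used. The 27% group fraction, the function names, and the response matrix are assumptions for illustration.

```python
def item_difficulty(responses, item):
    """Proportion of test takers who answered the item correctly (0 to 1)."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower index: p(correct | top group) - p(correct | bottom group)."""
    # Rank test takers by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # size of each extreme group
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


# Hypothetical 0/1 responses: rows are test takers, columns are items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for item in range(3):
    p = item_difficulty(responses, item)
    d = item_discrimination(responses, item)
    print(f"item {item}: difficulty={p:.2f}, discrimination={d:.2f}")
```

Note that a higher difficulty value here means an easier item, which is the usual CTT convention.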
2024
The influence that a test has on teaching and learning is commonly known as washback. This study explores the washback effect of the Thailand National Sports University English Proficiency Test (TEPT), a high-stakes compulsory university graduation test. With students as the main participants, the study examined how students of Thailand National Sports University (TNSU) perceived the TEPT and their own self-efficacy in relation to the intended results of the test. The findings suggested that the students' perceptions played a major role in mediating the washback effect of the TEPT: the test shaped their goals and consequently stimulated their use of language learning strategies when preparing for it. However, the concern about graduation requirements and passing the test, although it was the initial reason for implementing the policy, was ignored because the rule to use the TEPT as a gatekeeping device for graduation was not enforced. In sum, when a policymaker introduces a novel policy that uses an English proficiency test as a graduation requirement, the implementation should be clear so that practitioners can take actions that align with the policy. In particular, to avoid negative unintended consequences, the policy should not be ambiguous, as ambiguity may result in unexpected policy ineffectiveness or neglect.
2024, Language Testing in Asia
Although language test-takers have been the focus of much theoretical and empirical work in recent years, this work has been mainly concerned with their attitudes to test preparation and test-taking strategies, giving insufficient attention to their views on broader socio-political and ethical issues. This article examines test-takers’ perceptions and evaluations of the fairness, justice and validity of global tests of English, with a particular focus upon the International English Language Testing System (IELTS). Based on relevant literature and theorizing into such tests, and on self-reported test experience data gathered from test-takers (N = 430) from 49 countries, we demonstrate how test-takers experienced fairness and justice in complex ways that problematized the purported technical excellence and validity of IELTS. Even as there was some evidence of support for the test as a fair measure of students’ English capacity, the extent to which it actually reflected their language ...
2024, Balirano, Giuseppe / Rasulo, Margaret 2024. Re-Constructing the Mentality of (Language) Learning: Post-Pandemic Challenges. In Balirano, Giuseppe / Rasulo, Margaret (eds), Advances, Trends and Approaches in Language Teaching, Learning and Education in the Post-pandemic Era: Theory and Practice. R...
The precautionary educational measures undertaken to fend off rampant contagion caused by the COVID-19 pandemic resulted in the closing down of in-person instruction and migration towards online teaching and learning. What followed was a rush towards the implementation of a ‘pandemic pedagogy’ in which digitally-driven technologies were positioned as a frontline emergency service. With the pandemic now nearly behind us, it seems that finding quick-fix solutions has also adversely impacted the effectiveness of digitally mediated education, creating a series of caveats related to the imposed use of technology and media. The paper therefore argues that the urgency of finding solutions that could stand the test of the pandemic has, on the one hand, eluded sound pedagogical principles of online education, while on the other it has given way to an unprecedented educational experiment consisting in re-fostering, re-generating and re-motivating the mentality of learning.
2024, Balirano, Giuseppe / Rasulo, Margaret 2024. Advances, Trends and Approaches in Language Teaching, Learning and Education in the Post-pandemic Era: Theory and Practice – An Introduction. In Balirano, Giuseppe / Rasulo, Margaret (eds), Advances, Trends and Approaches in Language Teaching, Learning ...
Inspired by the insightful contributions delivered during the First International Conference on New Trends and Emerging Approaches in English held in Procida in September 2022, this Special Issue of RILA collects original scholarly work which discusses some cogent themes regarding societal transformations and their consequential repercussions on education systems around the world. By highlighting the socially and historically constructed relationship between learners and the target language in shifting and developing contexts (Blommaert 2013, Balirano 2021), this Special Issue specifically addresses the complexities of the current linguistic landscape in which advances in ELT have replaced traditional notions of education, and introduced innovative ways of thinking and learning as individuals traverse local and global boundaries, occupying multiple online and offline spaces, while coping with the aftermath of post-pandemic times.
2024
Abstract
Classroom performance assessment has gained prominence parallel to the multiplicity of the
purposes ahead of the assessment. Of many, the major controversy, which was the motive
behind this study, is the incorporation of L1-based elicitation as a valid measure of L2
performance assessment. To shed empirical light on this issue, this explanatory sequential
mixed-methods research employed 87 Iranian intermediate EFL learners, whose L2 classroom
performance was assessed through L1-based elicitation techniques. In order to validate this
mechanism, the multi-method mono-trait model (namely, Pearson correlation, structural
equations, exploratory and confirmatory factor analysis, composite reliability, and convergent
validity) suggested by Henning and Messick’s Unitary Concept of validity were applied. The
results from these multiple sources of evidence yield support to their common consensus that
L1-based elicitation techniques are valid measures of L2 performance assessment. The findings
then offer a legacy to the educational implications of L1-based mechanisms both in L2
instruction and assessment.
Keywords: L1-based elicitation, Performance assessment, Speaking ability, Unitary concept of
validity
2024, Global Journal of Foreign Language Teaching
Many educational bureaucracies use national examinations for high-stakes decision-making including certification, promotion, and qualifications. This study examines the washback effect of one of those exams, known as the General Aptitude Test (GAT), on the students’ learning of the Arabic language. Specifically, it examines students’ perceptions of the GAT and how it impacted their learning practice. Based on questionnaire responses from 548 high school students, and 12 interviews, the study finds a negative washback effect on students’ perceptions and learning. Most students expressed negative views about the GAT as it causes stress, and they perceive that it is a barrier to their university admission. In terms of using the GAT results, findings reveal that many students did not change their learning due to the mismatch between test content, learning, and teaching activities. This study provides important evidence about students’ perceptions of the GAT and its influence on their learning. We discuss the implications of these findings in minimizing the negative washback effect.
2024, Shanlax International Journal of Education
This paper explores the reliability of using ChatGPT in evaluating EFL writing by assessing its intra- and inter-rater reliability. Eighty-two compositions were randomly sampled from the Written English Corpus of Chinese Learners. These compositions were rated by three experienced raters with regard to 'language', 'content', and 'organization'. The writing samples were also rated by ChatGPT twice over some time, and the average scores were calculated. An independent samples t-test was conducted to compare the average scores given by ChatGPT and human raters. Pearson correlation analyses were conducted between the two sets of overall scores given by ChatGPT to calculate the intra-rater reliability, as well as between average scores given by ChatGPT and human raters for inter-rater reliability. The results of the comparative analysis show that ChatGPT may be used for evaluating EFL essays, as the scores are similar to those provided by reliable human raters. However, the results of the correlation analyses show that the intra-rater reliability of ChatGPT is not high enough to be acceptable, r=0.575, p<0.01, and the strength of the inter-rater reliability is moderate as well, r=0.508, p<0.01. Besides, there is no significant relationship between their average scores on 'organization' of the writings, r=0.181, p>0.05. Thus, it can be concluded that ChatGPT is not a reliable tool to rate and score EFL writings using the prompt in this study. One of the possible reasons for the unreliability of ChatGPT as a rater of EFL writing seems to be related to scoring for the 'organization' of the essay. These findings imply that while ChatGPT has potential as an evaluative tool, its current limitations, particularly in assessing organization, must be addressed before it can be reliably used in educational settings.
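The intra-rater check described above, correlating two rounds of scores from the same rater, can be illustrated with a plain Pearson correlation. This is a minimal sketch of the standard formula, not the authors' analysis script; the function name `pearson_r` and the two score lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores from one rater on the same five essays, rated twice.
first_round = [70, 75, 80, 65, 90]
second_round = [72, 70, 85, 60, 88]
r = pearson_r(first_round, second_round)
print(f"intra-rater r = {r:.3f}")
```

The same function applies to the inter-rater case by passing the averaged machine scores and the averaged human scores for the same essays.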
2024, International Journal of Language Testing
This study aimed at evaluating a PISA-like reading test developed by teachers participating in the teacher training for teaching PISA-like reading. To serve this purpose, an experimental test was administered to 107 students aged 15-16 using a set of texts and questions constructed according to the criteria of the PISA Reading test Level 1. Item analysis was performed following the sampling using Rasch Measurement, deemed essential for determining the ideal index of test items relative to students' ability in making the correct response. The components of the calculation comprise reliability, separation, and standard error. The Rasch model was constructed manually using Microsoft Excel to obtain the result of the calculation, and a Wright Map was also made manually to illustrate the result of the calculation. The results of the item analysis indicate that the test and the items the teachers constructed have met good criteria. The results revealed an even distribution of test item difficulty at the targeted level. The samples' ability to make correct answers, however, was decentralized towards the test items of moderate level of difficulty. Only a limited number of students showed good ability in their response to test items of higher difficulty. These findings have the practical implication of advancing PISA-like reading test teaching and writing models by providing more information on teachers' ability to write PISA-like reading test items and the levels of difficulty of the items written by the teachers, as indicated by the students' responses to the test items.
2024, Mapping Teacher-Produced Tests to a Usefulness Model
2024
The relationship between IELTS and other recognised measures of English language proficiency.
2024
The rapid growth of IELTS has resulted in a growing number of people providing information about the IELTS Test, setting standards, interpreting scores and advising test-takers. This project examined the assessment literacy needs of university test users (including admissions, marketing, academic and English language staff), how well these needs are being met and what other approaches could be adopted to meet these needs. The study took the form of a “proactive evaluation” (Owen, 2006), which included: an online survey and face-to-face interviews to investigate the assessment literacy needs of IELTS Test users at two Australian universities and how well these needs are met by current resources; a discourse analytic study of the IELTS Guide (2009); comparative evaluations of different IELTS Test resources and the institutional sections of the IELTS, TOEFL and PTE Academic websites; and a review of best practice in staff online training programs.
2024, IELTS Research Reports
IELTS Research Reports, Volume 10. Investigating IELTS exit score gains in higher education. Authors: Dr Kieran O'Loughlin and Dr Sophie Arkoudis, Melbourne Graduate School of Education, The University of Melbourne. Grant awarded Round 11, 2005. This study investigates ...
2024
Testing practices and the construct of English both serve separately and interactionally to promote activities of modernity and coloniality. Testing instruments and interpretations of test performance, by design, categorize and rank learning and knowledge in discrete, static ways. Tests are limited in capturing a specified type of response that counts as evidence of learning, whether it is a familiar multiple-choice item or an essay that is scored (often computer-scored)
2024, ITSC
Goals
This research is commissioned by ITSC, Korea, to review the General Speaking Test (GST) with two express goals in mind. The first is to make the GST attractive enough for potential test takers, and the second is to devise ways to increase their final grades without compromising the test's quality or changing its 11-part structure.
Conceptual frameworks
This research is based on a literature review, two GST test samples, and a GST grade sample for 178 test takers. The research goals were examined through the User Experience (UX) Design framework, which advocates designing quality tests and putting the human test taker at the center by creating a pleasant and supportive test-taking experience and ensuring a satisfactory performance outcome. To manage test taker expectations, test designers are urged to follow the "bias for the best performance" principle. The more the cognitive load is alleviated, the better able test takers will be to attend to the task at hand. Related to UX design is the concept of localizing the test contents based on English language norms informed by nonnative English speaker models. These models do not impose a strict native speaker standard in assessing language skills. They call for reconceptualizing the notion of intelligibility from a localization standpoint.
Outcomes
This research has yielded several recommendations designed to improve the GST test taker's experience and promote the GST as a viable test among its competitors. Hopefully, these recommendations will improve performance outcomes, scores, and affect. Three types of recommendations are made: (i) Programmatic recommendations relate to action items ITSC can engage in to promote the test, make it user-friendly and non-discriminatory against all potential test takers, and conduct test-taker satisfaction surveys. (ii) Design recommendations offer actionable suggestions for test designers to localize the test contents and decrease the cognitive load on test takers' working memory so that they can focus more on the task. (iii) Grading recommendations are addressed to test graders, so that they adopt a "bias for the best performance" attitude and implement the recommendations specified in (i) and (ii) above.
Significance
This research is significant on two counts: (i) It offers concrete and actionable recommendations to improve test scores and increase the GST's capacity to emerge as an attractive test among the competition. (ii) It grounds the endeavor of upgrading and improving the GST in current thinking that privileges user experience design, promotes content localization, and capitalizes on the test taker's linguistic, cultural, and ideational capital.
2024, Lecture Notes in Computer Science
We report an experiment to evaluate DQGen's performance in generating three types of distractors for diagnostic multiple-choice cloze (fill-in-the-blank) questions to assess children's reading comprehension processes. Ungrammatical distractors test syntax, nonsensical distractors test semantics, and locally plausible distractors test inter-sentential processing. 27 knowledgeable humans rated candidate answers as correct, plausible, nonsensical, or ungrammatical without knowing their intended type or whether they were generated by DQGen, written by other humans, or correct. Surprisingly, DQGen did significantly better than humans at generating ungrammatical distractors and slightly better than them at generating nonsensical distractors, albeit worse at generating plausible distractors. Vetting its output and writing distractors only when necessary would take half as long as writing them all, and improve their quality.
2024, Premise
This study investigates the implementation of flashcards and the Teams Games Tournament (TGT) to enhance students' vocabulary mastery. The participants were thirty-three fourth graders at SDIT Nurul Hidayah Cigedog Brebes in the academic year 2021-2022. This study employed Classroom Action Research. Instruments included a pre-test, a post-test, a questionnaire, and an observation. In addition to a normality test, descriptive statistics and a paired-samples t-test were used to analyze the data. In cycle I, the pre-test average was 72.12, and the post-test average was 82.03. In cycle II, the average rose from 85.67 on the pre-test to 93.061 on the post-test. The cycle I mean was more significant than the cycle II mean. The paired-samples t-test indicated that the cycle II result is higher than the cycle I result. In addition, students responded positively. The results of the observation sheet indicated that the students conducted teaching and learning activities effectively. Therefore, the implementation of flashcards and TGT is successful in enhancing students' vocabulary mastery.
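As an illustration of the paired-samples t-test mentioned in this abstract, the following is a minimal sketch using only the Python standard library. The function name `paired_t_statistic` and the five pairs of scores are hypothetical and are not taken from the study; the sketch only shows how the test statistic is typically computed from pre-test and post-test scores of the same students.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    if len(pre) < 2:
        raise ValueError("need at least two pairs of scores")
    # Work on the per-student differences (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for five students (NOT the study's data).
pre_scores = [70, 65, 80, 75, 60]
post_scores = [78, 70, 85, 85, 72]

t_value, df = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_value:.2f}, df = {df}")
```

The resulting t value would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value; a statistics package would normally be used for that step.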
2024
Research related to Australian Indigenous people is of national significance and is full of challenges as well as opportunities for both Indigenous and non-Indigenous researchers. It requires cultural sensitivity, innovation and pragmatic approaches to frame the enquiry to reach its intended outcomes. Indigenous students are the most marginalised equity group at Australian universities and some aspects of their access and success at university, especially those related to the use of English as the sole medium of instruction, are yet to be explored thoroughly. A grounded approach to the research was guided by the advice of Indigenous mentors of the researcher. The host university’s strong commitment to the compliance of ethical practices for Indigenous research combined with the collective experiences of the mentors and the researcher in Indigenous education informed and guided the drawing up of a pragmatic and culturally sensitive research framework. This paper outlines the developme...
2024, Language Teaching Research Quarterly
Language testing and assessment are diverse areas which continue to experience rapid development as fields of study. The same growth is also true when we consider the range of methods of testing and assessing student abilities that we have at our disposal. Indeed, as a glance at ‘flagship’ journals such as Language Testing (2022) and Language Assessment Quarterly (2022) will confirm, the scope of academic scholarship is now enormous, as are the many ways in which testers and assessors create, administer, score, and evaluate the tests and assessments they have at their disposal. However, despite this rapid and continuous development, at the heart of these areas is the need for appropriate, fair, valid, reliable, and useful testing and assessment practices and instruments (see recent considerations of these issues in Fulcher & Harding, 2022). This Special Issue in honour of Professor Glenn Fulcher is a tribute to his work at the heart of these areas. The Special Issue describes this w...
2024, Language Testing in Asia
An amendment to this paper has been published and can be accessed via the original article.
2024, International Journal of Instruction
This study was primarily aimed at developing an English-speaking proficiency test and analytic rubrics designed to measure the speaking proficiency of Malaysian undergraduates. On the basis of Littlewood's Methodological Framework and Long's Interaction Hypothesis, the researchers derived three speaking tasks from four sources: (a) syllabus of the English language courses at the relevant university, (b) Kathleen Bardovi-Harlig's operationalizing conversation speech acts, (c) IELTS part B speaking test, and (d) task B speaking section of Malaysian University English Test (MUET). A total of 96 undergraduates at four levels of language proficiency (i.e., low performers, intermediate performers, upper-intermediate performers, and high performers) from a public university in Malaysia voluntarily participated in the study. While two TESOL experts were invited to validate the content of the tasks and the rubrics, two raters scored the students' test performances. Construct validity was established through a known-group comparison of task performance at the three difficulty levels, namely elementary, intermediate and advanced. The test scores, having good internal consistency (α = .89) and inter-rater reliability (ICC = .84), yielded speaking proficiency descriptors. This result showed that the test is reliable and valid for diagnosing the speaking proficiency of Malaysian undergraduates in pursuit of improvement.
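The internal consistency figure reported in this abstract is presumably Cronbach's alpha (an assumption; the abstract does not name the coefficient). A minimal sketch of that coefficient follows, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the small score matrix are hypothetical and unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Cronbach's alpha for a list of rows, one row of item scores per test taker."""
    if len(score_rows) < 2:
        raise ValueError("need at least two test takers")
    k = len(score_rows[0])
    if k < 2:
        raise ValueError("need at least two items or rubric criteria")
    # Transpose so that each tuple holds all scores for one item.
    columns = list(zip(*score_rows))
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in score_rows])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical rubric scores: 4 test takers x 3 criteria (NOT the study's data).
scores = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Inter-rater reliability (the ICC in the abstract) is a separate statistic and is not covered by this sketch.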
2024
Given the rapid development of Information and Communication Technologies and the increasing number of students, many higher education institutions are moving towards the use of the Internet for the delivery of their courses both on campus and at a distance. This has prompted educators, testing experts and test developers to look at ways of applying Information Technology to the assessment of students’ learning. In this respect, our objective in this paper is to present a web-based tool for the assessment of English learning. This internet-based assessment system, which is based on multiple choice questions, gives students an opportunity to assess their English grammar skills. In fact, once logged in, each learner is given a random selection of questions covering English grammar. At the end of the quiz, the student gets his score as well as the list of questions he has been asked together with feedback information as to whether his answers are true or false. More importantly, the le...
2024
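This study analyzes a range of data obtained from an online course for English teachers, the "English Writing Correction Course". The IELTS (International English Language Testing System) rubric was used to evaluate the participants' English compositions, and the variance and correlations of the total score and of the score for each criterion (task response, coherence and cohesion, lexical resource, and grammatical range and accuracy) were examined. The results showed that more complex tasks revealed more marked differences in writing skill than the writing tasks the participants were already familiar with. The correlation analysis suggested that the more complex the task, the more the participants' attention was directed to various aspects of writing. These findings suggest that writing instruction for English teachers should have them work on tasks covering a variety of topics while the difficulty of the tasks is raised step by step.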
2024, East African Scholars Journal of Education, Humanities and Literature
Evaluating EFL students’ oral production is one of the most challenging teaching practices. On the one hand, there is controversy over which criteria should weigh most in the evaluation process. On the other hand, the perceptions students have of their teachers’ feedback tend to influence their learning behaviour. The aim of the present paper is to shed light on which methods can help attain objectivity when evaluating EFL learners’ speaking. It also explores the stances of the sample (2nd year students of English, University of Saida) on, and their reactions to, the testing and evaluation of their speaking abilities. Two questionnaires were used, one addressed to students and one to teachers. The results confirm that the nature and the conditions of the evaluation process make it very hard to attain objectivity. In addition, students’ affective responses to evaluation have a relatively negative impact on their orientation towards learning speaking skills.
2024, Asia Pacific Journal of Education
This study explored the language assessment literacy (LAL) level and language testing and assessment (LTA) needs of 57 English teachers at senior high school level in Taiwan. An LAL test, three quantitative questionnaires and a qualitative survey were administered, conveying the overall LAL level and LTA needs of the participants. The qualitative survey then elicited their perceptions of and perspectives on classroom-based assessments and national high-stakes tests with regard to a newly implemented national curriculum (i.e., 108-Curriculum). The results show that the teachers lack adequate LAL to varying degrees, depending on their demographic background. Additionally, the participants identified some assessment topics, from large standardized testing to applying test results. They also reported their greatest training needs: providing feedback, finding teaching content, and assessing integrated skills. The teachers’ qualitative accounts generated five themes, from education in testing and assessment as a useful tool to the gap between classroom-based assessments and the national tests. The results not only refine assessment modules in the teacher education programme but suggest a direction for the assessment training of English teachers. The study concludes by suggesting future LAL investigations in the English-teaching context of Taiwan or wider international communities.
2024, Academy of Education and Social Sciences Review
Writing is challenging in a second language, and constructing tests of it is rather problematic. The present study aims to offer an overview of the fundamental considerations in test construction in English language teaching. The significance of test validity, test reliability, task interactiveness, test attentiveness, test impact, and test practicality has been considered with a view to improving educational practices. The researcher has treated validity and reliability as the most serious concerns, and the review has been advanced to add to the understanding of their value in designing a writing test. Desktop research formed the basis of this study. The search confirms that the aforementioned characteristics are central to the design. This conforms with the research findings available in the literature on teaching and testing.
2024
The purpose of this study was to investigate preparatory class instructors' attitudes towards the methods of assessment they are currently using at their institutions, and their knowledge about and attitudes towards portfolios as an alternative method of assessment. The study was conducted with 386 English instructors from the preparatory class programs of 14 Turkish state universities. Data were collected through a four-part questionnaire including closed-response and Likert-scale questions. Part A in the questionnaire gathered data about the instructors' educational background and ...
2024, PASAA Journal
Investigating Proficiency of Academic English in Student Writing: A Comparative Case Study on Vocabulary Utilization in Student Research Article Writing vis-à-vis National and International Research
2023, Colombian Applied Linguistics Journal
Accountability in language education is often associated with top-down national policies unresponsive, or even hostile, to local needs; however, when accountability is driven by local stakeholders seeking to better understand and enhance their programs, it can foster productive cycles of action research and curriculum development. This paper reports on one such internally-motivated accountability effort, in which program insiders sought to determine the efficacy of a reading test being administered to a new population of students at one Colombian university. Descriptive statistics, reliability estimates, and item facility and discrimination measures were used to determine whether this test was sufficiently reliable and appropriately matched to test takers' ability in order to warrant its use as part of a high-stakes English-language placement exam. A detailed analysis of this test is used not only to propose specific recommendations for revision but also to illustrate a useful set of statistical tools appropriate for test analysis in other language programs. Moreover, we conclude that the involvement of local instructors as part of an iterative, self-reflective, test development process provides opportunities for professional development and deeper engagement in accountability projects.
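The item facility and discrimination measures named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: items are scored 1 (correct) or 0 (incorrect), discrimination is computed as the difference in proportion correct between the top and bottom thirds of test takers ranked by total score (one common convention; the paper may have used another, such as a point-biserial correlation), and the function names and response matrix are hypothetical.

```python
def item_facility(responses, item):
    """Proportion of test takers who answered the given item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=1 / 3):
    """Upper-group minus lower-group proportion correct for the given item."""
    # Rank test takers by total score, highest first (ties keep input order).
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(row[item] for row in upper)
    lower_correct = sum(row[item] for row in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical responses: 6 test takers x 3 items (NOT the paper's data).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

facility_item0 = item_facility(responses, 0)
discrimination_item0 = item_discrimination(responses, 0)
discrimination_item1 = item_discrimination(responses, 1)
print(facility_item0, discrimination_item0, discrimination_item1)
```

Facility values near 0 or 1 flag items that are too hard or too easy for the population, and discrimination values near 0 flag items that do not separate stronger from weaker test takers.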
2023, The Asia-Pacific Education Researcher
In 2008, the Ministry of Education of South Korea planned to develop a domestic, standardized language test called the National English Ability Test (NEAT) as part of sweeping reforms designed to democratize and improve the Korean education system. On many levels, NEAT was an innovative initiative. From a financial perspective, NEAT was designed to curb private spending on English education and to divert money from international standardized English tests. Educationally, NEAT was groundbreaking in that it introduced speaking and writing into English assessment and was meant to replace the English section on the College Scholastic Ability Test of Korea. In addition, NEAT was a two-track English test for high school students intended to meet the differing needs of students and colleges. After about five years, including a trialing period and a new presidential administration, plans for implementing NEAT were abandoned officially in 2013. The present paper examines the educational, sociopolitical, and economic factors that led to the rise and demise of NEAT from a critical language testing perspective. Following the tenets of critical language testing, this paper addresses the responsibility of the language testers to create an assessment that fairly measures test-takers’ language proficiency. A review and an inductive interpretation of policy documents and recent NEAT research demonstrate the discrepancies among the original intentions of NEAT, the unintended outcomes, and overlooked issues of practicality. Findings also reveal that the two tracks of NEAT negatively branded both students and universities, deepening instead of alleviating the English divide.
2023, Zenodo (CERN European Organization for Nuclear Research)
The study examines English as a Foreign Language (EFL) students' attitudes towards developing their speaking abilities at KRI University in order to better understand the disparities in speaking competency among undergraduates. The study used a quantitative approach and employed a 4-item interview survey to gather data. The survey interview questionnaire was adopted from Wang, Kim, Bong, and Ahan (2013) and administered to 100 students in the English departments of six universities in Iraq's Kurdistan Region. A semi-structured interview was developed for EFL students. The questionnaire was online and open-ended. The data from the participants were analyzed using thematic analysis with SPSS software. The findings revealed a perceived failure in EFL students' English-speaking skills, which was reported along with the causes of the perceived difficulty. The findings also revealed a poor level of speaking ability among EFL undergraduates as well as little education in the skill at the university level. In addition, the study identified some major challenges for EFL students, such as lack of confidence, lack of planning, a demotivating atmosphere, incorrect word choice, poor gestures, and incorrect style, which prevented the students from succeeding in speaking. In light of these results, the study suggested that EFL learners' competency should be scrutinized in order to strengthen their speaking abilities. Speaking is a crucial ability in language acquisition, and EFL teachers should help their students acquire it. As a means of improving students' communicative ability, task-based instruction should be used in educational institutions and universities. The implication of this paper is that speaking difficulties among EFL students in the institutions of the Iraqi Kurdistan Region can be addressed by putting greater focus on this ability.
There are several issues to consider, including teachers, instructional methodologies, the curriculum, extracurricular activities, and assessment rules.
2023, Iranian Journal of Applied Language Studies
The ability to assess language learners' progress, known as language assessment literacy (LAL), is regarded as one of the most important parts of EFL/ESL teachers' literacy. The notion of LAL has evolved over time, as a large number of researchers have shown enthusiasm for studying this area. However, studies on teachers' Writing Assessment Literacy (WAL) are scarce. As writing skill is necessary for language learners to communicate with native speakers of English, it is important for writing teachers to develop assessment tasks that contribute positively to learners' progress in writing. Therefore, it is worthwhile to review the related studies on assessment literacy, language assessment literacy, and writing assessment literacy. In this review study, the relevant studies are reviewed and further directions for the writing assessment literacy of EFL/ESL teachers are suggested to researchers interested in the field.
2023
VEO IELTS PROJECT REPORT: Which specific features of candidate talk do examiners orient to when taking scoring decisions? The research investigated which specific features of candidate talk IELTS Speaking Test (IST) examiners orient to when taking scoring decisions. We also researched whether the use of the scoring scheme and customised app potentially adds any value to IST examiner development.