Vahid Aryadoust | Nanyang Technological University
Books and Chapters by Vahid Aryadoust
Computer Assisted Language Learning, 2020
A fundamental requirement of language assessments which is under-researched in computerized assessments is impartiality (fairness), or the equal treatment of test takers regardless of background. The present study aimed to evaluate fairness in the Pearson Test of English (PTE) Academic Reading test, which is a computerized reading assessment, by investigating differential item functioning (DIF) across Indo-European (IE) and Non-Indo-European (NIE) language families. Previous research has shown that similarities between readers’ mother tongue and the second language being learned can advantage some test takers. To test this hypothesis, we analyzed data from 783 international test takers who took the PTE Academic test, using the partial credit model in Rasch measurement. We examined two main types of DIF: uniform DIF (UDIF), which occurs when an item consistently gives a particular group of test takers an advantage across all levels of ability, and non-uniform DIF (NUDIF), which occurs when the performance of test takers varies across the ability continuum. The results showed no statistically significant UDIF (p > 0.05), but identified three NUDIF items out of the 10 items across the language families. A mother tongue advantage was not observed. Similarity in test takers’ level of computer and Internet skills, test preparation, and language policies could contribute to the finding of no UDIF. Post-hoc content analysis of items suggested that the decrease of the mother tongue advantage for IE test takers in high-proficiency groups and lucky guesses by low-ability groups may have contributed to the emergence of NUDIF items. Lastly, recommendations for investigating social and contextual factors are proposed.
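For readers unfamiliar with the modeling approach named above, the following is a generic statement of the partial credit model used in Rasch measurement, together with the standard uniform/non-uniform DIF distinction; it is a hedged sketch rather than the exact parameterization reported in the study.

```latex
% Partial credit model (Masters, 1982): probability that person n scores x on item i,
% with ordered step parameters \delta_{ik} (the sum over an empty index set is defined as 0).
P(X_{ni} = x) =
  \frac{\exp\!\left(\sum_{k=0}^{x} (\theta_n - \delta_{ik})\right)}
       {\sum_{j=0}^{m_i} \exp\!\left(\sum_{k=0}^{j} (\theta_n - \delta_{ik})\right)},
\qquad x = 0, 1, \dots, m_i .

% Uniform DIF: the group difference in item difficulty is constant across ability,
%   \delta_{ik}^{(IE)} - \delta_{ik}^{(NIE)} = c \quad \text{for all } \theta .
% Non-uniform DIF: the group difference itself depends on ability, i.e. the item
% characteristic curves of the two groups cross somewhere along the \theta continuum.
```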
Studies in Educational Evaluation, 2020
The computerization of reading assessments has presented a set of new challenges to test designers. From the vantage point of measurement invariance, test designers must investigate whether the traditionally recognized causes for violating invariance are still a concern in computer-mediated assessments. In addition, it is necessary to understand the technology-related causes of measurement invariance among test-taking populations. In this study, we used the available data (n = 800) from the previous administrations of the Pearson Test of English Academic (PTE Academic) reading, an international test of English comprising 10 test items, to investigate measurement invariance across gender and the Information and Communication Technology Development index (IDI). We conducted a multi-group confirmatory factor analysis (CFA) to assess invariance at four levels: configural, metric, scalar, and structural. Overall, we were able to confirm structural invariance for the PTE Academic, which is a necessary condition for conducting fair assessments. Implications for computer-based education and the assessment of reading are discussed.
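As a hedged sketch of the four invariance levels mentioned above (not the exact model specification fitted in the study), multi-group CFA can be framed as a series of increasingly constrained measurement models across groups g:

```latex
% Measurement model for group g:
x_g = \tau_g + \Lambda_g \, \xi_g + \varepsilon_g

% Configural invariance: the same pattern of fixed and free loadings holds in every group.
% Metric (weak) invariance:   \Lambda_1 = \Lambda_2 = \dots = \Lambda_G
% Scalar (strong) invariance: \tau_1 = \tau_2 = \dots = \tau_G   (in addition to metric)
% Structural invariance: equality constraints on the latent variances/covariances
% (and, where identified, latent means), in addition to the constraints above.
```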
https://www.sciencedirect.com/science/article/pii/S0191491X19301452
Routledge, 2019
Volume I of Quantitative Data Analysis for Language Assessment is a resource book that presents the most fundamental techniques of quantitative data analysis in the field of language assessment. Each chapter provides an accessible explanation of the selected technique, a review of language assessment studies that have used the technique, and finally, an example of an authentic study that uses the technique. Readers also get a taste of how to apply each technique through the help of supplementary online resources that include sample data sets and guided instructions. Language assessment students, test designers, and researchers should find this a unique reference as it consolidates theory and application of quantitative data analysis in language assessment.
The purpose of the present study was twofold: (a) it examined the relationship between peer-rated likeability and peer-rated oral presentation skills of 96 student presenters enrolled in a science communication course, and (b) it investigated the relationship between student raters’ severity in rating presenters’ likeability and their severity in evaluating presenters’ skills. Students delivered an academic presentation and then changed roles to rate their peers’ performance and likeability, using an 18-item oral presentation scale and a 10-item likeability questionnaire, respectively. Many-facet Rasch measurement was used to validate the data, and structural equation modeling (SEM) was used to examine the research questions. At an aggregate level, likeability explained 19.5% of the variance of the oral presentation ratings and 8.4% of rater severity. At an item-level, multiple cause-effect relationships were detected, with the likeability items explaining 6–30% of the variance in the oral presentation items. Implications of the study are discussed.
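A minimal sketch of the many-facet Rasch model referred to above, in its common three-facet form (presenter, item, rater) with rating-scale thresholds; the actual facet structure used in the study may differ.

```latex
% Many-facet Rasch model (Linacre): log-odds of presenter n receiving category k
% rather than k-1 on item i from rater j.
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
% \theta_n: presenter ability, \delta_i: item difficulty,
% \alpha_j: rater severity,    \tau_k: threshold of category k.
```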
This chapter describes the listening section of the Internet-Based Test of English as a Foreign Language (TOEFL iBT) which was designed by Educational Testing Service (ETS). The TOEFL iBT is administered in many testing centers around the world and is used to measure academic English language proficiency of test candidates who are applying to universities whose primary language of instruction and research is English.
This chapter aims to demonstrate how peer assessment can be used to generate information in support of teaching and learning in Singapore and other educational settings. The chapter reports on the development of the tertiary-level English oral presentation scale (TEOPS) which is used in a science communication module in a major Singaporean university. A survey of peer assessment and oral presentations is conducted and the multicomponential model of TEOPS is presented. In addition, the importance of the assessment of oracy and presentation skills in Singapore is discussed and a narration of the validation studies of TEOPS, which use many-facet Rasch measurement (MFRM) and students' perception, is presented. The author elaborates on how this scale can be used for peer assessment and provides directions for future research on the peer assessment of oral presentations.
This entry seeks to examine second language (L2) listening comprehension from a subskill-based approach. It provides an overview of two models of listening comprehension, that is, the default listening construct and the listening-response model, and delineates listening subskills. It also proposes a list of the subskills that have been identified and validated through empirical research. The entry concludes by discussing the potential relationships between the subskills and the limitation of the listening comprehension research. APA citation: Aryadoust, V. (2017). Taxonomies of listening skills. In J. I. Liontas and M. DelliCarpini (Eds.), The TESOL encyclopedia of English language teaching. John Wiley in partnership with TESOL International.
Two models of listening comprehension are presented: a cognitive model for non-assessment settings and a language proficiency model which has been applied extensively to the assessment of listening. The similarities of the models are then discussed, and a general framework for communicative assessment of listening is proposed. The framework considers socio-cognitive aspects of listening assessment and lends itself to both in-class and beyond-class assessment situations.
The Academic Listening Self-rating Questionnaire (ALSA) is a 47-item self-appraisal tool which helps language learners evaluate their own academic listening skills (Aryadoust, Goh, & Lee, 2012). The underlying dimensions of ALSA consist of linguistic components and prosody, cognitive processing skills, relating input to other materials, note-taking, memory and concentration, and lecture structure. The psychometric quality of ALSA has been studied using the Rating Scale Rasch model, structural equation modeling, and correlation analyses. The ALSA can be used to raise tertiary-level students' awareness of their academic listening ability, and of the elements of academic discourse such as lectures and seminars that may affect their academic achievement. Further research is being undertaken to provide validity evidence for two versions of the instrument in Chinese and Turkish, respectively.
Coh-Metrix has emerged as a promising psycholinguistic tool in writing and reading research. Researchers have used Coh-Metrix to predict English proficiency of first and second language learners. The common statistical method used in predictive modeling research is a multiple linear regression model, which has achieved varying degrees of success. This chapter examines the relative merits of the learning/validation method applied in previous Coh-Metrix studies and then proposes genetic algorithm-based symbolic regression as an alternative and efficient approach which provides robust evidence for the predictive power of some of the Coh-Metrix indices. Using a sample of papers written by university students (n = 450), the author demonstrates that genetic algorithm-based symbolic regression is capable of significantly minimizing the error of measurement and providing a much clearer understanding of the data.
In this chapter, we aim to justify a neurobiological approach to language assessment. We argue that neuroscience and genetics offer great potential for language assessment research, specifically in defining and operationalizing language constructs and validation processes, and also for better understanding of second language acquisition and the changes within the brain that relate directly to changes in proficiency levels. Further, we note that converging evidence from the fields of language assessment, cognitive neuroscience, and genetics is enabling the reconceptualization of test takers' competence and performance (see Fox & Hirotani, this volume). Integrating current data analysis methods used in language testing, neuroscience, and genetics would lend a multi-dimensional perspective to assessment and take into consideration the advancements in language assessment, psychometrics, and neuroscience.
Data from 230 test takers who answered 60 reading test items in an Iranian reading test were subjected to Rasch measurement analysis to yield item difficulty parameters. Seven Coh-Metrix attributes (left embeddedness, CELEX, preposition phrase density, verb overlap, imageability of content words, text easability, and lexical diversity) were used as variables to sort test items into two difficulty categories, high-difficulty and low-difficulty. An artificial neural network (ANN) model was applied, with 47 items (82%) used to train the network, 10 (17.5%) items used for testing, and three excluded. The model correctly categorized test items in 89.4% and 100% of cases in the training and testing samples, respectively. The most important variable in classifying items was left embeddedness, an index of syntactic complexity, and the least important was lexical diversity. Overall, the study shows that neural networks have a high precision in classifying low and high difficulty reading test items.
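A minimal sketch of the kind of classification described above, using scikit-learn's multilayer perceptron; the column names, file name, and network settings are illustrative assumptions, not the configuration reported in the chapter.

```python
# Sketch: classify reading items as high- vs low-difficulty from Coh-Metrix features.
# Assumptions: a CSV ("items.csv") holding the seven feature columns and a binary
# "difficulty" label derived from the Rasch item difficulty estimates.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["left_embeddedness", "celex_frequency", "prep_phrase_density",
            "verb_overlap", "imageability", "text_easability", "lexical_diversity"]

data = pd.read_csv("items.csv")            # hypothetical file of the 57 retained items
X, y = data[features], data["difficulty"]  # y: 1 = high-difficulty, 0 = low-difficulty

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10, random_state=0)

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
model.fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))
print("testing accuracy: ", model.score(X_test, y_test))
```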
Over the past few decades, the field of language assessment has grown in importance, sophistication, and scope. The increasing internationalization of educational and work contexts, heightened global understanding of the role of assessment in learning (e.g., Black & Wiliam, 2001; Fox, 2014; Rea-Dickens, 2001), greater emphasis on the assessment of educational outcomes (e.g., Biggs & Tang, 2007), and the concomitant expansion of the language testing industry (e.g., Alderson, 2009) have led to unprecedented changes in assessment practices and approaches. These advancements, spurred on by technological innovation and a burgeoning array of new data analysis techniques, have prompted some to suggest (e.g., McNamara, 2014) that language assessment is on the verge of a revolution....
http://www.cambridgescholars.com/trends-in-language-assessment-research-and-practice
Despite prodigious developments in the field of language assessment in the Middle East and the Pacific Rim, research and practice in these areas have been underrepresented in mainstream literature. This volume takes a fresh look at language assessment in these regions, and provides a unique overview of contemporary language assessment research. In compiling this book, the editors have tapped into the knowledge of language and educational assessment experts whose diversity of perspectives and experience has enriched the focus and scope of language and educational assessment in general, and the present volume in particular. The six ‘trends’ addressed in the 26 chapters that comprise this title consider such contemporary topics as data mining, in-class assessment, and washback. The contributors explore new approaches and techniques in language assessment including advances resulting from multidisciplinary collaboration with researchers in computer science, genetics, and neuroscience. The current trends and promising new directions identified in this volume and the research reported here suggest that researchers across the Middle East and the Pacific Rim are playing—and will continue to play—an important role in advancing the quality, utility, and fairness of language testing and assessment practices.
Our interest in putting together the present volume grew out of a burgeoning stream of research into language assessment in the Middle East and the Pacific Rim. As the focus on education and the role of English language teaching continues to intensify across these regions at an unprecedented rate, assessing communication skills becomes an increasingly significant field. Some of the major universities in these regions have had a long history in teaching and assessing English and other languages, and researchers, practitioners, and scholars alike have attempted to develop innovative assessment approaches and techniques to address the pressing needs of language test developers and test takers. At the same time, multiple annual conferences, such as the Pacific Rim Objective Measurement Symposium (PROMS) and the Asian Association for Language Assessment (AALA) conference, have been launched to bring scholars together and keep them updated about the latest developments in language and educational assessment in these regions.
Bagheri, M.S., Nikpoor, S., & Aryadoust, S.V. (2007). Crack IELTS in a flash. Shiraz: Sandbad Publication.
International Journal of Listening, 2022
Although second language (L2) listening assessment has been the subject of much research interest in the past few decades, there remain a multitude of challenges facing the definition and operationalization of the L2 listening construct(s). Notably, the majority of L2 listening assessment studies are based upon the (implicit) assumption that listening is reducible to cognition and metacognition. This approach ignores emotional, neurophysiological, and sociocultural mechanisms underlying L2 listening. In this paper, the role of these mechanisms in L2 listening assessment is discussed and four gaps in understanding are explored: the nature of L2 listening, the interaction between listeners and the stimuli, the role of visuals, and authenticity in L2 listening assessments. Finally, a review of the papers published in the special issue is presented and recommendations for further research on L2 listening assessments are provided.
System, 2021
Mobile-assisted language learning (MALL) is a novel approach to language learning and teaching. The present study aims to review the methodological quality of quantitative MALL research by focusing on the applications of statistical techniques and instrument reliability and validity. A total of 174 papers within 41 journals identified using the Scopus database were screened and coded. Of these, 77 quantitative MALL studies that investigated English as a foreign or second language using mobile devices met the inclusion criteria. In the full-text screening, each study was coded for the statistical techniques applied, the assumptions reported, the reliability and validity investigation of the instruments, and the coding practices used. The results show the ubiquity of the general linear model (GLM) (i.e., mean-based data analysis such as the t-test, univariate analysis of variance (ANOVA), and multivariate analysis of variance (MANOVA)), with 61.40% of the analyzed studies using this statistical method. Notably, the majority of studies that used the GLM did not report confirmation of the fundamental assumptions (i.e., normality, homogeneity of variance, and linearity) of such analyses. In addition, a reliance on null hypothesis significance testing was observed without reporting of the practical significance of the investigated effects or relations (effect size). Lastly, less than half of the MALL studies reported reliability and even fewer reported validity evidence, indicating a lack of evidence of the precision, meaningfulness of data, and accuracy of many of the measurement instruments used. Implications of these findings for MALL research are discussed, with several suggestions for future research.
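As a hedged illustration of the reporting practices the review calls for (assumption checks and effect sizes alongside a significance test), the following sketch uses scipy; the group score arrays are invented placeholders, not data from any reviewed study.

```python
# Sketch: report assumption checks and an effect size alongside a two-group comparison.
import numpy as np
from scipy import stats

# Hypothetical post-test scores for a MALL group and a comparison group.
mall = np.array([72, 78, 81, 69, 75, 80, 77, 74, 83, 70], dtype=float)
control = np.array([68, 71, 74, 66, 70, 73, 69, 72, 75, 67], dtype=float)

# Assumption checks the review found to be under-reported.
print("Shapiro-Wilk (MALL):   ", stats.shapiro(mall))
print("Shapiro-Wilk (control):", stats.shapiro(control))
print("Levene homogeneity:    ", stats.levene(mall, control))

# Null hypothesis significance test plus practical significance (Cohen's d).
t_stat, p_value = stats.ttest_ind(mall, control)
pooled_sd = np.sqrt(((len(mall) - 1) * mall.var(ddof=1) +
                     (len(control) - 1) * control.var(ddof=1)) /
                    (len(mall) + len(control) - 2))
cohens_d = (mall.mean() - control.mean()) / pooled_sd
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}")
```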
Language Testing, 2021
The aim of this study was to investigate how test methods affect listening test takers' performance and cognitive load. Test methods were defined and operationalized as while-listening performance (WLP) and post-listening performance (PLP) formats. To achieve the goal of the study, we examined test takers' (N = 80) brain activity patterns (measured by functional near-infrared spectroscopy (fNIRS)), gaze behaviors (measured by eye-tracking), and listening performance (measured by test scores) across the two test methods. We found that the test takers displayed lower activity levels across brain regions supporting comprehension during the WLP tests relative to the PLP tests. Additionally, the gaze behavioral patterns exhibited during the WLP tests suggested that the test takers adopted keyword matching and "shallow listening." Together, the neuroimaging and gaze behavioral data indicated that the WLP tests imposed a lower cognitive load on the test takers than the PLP tests. However, the test takers performed better with higher test scores for one of the two WLP tests compared with the PLP tests. By incorporating eye-tracking and neuroimaging in this exploration, this study has advanced the current knowledge on cognitive load and the impact imposed by different listening test methods. To advance our knowledge of test validity, other researchers could adopt our research protocol and focus on extending the test method framework used in this study.
Computer Assisted Language Learning, 2021
This study sought to examine research trends in computer-assisted language learning (CALL) using a retrospective scientometric approach. Scopus was used to search for relevant publications on the topic and generate a dataset consisting of 3,697 studies published in 11 journals between 1977 and 2020. A document co-citation analysis method was adopted to identify the main research clusters in the dataset. The impact of each publication on the field was measured using the burst index and betweenness centrality, and the content of influential publications was closely analysed to determine the focus of each cluster and the key themes of the studies in focus. Overall, we identified seven major clusters. We further found that leveraging synchronous computer-mediated communication and negotiated interaction, multimedia, telecollaboration or e-mail exchanges, blogs, digital games, Wikis, and podcasts to support language learning was probably beneficial. Varying degrees of support were found across studies for each of these technologies: stronger support was found for synchronous computer-mediated communication and negotiated interaction, multimedia, telecollaboration or e-mail exchanges, and digital games, and the limitations listed in the supporting studies were relatively inconsequential; weaker support was found for blogs, Wikis, and podcasts, for which some major drawbacks were observed. The findings of the study should be helpful for teachers and instructors who want to decide whether to use technology in the classroom for instructional purposes. Additionally, researchers and graduate students who need to identify a research topic for their thesis or dissertation may also find the results useful.
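A minimal sketch of the kind of document co-citation network analysis described above, using networkx; the toy co-citation pairs are placeholders, and the actual analysis in the study was carried out with dedicated scientometric software.

```python
# Sketch: build a document co-citation network and rank references by betweenness centrality.
import networkx as nx
from itertools import combinations
from collections import Counter

# Hypothetical reference lists of citing papers (each inner list = one citing paper).
reference_lists = [
    ["DocA", "DocB", "DocC"],
    ["DocA", "DocC", "DocD"],
    ["DocB", "DocD", "DocE"],
    ["DocC", "DocD", "DocE"],
]

# Two references are co-cited when they appear together in the same reference list.
cocitations = Counter()
for refs in reference_lists:
    for pair in combinations(sorted(set(refs)), 2):
        cocitations[pair] += 1

G = nx.Graph()
for (u, v), weight in cocitations.items():
    G.add_edge(u, v, weight=weight)

# Betweenness centrality as a rough indicator of a reference's brokerage role;
# research clusters could then be extracted with a community-detection routine.
centrality = nx.betweenness_centrality(G, weight="weight")
for doc, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{doc}: {score:.3f}")
```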
International Journal of Listening, 2021
This study aimed to investigate the test-taking strategies needed for successful completion of a lecture-based listening test by employing self-reported test-taking strategy use, actual strategy use measured via eye-tracking, and test scores. In this study, participants' gaze behavior (measured by fixation and visit duration and frequency) was recorded while they completed two listening tests of three stages each: pre-listening, in which participants (n = 66) previewed question stems; while-listening, in which participants simultaneously listened to the recording and filled in their answers; and post-listening, in which they had time to review their answers and make necessary amendments. Following the listening tests, participants filled out a post-test questionnaire that asked about their strategy use in each of the three stages. Rasch measurement, t-tests, and path analysis were performed on test scores, questionnaire results, and gaze patterns. Results suggest that gaze measures (visit duration and fixation frequency) predicted participants' final test performance, while self-reports had moderate predicting power. The findings of this study have implications for the cognitive validity of listening tests, listening test design, and pedagogical approaches to building listening competence.
Studies in Educational Evaluation, 2021
The present study conducted a systematic review of the item response theory (IRT) literature in language assessment to investigate the conceptualization and operationalization of the dimensionality of language ability. Sixty-two IRT-based studies published between 1985 and 2020 in language assessment and educational measurement journals were first classified into two categories based on a unidimensional and multidimensional research framework, and then reviewed to examine language dimensionality from technical and substantive perspectives. It was found that 12 quantitative techniques were adopted to assess language dimensionality. Exploratory factor analysis was the primary method of dimensionality analysis in papers that had applied unidimensional IRT models, whereas the comparison modeling approach was dominant in the multidimensional framework. In addition, there was converging evidence within the two streams of research supporting the role of a number of factors such as testlets, language skills, subskills, and linguistic elements as sources of multidimensionality, while mixed findings were reported for the role of item formats across research streams. The assessment of reading, listening, speaking, and writing skills was grounded within both unidimensional and multidimensional frameworks. By contrast, vocabulary and grammar knowledge was mainly conceptualized as unidimensional. Directions for continued inquiry and application of IRT in language assessment are provided.
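As a hedged sketch of the distinction the review turns on (not the specific models fitted in the reviewed studies), a unidimensional two-parameter item response function can be contrasted with its multidimensional counterpart as follows:

```latex
% Unidimensional 2PL: one latent ability \theta_j per test taker.
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\!\big(-a_i(\theta_j - b_i)\big)}

% Multidimensional (compensatory) 2PL: a vector of abilities \boldsymbol{\theta}_j
% (e.g., separate subskill dimensions) with item discrimination vector \mathbf{a}_i.
P(X_{ij} = 1 \mid \boldsymbol{\theta}_j) =
  \frac{1}{1 + \exp\!\big(-(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j + d_i)\big)}
```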
This is the second neurocognitive study of language assessments produced in our lab. In addition to the experiment, we have proposed the concept of neurocognitive validity in language assessment. We are working towards expanding on this framework. We believe neurocognitive approaches to learning and assessment will be the future of education, and it is best that pertinent frameworks be proposed and tested now.
With the advent of new technologies, assessment research has adopted technology-based methods to investigate test validity. This study investigated the neurocognitive processes involved in an academic listening comprehension test, using a biometric technique called functional near-infrared spectroscopy (fNIRS). Sixteen right-handed university students completed two tasks: (1) a linguistic task that involved listening to a mini-lecture (i.e., the Listening condition) and answering questions (i.e., the Questions condition) and (2) a nonlinguistic task that involved listening to a variety of natural sounds and animal vocalizations (i.e., the Sounds condition). The hemodynamic activity in three left brain regions was measured: the inferior frontal gyrus (IFG), dorsomedial prefrontal cortex (dmPFC), and posterior middle temporal gyrus (pMTG). The Listening condition induced higher activity in the IFG and pMTG than the Sounds condition. Although not statistically significant, the activity in the dmPFC was higher during the Listening condition than in the Sounds condition. The IFG was also significantly more active during the Listening condition than in the Questions condition. Although a significant gender difference was observed in listening comprehension test scores, there was no difference in brain activity (across the IFG, dmPFC, and pMTG) between male and female participants. The implications for test validity are discussed.
Frontiers in Psychology, 2020
This study set out to investigate intellectual domains as well as the use of measurement and validation methods in language assessment research and second language acquisition (SLA) published in English in peer-reviewed journals. Using Scopus, we created two datasets: (i) a dataset of core journals consisting of 1,561 articles published in four language assessment journals, and (ii) a dataset of general journals consisting of 3,175 articles on language assessment published in the top journals of SLA and applied linguistics. We applied document co-citation analysis to detect thematically distinct research clusters. Next, we coded citing papers in each cluster based on an analytical framework for measurement and validation. We found that the focus of the core journals was more exclusively on reading and listening comprehension assessment (primary), facets of speaking and writing performance such as raters and validation (secondary), as well as feedback, corpus linguistics, and washback (tertiary). By contrast, the primary focus of assessment research in the general journals was on vocabulary, oral proficiency, essay writing, grammar, and reading. The secondary focus was on affective schemata, awareness, memory, language proficiency, explicit vs. implicit language knowledge, language or semantic awareness, and semantic complexity. With the exception of language proficiency, this second area of focus was absent in the core journals. It was further found that the majority of citing publications in the two datasets did not carry out inference-based validation on their instruments before using them. More research is needed to determine what motivates authors to select and investigate a topic, how thoroughly they cite past research, and what internal (within a field) and external (between fields) factors lead to the sustainability of a Research Topic in language assessment.
Language Testing, 2020
Over the past decades, the application of Rasch measurement in language assessment has gradually increased. In the present study, we reviewed and coded 215 papers using Rasch measurement published in 21 applied linguistics journals for multiple features. We found that seven Rasch models and 23 software packages were adopted in these papers, with many-facet Rasch measurement (n = 100) and Facets (n = 113) being the most frequently used Rasch model and software, respectively. Significant differences were detected between the number of papers that applied Rasch measurement to different language skills and components, with writing (n = 63) and grammar (n = 12) being the most and least frequently investigated, respectively. In addition, significant differences were found between the number of papers reporting person separation (n = 73, not reported: n = 142) and item separation (n = 59, not reported: n = 156) and those that did not. An alarming finding was how few papers reported unidimensionality check (n = 57 vs 158) and local independence (n = 19 vs 196). Finally, a multilayer network analysis revealed that research involving Rasch measurement has created two major discrete communities of practice (clusters), which can be characterized by features such as language skills, the Rasch models used, and the reporting of item reliability/separation vs person reliability/separation. Cluster 1 was accordingly labelled the production and performance cluster, whereas cluster 2 was labelled the perception and language elements cluster. Guidelines and recommendations for analyzing unidimensionality, local independence, data-to-model fit, and reliability in Rasch model analysis are proposed.
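A hedged sketch of the person separation and separation reliability statistics the review found to be under-reported, computed from Rasch person measures and their standard errors using the standard formulas; the measure and standard-error values below are invented placeholders.

```python
# Sketch: person separation and separation reliability from Rasch person estimates.
import numpy as np

# Hypothetical person measures (logits) and their model standard errors.
measures = np.array([-1.2, -0.4, 0.1, 0.6, 1.3, 1.9, -0.8, 0.3, 0.9, 1.5])
std_errors = np.array([0.45, 0.40, 0.38, 0.39, 0.42, 0.47, 0.41, 0.38, 0.40, 0.44])

observed_var = measures.var(ddof=1)          # observed variance of person measures
error_var = np.mean(std_errors ** 2)         # mean-square measurement error
true_var = max(observed_var - error_var, 0)  # "adjusted" (true) variance

separation = np.sqrt(true_var) / np.sqrt(error_var)  # person separation index G
reliability = true_var / observed_var                # separation reliability = G^2 / (1 + G^2)

print(f"person separation:  {separation:.2f}")
print(f"person reliability: {reliability:.2f}")
```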
Computer Assisted Language Learning, 2020
This is the first study to investigate the effects of test methods (while-listening performance and post-listening performance) and gender on measured listening ability and brain activation under test conditions. Functional near-infrared spectroscopy (fNIRS) was used to examine three brain regions associated with listening comprehension: the inferior frontal gyrus and posterior middle temporal gyrus, which subserve bottom-up processing in comprehension, and the dorsomedial prefrontal cortex, which mediates top-down processing. A Rasch model reliability analysis showed that listeners were homogeneous in their listening ability. Additionally, there were no significant differences in test scores across test methods and genders. The fNIRS data, however, revealed significantly different activation of the investigated brain regions across test methods, genders, and listening abilities. Together, these findings indicated that the listening test was not sensitive to differences in the neurocognitive processes underlying listening comprehension under test conditions. The implications of these findings for assessing listening and suggestions for future research are discussed.
System
Even though the field of linguistics has witnessed a growth of research in the areas of comprehension (listening and reading) subskills, there is currently no universally accepted taxonomy for categorizing them. Using a dataset of 192 publications, a document co-citation analysis was conducted. Eighteen discrete research clusters were identified, comprising 73 empirically investigated comprehension subskills, of which 55 were related to first language (L1) comprehension and 18 were associated with second language (L2) comprehension. Fifteen research clusters (83.33%) were focused on lower-order L1 processing abilities in reading such as orthographic processing and speeded word reading. The remaining three clusters were relatively small, and focused on L2 comprehension subskills. The list of subskills was visualized in the form of a codex that serves as the first integrative framework for empirically investigated comprehension subskills and processing abilities. The need for conducting experimental investigations to improve the understanding of L2 comprehension subskills was highlighted.
Frontiers in Psychology, 2019
A recent review of the literature concluded that Rasch measurement is an influential approach in psychometric modeling. Despite the major contributions of Rasch measurement to the growth of scientific research across various fields, there is currently no research on the trends and evolution of Rasch measurement research. The present study used co-citation techniques and a multiple perspectives approach to investigate 5,365 publications on Rasch measurement between 01 January 1972 and 03 May 2019 and their 108,339 unique references downloaded from the Web of Science (WoS). Several methods of network development involving visualization and text-mining were used to analyze these data: author co-citation analysis (ACA), document co-citation analysis (DCA), journal author co-citation analysis (JCA), and keyword analysis. In addition, to investigate the inter-domain trends that link the Rasch measurement specialty to other specialties, we used a dual-map overlay to investigate specialty-to-specialty connections. Influential authors, publications, journals, and keywords were identified. Multiple research frontiers or sub-specialties were detected and the major ones were reviewed, including "visual function questionnaires", "non-parametric item response theory", "valid measures (validity)", "latent class models", and "many-facet Rasch model". One of the outstanding patterns identified was the dominance and impact of publications written for general groups of practitioners and researchers. In personal communications, the authors of these publications stressed their mission as being "teachers" who aim to promote Rasch measurement as a conceptual model with real-world applications. Based on these findings, we propose that sociocultural and ethnographic factors have a huge capacity to influence fields of science and should be considered in future investigations of psychometrics and measurement. As the first scientometric review of the Rasch measurement specialty, this study will be of interest to researchers, graduate students, and professors seeking to identify research trends, topics, major publications, and influential scholars.
Eye tracking technology has become an increasingly popular methodology in language studies. Using data from 27 journals in language sciences indexed in the Social Science Citation Index and/or Scopus, we conducted an in-depth scientometric analysis of 341 research publications together with their 14,866 references between 1994 and 2018. We identified a number of countries, researchers, universities, and institutes with large numbers of publications in eye tracking research in language studies. We further discovered a mixed multitude of connected research trends that have shaped the nature and development of eye tracking research. Specifically, a document co-citation analysis revealed a number of major research clusters, their key topics, connections, and bursts (sudden citation surges). For example, the foci of clusters #0 through #5 were found to be perceptual learning, regressive eye movement(s), attributive adjective(s), stereotypical gender, discourse processing, and bilingual adult(s). The content of all the major clusters was closely examined and synthesized in the form of an in-depth review. Finally, we grounded the findings within a data-driven theory of scientific revolution and discussed how the observed patterns have contributed to the emergence of new trends. As the first scientometric investigation of eye tracking research in language studies, the present study offers several implications for future research that are discussed.
Computer Assisted Language Learning, 2019
The aim of the present study is two-fold. Firstly, it uses eye tracking to investigate the dynamics of item reading, both in multiple choice (MCQ) and matching items, before and during two hearings of listening passages in a computerized while-listening performance (WLP) test. Secondly, it investigates answer changing during the two hearings, which include four rounds of item reading taking place during: pre-listening in hearing 1, while-listening in hearing 1, pre-listening in hearing 2, and while-listening in hearing 2. The listening test was completed by 28 secondary school students in different sessions. Using time series, cross-correlation functions, and multivariate data analyses, we found that listeners tended to quickly skim the test items, distractors, and answers during pre-listening in hearing 1 and pre-listening in hearing 2. By contrast, during while-listening in hearing 1 and while-listening in hearing 2, significantly more attention was paid to the written stems, distractors, and options. The increment in attention to the written stems, distractors, and options was greater for the matching items, and interactions between item format and item reading were also detected. Additionally, we observed a mixed answer changing pattern (i.e., incorrect-to-correct and correct-to-incorrect), although the dominant pattern for both item formats (67%) was incorrect-to-correct. Implications of the findings for language research are discussed.
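A hedged sketch of the cross-correlation analysis of gaze time series mentioned above; the two toy series stand in for per-second fixation counts on two areas of interest and are not data from the study.

```python
# Sketch: cross-correlation between two gaze time series (e.g., fixations on stems vs. options).
import numpy as np

rng = np.random.default_rng(0)
stems = rng.poisson(lam=3, size=60).astype(float)          # fixations per second on item stems
options = np.roll(stems, 2) + rng.normal(0, 0.5, size=60)  # options series lagging stems by ~2 s

def cross_correlation(x, y, max_lag=10):
    """Return corr(x[t], y[t + lag]) for lags from -max_lag to +max_lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        out[lag] = float(np.mean(xs * ys))
    return out

ccf = cross_correlation(stems, options)
best_lag = max(ccf, key=ccf.get)
print(f"strongest cross-correlation at lag {best_lag}: {ccf[best_lag]:.2f}")
```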
Imagination, Cognition and Personality, 2019
This study investigates the dimensions of visual mental imagery (VMI) in aural discourse comprehension. We introduce a new approach to inspecting VMIs which integrates forensic arts and latent class analysis. Thirty participants listened to three descriptive oral excerpts and then verbalized what they had seen in their mind’s eye. The verbalized descriptions were simultaneously illustrated by two trained artists using Adobe Photoshop and digital drawing tablets with electromagnetic induction technology, generating approximations of the VMIs. Next, a code sheet was developed to examine the illustrated VMIs on 16 dimensions. Latent class analysis identified three classes of VMI imaginers with nine discriminating dimensions: clarity, completeness of figures, details, shape crowdedness, shape-added features, texture, space, time and motion, and flamboyance. The classes were further differentiated by significant differences in their listening abilities. An individual lacking the ability to imagine (a condition called aphantasia) was also identified, and some evidence was found that VMIs in listening are both symbolic and depictive.
This study investigates the underlying structure of the listening test of the Singapore-Cambridge General Certificate of Education (GCE) exam, comparing the fit of five cognitive diagnostic assessment models comprising the deterministic input noisy “and” gate (DINA), generalized DINA (G-DINA), deterministic input noisy “or” gate (DINO), higher-order DINA (HO-DINA), and the reduced reparameterized unified model (RRUM). Through model-comparisons, a nine-subskill RRUM model was found to possess the optimal fit. The study shows that students’ listening test performance depends on an array of test-specific facets, such as the ability to eliminate distractors in multiple-choice questions alongside listening-specific subskills such as the ability to make inferences. The validated list of the listening subskills can be employed as a useful guideline to prepare students for the GCE listening test at schools.
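As background on the model family being compared, the sketch below implements the standard DINA item response function (slip/guess parameterization) on toy data; the Q-matrix entries, subskills, and parameter values are hypothetical and do not come from the GCE analysis.

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """P(correct) under DINA: eta = 1 only if all skills required by q are mastered."""
    alpha = np.asarray(alpha)   # examinee skill-mastery vector (0/1 per subskill)
    q = np.asarray(q)           # item Q-matrix row (0/1 per subskill)
    eta = int(np.all(alpha[q == 1] == 1))
    return (1 - slip) * eta + guess * (1 - eta)

# Hypothetical item requiring "inferencing" and "distractor elimination" but not a third skill.
q_row = [1, 1, 0]
print(dina_prob(alpha=[1, 1, 0], q=q_row, slip=0.1, guess=0.2))  # master of required skills: 0.9
print(dina_prob(alpha=[1, 0, 1], q=q_row, slip=0.1, guess=0.2))  # non-master: 0.2
```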
The present study applied recursive partitioning Rasch trees to a large-scale reading comprehension test (n = 1550) to identify sources of DIF. Rasch trees divide the sample by subjecting the data to recursive non-linear partitioning and estimate item difficulty per partition. The variables used in the recursive partitioning of the data were the test takers' vocabulary knowledge, grammar knowledge, and gender. This generated 11 non-pre-specified DIF groups, for which the item difficulty parameters varied significantly. The findings are grounded within the third generation of DIF analysis, and it is argued that DIF induced by the readers' vocabulary and grammar knowledge is not construct-irrelevant. In addition, only 204 (13.16%) test takers who had significantly high grammar scores were affected by gender DIF. This suggests that DIF caused by manifest variables influences only certain subgroups of test takers with specific ability profiles, thus creating a complex network of relationships between construct-relevant and construct-irrelevant variables.
This article proposes an integrated cognitive theory of reading and listening that draws on a maximalist account of comprehension and emphasizes the role of bottom-up and top-down processing. The theoretical framework draws on the findings of previous research and integrates them into a coherent and plausible narrative to explain and predict the comprehension of written and auditory inputs. The theory is accompanied by a model that schematically represents the fundamental components of the theory and the comprehension mechanisms described. The theory further highlights the role of perception and word recognition (underresearched in reading research), situation models (missing in listening research), mental imagery (missing in both streams), and inferencing. The robustness of the theory is discussed in light of the principles of scientific theories adopted from Popper (1959).
To cite this article: Vahid Aryadoust & Mehdi Riazi (2017) Future directions for assessing for learning in second language writing research: epilogue to the special issue, Educational Psychology, 37:1, 82-89,
Though significant discussions in the writing assessment literature focus on understanding the relationship between the quality of second language (L2) students’ texts measured by human judges and the linguistic features identified by automated rating engines such as Coh-Metrix, little attention (if any) has been given to assessing reflective essays presented as individual student blog posts in a tertiary-level communication course. The present study examines the relationship between the linguistic features of the reflective blog posts of Asian university learners enrolled in a professional communication course, as measured by Coh-Metrix, and these posts’ quality as assessed by human raters in discrete assessments. Rather than using traditional linear regression methods, the data were subjected to classification and regression trees (CART) to address the following research question:
How might Coh-Metrix indices of linguistic features including lexical diversity, syntactic complexity, word frequency, and grammatical accuracy relate to the assessment of these reflection essays made by the instructor?
This study uses data from 104 tertiary students enrolled in a communication module. They completed four writing tasks at four time points (i.e., Pre-Course, Mid-1-Course, Mid-2-Course, and End-Course), yielding 416 essays, which were marked holistically by human raters and analyzed via Coh-Metrix. Eighty-four linguistic features were recorded for each essay, including vocabulary sophistication, lexical diversity, syntactic sophistication, and cohesion statistics.
A description of the nature of the reflective blog posts will be presented, along with the rationale for this study, further details on the methodology used for analyzing each post, and preliminary findings. It will also be argued that using CART modeling to predict essay quality from linguistic features is novel. Unlike linear regression models, CART relaxes the normality assumption, optimizing the predictive power of the analysis.
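As a minimal sketch of the CART approach described above (not the study's actual data or feature set), the following code grows a regression tree over invented Coh-Metrix-style indices and prints its splitting rules with scikit-learn. Because the splits are data-driven and non-linear, no normality or linearity assumptions are imposed, which is the property highlighted above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(42)
n_essays = 416

# Hypothetical Coh-Metrix-style indices for each essay.
features = pd.DataFrame({
    "lexical_diversity": rng.uniform(0.3, 0.9, n_essays),
    "syntactic_complexity": rng.uniform(0.0, 1.0, n_essays),
    "word_frequency": rng.uniform(1.5, 3.5, n_essays),
})
# Hypothetical holistic scores (1-7) loosely related to the predictors.
scores = np.clip(
    np.round(2 + 4 * features["lexical_diversity"] + rng.normal(0, 0.5, n_essays)),
    1, 7,
)

# CART: recursive binary splits chosen to reduce prediction error.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(features, scores)

print(export_text(tree, feature_names=list(features.columns)))
```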
In a series of YouTube videos, I provide systematic guidelines for using SPSS, WINSTEPS, and other statistical software and interpreting their output. Please subscribe to receive notifications when new videos are released:
https://www.youtube.com/channel/UCfu2GCdjq50W-kL-cv3rcLw?view_as=subscriber
Special Issue on Research into Learner Listening
Guest Editors: Christine C. M. Goh and Vahid Aryadoust
Special Issue on Using Assessment Tasks for Improving Second Language Writing
Educational Psychology: An International Journal of Experimental Educational Psychology
ALAK 2010 Annual Conference, Jan 1, 2010
It has been argued that item difficulty can affect the fit of a confirmatory factor analysis (CFA) model (McLeod, Swygert, & Thissen, 2001; Sawaki, Stricker, & Oranje, 2009). We explored the effect of items with outlying difficulty measures on the CFA model of the listening module of the International English Language Testing System (IELTS). The test has four sections comprising 40 items altogether (10 items in each section). Each section measures a different listening skill, making the test a conceptually four-dimensional assessment instrument...
Paper presented at the fourth ALTE, Krakow, Poland.
Research into the psychological and cognitive aspects of language learning, and second language (L2) learning in particular, demands new measurement tools that provide highly detailed information about language learners’ progress and proficiency. A new development in measurement models is Cognitive Diagnostic Assessment (CDA), which helps language assessment researchers evaluate students’ mastery of specific language sub-skills with greater specificity than other item response theory models. This paper discusses the tenets of CDA models in general and the fusion model (FM) in particular, and reports the results of a study applying the FM to the lecture-comprehension section of the International English Language Testing System (IELTS) listening module. The FM separated only two major listening sub-skills (i.e., the ability to understand explicitly stated information and the ability to make close paraphrases), likely indicating construct underrepresentation. It also provides a mastery/non-mastery profile of test takers. Implications for assessing listening comprehension and IELTS are discussed.
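In its commonly used reduced form (the reduced reparameterized unified model, or RRUM, which drops the residual ability term), the fusion model's item response function can be sketched as follows; the subskills and parameter values below are hypothetical and serve only to illustrate how mastery profiles translate into response probabilities.

```python
import numpy as np

def rrum_prob(alpha, q, pi_star, r_star):
    """Reduced RUM: P(correct) = pi* times an r* penalty for each required-but-unmastered skill."""
    alpha = np.asarray(alpha, dtype=int)      # mastery profile (0/1 per subskill)
    q = np.asarray(q, dtype=int)              # Q-matrix row for the item
    r_star = np.asarray(r_star, dtype=float)  # penalty applied when a required skill is not mastered
    penalties = r_star ** ((1 - alpha) * q)
    return pi_star * np.prod(penalties)

# Hypothetical item requiring "explicit information" and "close paraphrase" subskills.
q_row = [1, 1]
print(rrum_prob(alpha=[1, 1], q=q_row, pi_star=0.92, r_star=[0.4, 0.6]))  # full master: 0.92
print(rrum_prob(alpha=[1, 0], q=q_row, pi_star=0.92, r_star=[0.4, 0.6]))  # partial master: 0.55
```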
The application of MFRM to writing tests of English.
Factor structure of the IELTS listening test
This report reviews three prominent conceptualizations of validity (i.e., Embretson, 1983; Kane, 2002; Messick, 1989) to lay out a validity argument (VA) for the International English Proficiency Test (IEPT). To build and support the VA for the IEPT, we endorse Kane’s (2004, 2006, 2012) conceptualization, which defines validity as a two-stage undertaking: making claims about the uses and interpretations of the scores (the interpretive argument) and evaluating those claims (the VA). The report further proposes several rigorous research methods and psychometric models to support the VA. The document, however, does not compare these concepts. For further information, readers are referred to Aryadoust (forthcoming).
Researchers have recently shown an increased interest in examining the link between the assessment of English as a second language (ESL) students’ written texts by human raters and assessment by automated rating machines. Studies show that, depending on their training and background, human raters are relatively reliable in writing assessment (Weigle, 2002). However, human ratings require double-marking and have logistical limitations. To overcome these constraints, researchers have recently turned to automated rating machines. Rating machines are economical and have become increasingly more reliable as a result of recent developments in applied linguistics and computer science....
Education and Information Technologies
Several studies have evaluated sentence structure and vocabulary (SSV) as a scoring criterion in assessing writing, but no consensus on its functionality has been reached. The present study presents evidence that this scoring criterion may not be appropriate in writing assessment. Scripts by 182 ESL students at two language centers were analyzed with the Rasch partial credit model. Although other scoring criteria functioned satisfactorily, SSV scores did not fit the Rasch model, and analysis of residuals showed SSV scoring on most test prompts loaded on a benign secondary dimension. The study proposes that a lexico-grammatical scoring criterion has potentially conflicting properties, and therefore recommends considering separate vocabulary and grammar criteria in writing assessment.
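For context, the Rasch partial credit model used in this analysis assigns each polytomous scoring criterion its own set of step (threshold) difficulties. A minimal sketch of its category-probability function is given below; the ability and threshold values are invented for illustration and are not estimates from the study.

```python
import numpy as np

def pcm_probs(theta, thresholds):
    """Rasch partial credit model category probabilities for one item.

    theta: person ability (logits).
    thresholds: step difficulties delta_1..delta_m (logits).
    Returns probabilities for categories 0..m.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum (= 0).
    numerators = np.exp(np.concatenate(([0.0], np.cumsum(theta - thresholds))))
    return numerators / numerators.sum()

# Hypothetical 0-3 scoring criterion with step difficulties of -1, 0, and 1.5 logits.
print(pcm_probs(theta=0.5, thresholds=[-1.0, 0.0, 1.5]))
```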
Quantitative Data Analysis for Language Assessment Volume II, 2019
Although language assessment and testing can be viewed as having a much longer history (Spolsky, 2017; Farhady, 2018), its genesis as a research field is often attributed to Carroll’s (1961) and Lado’s (1961) publications. Over the past decades, the field has gradually grown in scope and sophistication as researchers have adopted various interdisciplinary approaches to problematize and address old and new issues in language assessment as well as learning. The assessment and validation of reading, listening, speaking, and writing, as well as language elements such as vocabulary and grammar have formed the basis of extensive studies (e.g., Chapelle, 2008). Emergent research areas in the field include the assessment of sign languages (Kotowicz et al., 2021). In addition, researchers have employed a variety of psychometric and statistical methods to investigate research questions and hypotheses (see chapters in Aryadoust and Raquel, 2019, 2020). The present special issue entitled “Front...
Test fairness has been recognised as a fundamental requirement of test validation. Two quantitative approaches to investigate test fairness, the Rasch-based differential item functioning (DIF) detection method and a measurement invariance technique called multiple indicators, multiple causes (MIMIC), were adopted and compared in a test fairness study of the Pearson Test of English (PTE) Academic Reading test (n = 783). The Rasch partial credit model (PCM) showed no statistically significant uniform DIF across gender and, similarly, the MIMIC analysis showed that measurement invariance was maintained in the test. However, six pairs of significant non-uniform DIF (p < 0.05) were found in the DIF analysis. A discussion of the results and post-hoc content analysis is presented and the theoretical and practical implications of the study for test developers and language assessment are discussed.
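The uniform/non-uniform distinction can be made concrete with a small sketch. The code below uses logistic regression DIF screening, a widely used approach that is distinct from the Rasch PCM and MIMIC analyses reported in the study: a significant group main effect indicates uniform DIF, while a significant group-by-ability interaction indicates non-uniform DIF. All data, group labels, and effect sizes are simulated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical item-level data: one row per test taker for a single item.
rng = np.random.default_rng(1)
n = 500
ability = rng.normal(0, 1, n)        # total-score proxy for ability
group = rng.integers(0, 2, n)        # 0 = reference group, 1 = focal group (illustrative)

# Simulate a non-uniform DIF item: the group effect changes across the ability range.
logit = 0.8 * ability + 0.6 * group * ability - 0.2
response = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = pd.DataFrame({
    "ability": ability,
    "group": group,
    "ability_x_group": ability * group,
})
X = sm.add_constant(X)

fit = sm.Logit(response, X).fit(disp=0)
print(fit.summary())
# Uniform DIF: significant "group" coefficient.
# Non-uniform DIF: significant "ability_x_group" interaction.
```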
This study evaluated the validity of the Michigan English Test (MET) Listening Section by investigating its underlying factor structure and the replicability of its factor structure across multiple...
Frontiers in Psychology
Social interactions accompany individuals throughout their whole lives. When examining the underlying mechanisms of social processes, dynamics of synchrony, coordination, or attunement emerge between individuals at multiple levels. To identify the impactful publications that have studied such mechanisms and to trace the trends from which the available literature originated, the current study adopted a scientometric approach. A sample of 543 documents dated from 1971 to 2021 was derived from Scopus. Subsequently, a document co-citation analysis was conducted on 29,183 cited references to examine the patterns of co-citation among the documents. The resulting network consisted of 1,759 documents connected to each other by 5,011 links. Within the network, five major clusters were identified. The analysis of the content of the three major clusters (namely, “Behavioral synchrony,” “Towards bio-behavioral synchrony,” and “Neural attunement”) suggests an interest in studying attunement in...
Research in Developmental Disabilities
International Journal of Listening
This study investigates the underlying structure of the listening test of the Singapore–Cambridge General Certificate of Education (GCE) exam, comparing the fit of five cognitive diagnostic assessment models comprising the deterministic input noisy “and” gate (DINA), generalized DINA (G-DINA), deterministic input noisy “or” gate (DINO), higher-order DINA (HO-DINA), and the reduced reparameterized unified model (RRUM). Through model-comparisons, a nine-subskill RRUM model was found to possess the optimal fit. This study shows that students’ listening test performance depends on an array of test-specific facets, such as the ability to eliminate distractors in multiple-choice questions alongside listening-specific subskills such as the ability to make inferences. The validated list of the listening subskills can be employed as a useful guideline to prepare students for the GCE listening test at schools.
International Journal of Listening
This article proposes an integrated cognitive theory of reading and listening that draws on a maximalist account of comprehension and emphasizes the role of bottom-up and top-down processing. The theoretical framework draws on the findings of previous research and integrates them into a coherent and plausible narrative to explain and predict the comprehension of written and auditory inputs. The theory is accompanied by a model that schematically represents the fundamental components of the theory and the comprehension mechanisms described. The theory further highlights the role of perception and word recognition (underresearched in reading research), situation models (missing in listening research), mental imagery (missing in both streams), and inferencing. The robustness of the theory is discussed in light of the principles of scientific theories adopted from Popper (1959).
English Language Education, 2016
The effectiveness of a language test in meaningfully diagnosing a learner’s language proficiency remains in some doubt. Alderson (2005) claims that diagnostic tests are superficial because they do not inform learners what they need to do in order to develop; “they just identify strengths and weaknesses and their remediation” (p. 1). In other words, a test cannot claim to be diagnostic unless it facilitates language development in the learner. In response to the perceived need for a mechanism that provides both diagnostic information and specific language support, four Hong Kong universities have developed the Diagnostic English Language Tracking Assessment (DELTA), which could be said to be meaningfully diagnostic because it is both integrated into the English language learning curriculum and used in combination with follow-up learning resources to guide independent learning.
Turkish Online Journal of Distance Education, 2010