Rikkert M van der Lans | University of Groningen
Papers by Rikkert M van der Lans
Supplemental material, sj-pdf-2-sgo-10.1177_21582440211040121 for Student Perceptions of Teaching Quality in Five Countries: A Partial Credit Model Approach to Assess Measurement Invariance by Rikkert M. van der Lans, Ridwan Maulana, Michelle Helms-Lorenz, Carmen-María Fernández-García, Seyeoung Chun, Thelma de Jager, Yulia Irnidayanti, Mercedes Inda-Caro, Okhwa Lee, Thys Coetzee, Nurul Fadhilah, Meae Jeon and Peter Moorer in SAGE Open
Supplemental material, sj-pdf-5-sgo-10.1177_21582440211040121 for Student Perceptions of Teaching Quality in Five Countries: A Partial Credit Model Approach to Assess Measurement Invariance by Rikkert M. van der Lans, Ridwan Maulana, Michelle Helms-Lorenz, Carmen-María Fernández-García, Seyeoung Chun, Thelma de Jager, Yulia Irnidayanti, Mercedes Inda-Caro, Okhwa Lee, Thys Coetzee, Nurul Fadhilah, Meae Jeon and Peter Moorer in SAGE Open
Supplemental material, sj-pdf-1-sgo-10.1177_21582440211040121 for Student Perceptions of Teaching Quality in Five Countries: A Partial Credit Model Approach to Assess Measurement Invariance by Rikkert M. van der Lans, Ridwan Maulana, Michelle Helms-Lorenz, Carmen-María Fernández-García, Seyeoung Chun, Thelma de Jager, Yulia Irnidayanti, Mercedes Inda-Caro, Okhwa Lee, Thys Coetzee, Nurul Fadhilah, Meae Jeon and Peter Moorer in SAGE Open
SAGE Open, 2021
This study examines measurement invariance of student perceptions of teaching quality collected in five countries: Indonesia (n students = 6,331), the Netherlands (n students = 6,738), South Africa (n students = 3,422), South Korea (n students = 6,997) and Spain (n students = 4,676). The administered questionnaire was the My Teacher Questionnaire (MTQ). Student-perceived teaching quality was estimated using the partial credit model (PCM). Tests for differential item functioning (DIF) were used to assess measurement invariance. Furthermore, where DIF was found, the study explored whether a quasi-international calibration, which estimates country-unique parameters for DIF items, can provide more valid estimates for between-country comparisons. Results indicate the absence of non-uniform DIF, but the presence of uniform DIF among most items. This suggests that direct comparisons of raw mean or sum scores between countries are not advisable. Details of the set of invari...
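As a rough illustration of the model named in the abstract above, the following sketch computes partial credit model category probabilities and shows how uniform DIF (all thresholds of an item shifted by the same amount in one country) changes the expected item score at equal teaching quality. The function names, threshold values, and the size of the shift are hypothetical and are not taken from the paper.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities under the partial credit model.

    theta: latent trait value (here, perceived teaching quality).
    thresholds: step parameters delta_1..delta_m for one item.
    Returns probabilities for categories 0..m.
    """
    # Category k has logit sum_{j<=k} (theta - delta_j); category 0 has logit 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, thresholds):
    """Expected item score: sum over categories of k * P(X = k)."""
    return sum(k * p for k, p in enumerate(pcm_probabilities(theta, thresholds)))

# Hypothetical item thresholds in a reference country.
reference = [-1.0, 0.2, 1.5]
# Uniform DIF: every threshold shifted by the same (hypothetical) constant.
shifted = [delta + 0.5 for delta in reference]

theta = 0.5
print(expected_score(theta, reference), expected_score(theta, shifted))
```

Under these assumptions the same latent value yields different expected scores in the two groups, which is the reason the abstract advises against comparing raw mean or sum scores directly.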
The aim of the study was to assess internalizing problems before and during the pandemic with data from the Dutch consortium Child and adolescent mental health and wellbeing in times of the COVID-19 pandemic, consisting of two Dutch general population samples (GS) and two clinical samples (CS) referred to youth/psychiatric care. In each sample, measures of internalizing problems were obtained from ongoing data collections pre-pandemic (N_GS = 35,357; N_CS = 4,487) and twice during the pandemic, in Apr.–May 2020 (N_GS = 3,938; N_CS = 1,008) and in Nov.–Dec. 2020 (N_GS = 1,489; N_CS = 1,536), in children and adolescents (8-18 years) with parent reports (Brief Problem Monitor) and/or child reports (Patient-Reported Outcomes Measurement Information System®). Results show significantly greater proportions of worrisome internalizing problems (based on validated cut-offs) and significantly higher mean levels of internalizing problems from pre-pandemic to pandemic measurements in the general population. Th...
Teaching and Teacher Education, 2020
Educational Assessment, Evaluation and Accountability, 2018
Studies in Educational Evaluation, 2018
Educational Measurement: Issues and Practice, 2019
The Journal of Experimental Education, 2017
Learning and Individual Differences, 2017
Studies in Educational Evaluation, 2016
Educational Measurement: Issues and Practice, 2015
Supplemental material, sj-pdf-3-sgo-10.1177_21582440211040121 for Student Perceptions of Teaching Quality in Five Countries: A Partial Credit Model Approach to Assess Measurement Invariance by Rikkert M. van der Lans, Ridwan Maulana, Michelle Helms-Lorenz, Carmen-María Fernández-García, Seyeoung Chun, Thelma de Jager, Yulia Irnidayanti, Mercedes Inda-Caro, Okhwa Lee, Thys Coetzee, Nurul Fadhilah, Meae Jeon and Peter Moorer in SAGE Open
Using item response theory, this study explores whether student survey and classroom observation items can be calibrated onto a common metric of teaching quality. The data comprise 269 lessons of 141 teachers that were scored on the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument and the My Teacher student survey. Using Rasch model concurrent calibration, items from both instruments were calibrated onto a common one-dimensional metric of teaching quality. Most items were found to fit the model. Challenges pertain mainly to items measuring differentiation and the teaching of learning strategies to students. Explanations for these difficulties are discussed.
Previous studies in higher education have shown that the reliability of student ratings of teaching skill increases if multiple ratings by different students are aggregated. This study examines whether these findings generalize to secondary education. It also seeks to validate them by comparing reliability levels estimated with the routinely used nested design against those estimated with a more complex design. The sample consisted of 410 students from 17 classes rating 63 teachers working at eight schools across the Netherlands. Using the nested design, the study replicates findings of previous studies in higher education: the reliability of secondary school students' ratings increases with the number of students. However, these replicated reliability levels were not validated by the more complex design, which provided lower estimates. This indicates that the nested design may not provide accurate estimates of rating reliability.
Researchers have recently become interested in exploring cumulative order in teachers' use of teaching practices, which they argue may reflect stages in teacher development. However, to validly apply stage models to individuals, it is necessary to determine whether all teachers fit the stage order. This study explores whether, and in how many lessons, observed teaching practices do not fit the stage order, and whether misfit is typical of certain teachers, which would indicate individual differences. The sample consists of 198 classroom observations of 69 teachers (two to four lessons for each teacher). Using person-fit methods, the study shows that 17% of the 198 observed lessons substantially misfit the stage order but that misfit is not characteristic of specific teachers, suggesting that it is incidental. Removing the occasional misfitting lessons allows the stage model to provide an appropriate description of teaching skill.
This study connects descriptions of effective teaching with descriptions of teacher development to advance an initial understanding of how effective teaching may develop. The study's main premise is that effective teaching develops cumulatively: more basic teaching strategies and behaviors are required before teachers can advance to more complex teaching behaviors. The sample incorporates teaching behaviors observed across 878 classrooms. Teaching behaviors were observed using the International Comparative Analysis of Learning and Teaching (ICALT) observation protocol. Using Rasch analysis, the study reveals that 31 of 32 effective teaching behaviors fit a cumulative ordering. The ordering also parallels descriptions of teacher development. Together the results indicate that the instrument is a potentially useful tool to describe teachers' development of effective teaching.
Implementation of effective teacher evaluation procedures is a global challenge in which lowering the chances that teachers receive inaccurate evaluations is a pertinent goal. This study investigates the minimum number of observations required to guarantee that teachers receive feedback with modest reliability (Eρ² ≥ 0.70) and that any summative decisions about their professional career have high reliability (Eρ² ≥ 0.90). A sample of 198 classroom observations by 62 colleagues of 69 teachers working at eight schools reveals that reliable feedback requires at least three lesson visits by three different observers and that reliable summative decisions require more than 10 visits. These findings mirror those reported for other observation instruments. This study accordingly offers directions for how schools can implement such procedures most cost-effectively.
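The kind of projection described in the abstract above can be sketched as follows. This is a simplified single-facet decision study in which each visit is made by a different observer, so lesson and observer effects are pooled into one relative error term; it is not the design reported in the paper. The function names and both variance components are hypothetical values chosen for illustration only, not estimates from the study.

```python
def g_coefficient(var_teacher, var_error, n_visits):
    """Generalizability coefficient for a mean over n_visits observations.

    var_teacher: variance attributable to true differences between teachers.
    var_error: relative error variance of a single visit (lesson and
               observer effects pooled in this simplified sketch).
    """
    return var_teacher / (var_teacher + var_error / n_visits)

def min_visits(var_teacher, var_error, target, max_visits=1000):
    """Smallest number of visits whose coefficient reaches the target."""
    for n in range(1, max_visits + 1):
        if g_coefficient(var_teacher, var_error, n) >= target:
            return n
    return None  # target not reachable within max_visits

# Hypothetical variance components (assumed, not taken from the paper).
VAR_TEACHER = 1.0
VAR_ERROR = 1.2

for target in (0.70, 0.90):
    print(target, min_visits(VAR_TEACHER, VAR_ERROR, target))
```

Because the error term is divided by the number of visits, each additional visit adds less reliability than the one before, which is why the stricter threshold for summative decisions demands disproportionately more visits than the threshold for feedback.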
Earlier research claims that teachers are subjective assessors who, when assigning grades, do not restrict themselves to grading student ability alone but instead grade a mixture ('hodgepodge') of characteristics: the hodgepodge hypothesis. Teachers are also said to differ in leniency: the leniency hypothesis. This study examines both hypotheses. Test grades were collected in two samples. The first sample comprises 5,988 test grades awarded to 192 students during one school year by 64 teachers. The second sample comprises 29,462 test grades awarded to 306 students over three consecutive school years by 52 teachers. A G-study and a D-study were conducted to examine grading bias. The results provide no convincing evidence for either hypothesis. In general, report card grades distinguish reasonably reliably between less and more able students (Eρ² ≥ .70) and give a reliable judgment about the pass/fail cut-off (Φλ ≈ .90). When report card grades are based on fewer than 8 tests, reliability falls below the .70 criterion. A considerable part of the unreliability in grading can be explained by differences in the quality of the tests rather than by leniency or hodgepodge behavior in teachers' grading.
Abstract English
In previous research, teachers report that they use a hodgepodge of factors when grading students. This has led researchers to suspect that teacher-assigned grades are inflated by teacher-student interactions: the hodgepodge hypothesis. Teachers are also reported to differ in grading leniency: the leniency hypothesis. This study investigates both hypotheses. Two samples of teacher-assigned grades were gathered. The first sample contained 5,988 grades awarded by 64 teachers to 192 students during one school year. The second sample contained 29,462 teacher-assigned grades awarded to 306 students by 52 teachers during three consecutive school years. Generalizability Theory is used to analyze bias. The results present little evidence that school grades are considerably biased by hodgepodge grading or teacher leniency. Unreliability in teacher-assigned grades is due more to the tests than to teachers' hodgepodge grading or leniency.