Cees Glas - Academia.edu
Papers by Cees Glas
Lecture Notes in Statistics, 2001
Item response theory (IRT) is a powerful tool for the detection of differential item functioning (DIF). It is shown that the class of IRT models with manifest predictors is a comprehensive framework for the detection of DIF. These models also support the investigation of the causes of DIF. In principle, the responses to every item in a test can be ...
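As a minimal illustration of what DIF means inside an IRT model (not the detection procedure developed in the chapter), the sketch below assumes a Rasch model in which one item's difficulty is shifted by a hypothetical amount `dif_shift` for a focal group only. Two examinees with the same ability then have different success probabilities, which is exactly the pattern DIF tests look for.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def prob_with_dif(theta: float, b: float, dif_shift: float, focal: bool) -> float:
    """Hypothetical uniform-DIF parameterisation: the focal group sees the item
    as harder by dif_shift; the reference group sees the nominal difficulty b."""
    return rasch_prob(theta, b + (dif_shift if focal else 0.0))


# Two examinees with equal ability but different group membership.
theta, b, delta = 0.0, 0.0, 0.5
p_ref = prob_with_dif(theta, b, delta, focal=False)  # 0.5
p_foc = prob_with_dif(theta, b, delta, focal=True)   # about 0.378
```

In an IRT-with-manifest-predictors framework, `dif_shift` would be one regression weight on a group indicator; setting it to zero recovers the model without DIF.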
Mesure et évaluation en éducation, 2008
Elements of Adaptive Testing, 2009
Computer-based testing (CBT), such as computerized adaptive testing (CAT), is based on the availability of a large pool of calibrated test items. Usually, the calibration process consists of two stages.
Encyclopedia of Social Measurement, 2005
Lecture Notes in Statistics, 2001
Elements of Adaptive Testing, 2009
Item response theory (IRT) models with random person parameters have become a common choice among practitioners in the field of educational and psychological measurement. Though initially the choice for such models was motivated by an attempt to get rid of the statistical problems inherent in the incidental nature of the person parameters (Bock & Lieberman, 1970), the insight soon emerged ...
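Treating person parameters as random means they are integrated out of the likelihood rather than estimated one by one. A minimal sketch, assuming a Rasch model and a normal ability distribution (both are illustrative choices, not taken from the chapter), computes the marginal probability of a 0/1 response pattern with Gauss-Hermite quadrature via NumPy's `hermegauss`.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def marginal_pattern_prob(x, b, mu=0.0, sigma=1.0, n_nodes=41):
    """Marginal probability of a 0/1 response pattern x under a Rasch model
    with theta ~ N(mu, sigma^2), integrated by Gauss-Hermite quadrature."""
    x = np.asarray(x, dtype=float)
    b = np.asarray(b, dtype=float)
    nodes, weights = hermegauss(n_nodes)  # weight function exp(-z^2 / 2)
    weights = weights / weights.sum()     # normalise to an expectation under N(0, 1)
    theta = mu + sigma * nodes            # quadrature points on the ability scale
    # p[k, i] = P(X_i = 1 | theta_k)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    # Conditional probability of the whole pattern at each node (local independence).
    cond = np.prod(np.where(x[None, :] == 1, p, 1.0 - p), axis=1)
    return float(np.sum(weights * cond))
```

Summing this marginal probability over all possible patterns for a fixed set of items gives 1, which is a convenient sanity check on the quadrature.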
Rheumatology (Oxford, England), Jan 29, 2015
To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domain...
Most item response theory (IRT) models are stochastic models for the responses of persons to items where the effects of the persons and the items on the responses are modeled by separate sets of parameters. Usually, IRT models pertain to discrete responses. In some testing situations, however, the responses are continuous, or the number of response categories of an item is so large that it is convenient to treat the responses as continuous. Mellenbergh (1994) proposed an IRT model to deal with these responses. In the present paper, the model is generalized to multidimensional ability parameters, and a maximum marginal likelihood estimation method is presented. Further, test statistics are presented to test the model against differential item functioning, violation of local independence and violation of normality assumptions. A simulation study is presented to assess the power of the tests. Finally, a real data example pertaining to equating of examination packages is presented ...
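The abstract does not spell out the model, but Mellenbergh's model for continuous responses is commonly presented as a linear regression of the item response on the latent trait with normal errors. Under that reading, and assuming a multivariate normal ability distribution, the multidimensional version has a closed-form marginal density. The sketch below only evaluates that density for one response vector; it does not reproduce the paper's estimation method or test statistics, and all names are illustrative.

```python
import numpy as np


def marginal_loglik(x, b, A, resid_var, mu, Sigma):
    """Log marginal density of one continuous response vector x, assuming
    x = b + A @ theta + e, theta ~ N(mu, Sigma), e ~ N(0, diag(resid_var))."""
    x, b, mu = (np.asarray(v, dtype=float) for v in (x, b, mu))
    A = np.atleast_2d(np.asarray(A, dtype=float))          # items x dimensions
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))  # dimensions x dimensions
    mean = b + A @ mu
    # Integrating theta out gives a normal with factor-analytic covariance.
    cov = A @ Sigma @ A.T + np.diag(np.asarray(resid_var, dtype=float))
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    k = x.size
    return float(-0.5 * (k * np.log(2.0 * np.pi) + logdet + quad))
```

Summing this quantity over respondents gives the marginal log-likelihood that a maximum marginal likelihood procedure would maximise over `b`, `A`, `resid_var` and the ability distribution.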
Educational Research and Evaluation, 2015
As expectations of the economic impact of educational attainment are soaring (Hanushek & Woessmann, 2009) and conjectures about successful national educational reforms (Mourshed, Chijioke, & Barber, 2010) are welcomed by educational policy-makers in many countries, a careful assessment of the empirical evidence for these kinds of claims is needed. In this article, we present a methodology that was applied to an international data set. A multi-level model of education was used to present a hypothetical scenario, referred to as the “implementation scenario”. The scenario was tested on the Programme for International Student Assessment (PISA) 2009 data set by means of multi-level structural equation modelling. Although we find some evidence for direct effects and some support for straightforward implementation, the overall impact of malleable conditions at the system and school level appears disappointingly small. In discussing these results, we refer to a theoretical strand of literature that would account for this “limited malleability”.
Handbook of Statistics, 2006
International Journal of Testing
Computerized Adaptive Testing: Theory and Practice, 2000
In the previous chapter, Wainer, Bradlow and Du (this volume) presented a generalization of the three-parameter item response model in which the dependencies generated by a testlet structure are explicitly taken into account. That chapter is an extension of their prior work that developed the generalization for the two-parameter model (Bradlow, Wainer, & Wang, 1999). Their approach is to use ...
The Journal of rheumatology, 2015
To compare the psychometric functioning of multidimensional disease-specific, multi-item generic, and single-item measures of fatigue in patients with rheumatoid arthritis (RA). Confirmatory factor analysis (CFA) and longitudinal item response theory (IRT) modeling were used to evaluate the measurement structure and local reliability of the Bristol RA Fatigue Multi-Dimensional Questionnaire (BRAF-MDQ), the Medical Outcomes Study Short Form-36 (SF-36) vitality scale, and the BRAF Numerical Rating Scales (BRAF-NRS) in a sample of 588 patients with RA. A 1-factor CFA model yielded a similar fit to a 5-factor model with subscale-specific dimensions, and the items from the different instruments adequately fit the IRT model, suggesting essential unidimensionality in measurement. The SF-36 vitality scale outperformed the BRAF-MDQ at lower levels of fatigue, but was less precise at moderate to higher levels of fatigue. At these levels of fatigue, the living, cognition, and emotion subscales ...
Arthritis & rheumatology (Hoboken, N.J.), 2014
To evaluate and compare the measurement precision and sensitivity to change of the Health Assessment Questionnaire disability index (HAQ DI), the Short Form 36 physical functioning scale (PF-10), and simulated Patient-Reported Outcomes Measurement Information System (PROMIS) physical function computer adaptive tests (CATs) with 5, 10, and 15 items, using item response theory-based simulation studies. The measurement precision of the various physical function instruments was evaluated by calculating root mean square errors (RMSEs) between true physical function levels (latent physical function score) and estimated physical function levels. Measurement precision was evaluated at 9 levels of physical function, with 5,000 simulated response patterns per level. Sensitivity to change was evaluated by the ability of a simple statistical test to detect simulated change scores of small to moderate magnitude (standardized effect sizes 0.20, 0.35, and 0.50). RMSEs were smaller for the PROMIS p...
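The precision criterion described above is the root mean square error between the true latent level used to simulate responses and the level estimated from them. A minimal sketch of that computation for one true level follows; the estimates would come from whatever scoring method is being evaluated, and the function name is illustrative.

```python
import math


def rmse(true_levels, estimated_levels):
    """Root mean square error between true and estimated latent levels."""
    if len(true_levels) != len(estimated_levels) or not true_levels:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(est - true) ** 2 for true, est in zip(true_levels, estimated_levels)]
    return math.sqrt(sum(squared) / len(squared))
```

In a design like the one described, this would be evaluated separately at each of the 9 true levels over its 5,000 simulated response patterns, so precision can be compared across the range of physical function.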
Health and quality of life outcomes, 2005
Currently, there is a lot of interest in the flexible framework offered by item banks for measuring patient-relevant outcomes. However, few item banks have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. This paper examines the measurement properties of the Academic Medical Center linear disability score item bank in a mixed population. This paper uses item response theory to analyse data on 115 of 170 items from a total of 1002 respondents. These were: 551 (55%) residents of supported housing, residential care or nursing homes; 235 (23%) patients with chronic pain; 127 (13%) inpatients on a neurology ward following a stroke; and 89 (9%) patients suffering from Parkinson's disease. Of the 170 items, 115 were judged to be clinically relevant. Of these 115 items, 77 were retained in the item bank following the item response theory analysis. Of the 38 items that were excluded from the item bank, 24 had ...
Health and quality of life outcomes, Jan 16, 2004
Whenever questionnaires are used to collect data on constructs, such as functional status or health-related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The item and respondent population parameter estimates were very similar for the strategies involving ...
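The third strategy, treating 'not applicable' responses as if the items had never been offered, amounts to leaving those items out of each respondent's likelihood. A minimal sketch under the one-parameter logistic model, with `None` standing in (as an assumption of this example) for a 'not applicable' response:

```python
import math
from typing import Optional, Sequence


def loglik_observed_only(theta: float,
                         responses: Sequence[Optional[int]],
                         b: Sequence[float]) -> float:
    """1PL log-likelihood for one respondent that skips 'not applicable'
    responses (None), i.e. treats those items as never administered."""
    ll = 0.0
    for x, b_i in zip(responses, b):
        if x is None:
            continue  # item not administered: contributes nothing to the likelihood
        p = 1.0 / (1.0 + math.exp(-(theta - b_i)))
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll
```

By contrast, cold deck and hot deck imputation would replace each `None` with a filled-in 0 or 1 before evaluating the same likelihood.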
Lecture Notes in Statistics, 2001
We consider a latent trait model developed by Rasch for the response time on a set of pure speed tests, which is based on the assumption that the test response times are approximately gamma distributed with known index parameters and scale parameters depending on subject ability and test difficulty parameters. In this chapter, the principle of Lagrange multiplier tests is used to evaluate differential test functioning and subgroup invariance of the test parameters. Two numerical illustrations are given.
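The distributional assumption can be written down directly: each response time is gamma distributed with a known index (shape) parameter and a scale that depends on ability and difficulty. The abstract does not say how they combine, so the sketch below assumes, purely for illustration, that scale = difficulty / ability, which makes times shorter for more able subjects and longer for harder tests. It evaluates the log density only and does not implement the Lagrange multiplier tests.

```python
import math


def gamma_logpdf(t: float, shape: float, scale: float) -> float:
    """Log density of a gamma distribution with the given shape (index) and scale, t > 0."""
    return ((shape - 1.0) * math.log(t) - t / scale
            - shape * math.log(scale) - math.lgamma(shape))


def response_time_logpdf(t: float, index: float, ability: float, difficulty: float) -> float:
    """Hypothetical parameterisation: scale = difficulty / ability, so the
    expected time index * difficulty / ability falls as ability rises."""
    return gamma_logpdf(t, index, difficulty / ability)
```

With index 1 the gamma reduces to an exponential distribution, which gives a simple check: `response_time_logpdf(1.0, 1.0, 2.0, 2.0)` has scale 1 and equals -1.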
Elements of Adaptive Testing, 2009